2
Summary of Panel Sessions and Presentations

PANEL A: THE STRENGTHS AND LIMITATIONS OF PROCESS

Panelists: Isaac Levendel, Gary McGraw, and Peter Neumann

Moderator: Martyn Thomas

Software development involves many activities—for example, requirements elucidation, requirements analysis, specification, version management, architectural design, modularization, component design, programming, testing, component integration, verification, performance analysis, prototyping, error correction, feature enhancement, reviews, walkthroughs, and typechecking. These activities may be undertaken in many different sequences; collectively they are referred to as the software development process. Most development standards for dependable software (such as DO-178B1 or IEC 615082) recommend that particular processes be adopted; typically, the recommended processes vary for different levels of criticality (different target probabilities and consequences of failure).

In the Panel A session, participants from industry and academia explored the strengths and limitations of using process recommendations and requirements to achieve certifiably dependable software. Some of the themes that emerged during this discussion include these:

  • While following particular processes cannot alone guarantee certifiably dependable software, comprehensive engineering processes are nevertheless important to achieving this goal.

  • In evaluating a system, appropriate metrics should be used; it may be necessary to measure secondary artifacts (e.g., process quality) as surrogates if what is actually of interest cannot be measured.

  • Developing ways to determine how best to allocate resources (e.g., understanding where errors are likely to cluster) can improve both dependability and cost-effectiveness.

1  

DO-178B, Software Considerations in Airborne Systems and Equipment Certification. Issued December 1, 1992, by RTCA, Inc. Available at <http://www.rtca.org/>.

2  

IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems. Issued December 1998 by the International Electrotechnical Commission (IEC). Available at <http://www.iec.ch/zone/fsafety/fsafety_entry.htm>.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




The panelists and participants in the workshop largely agreed that mature, comprehensive engineering processes reduce the risk that a software development project will overrun or deliver poor-quality software. But it was also noted that a good process does not guarantee that the resulting software will be of high quality. “Process properties are not predictive of product properties,” was a typical opinion; another was, “You should not believe that if you do it right it will come out right.” The idea that it is possible to “get dirty water out of clean pipes” summarized much of this discussion.

Process was described as a “skeleton” on which a dependable system is built. Process quality is important even if it does not directly predict product quality. For example, it is important to be able to show that the version of the software that has been tested and certified is the exact version that is running, unmodified, in the operational system. Similarly, if the testing process cannot be trusted, it may be difficult to establish confidence in the test results.3

Software development processes were described as a “chain of weak links” and as “weakness in depth,” in that each process step can introduce flaws that damage the product. Product certification requires adequate evidence of the properties of the product, and the weight that can be given to this evidence depends, in turn, on the quality of the processes that created the evidence. So the quality of these processes is an important factor in certification. It is important to distinguish between evidence that the process has been followed (which gives credibility to the outcome of the process) and evidence of the system properties that the process produces.

A detailed record of a systematic process can be essential for the many development tasks that depend on information not easily obtainable from the program itself. For example, reliable build logs and version control are necessary for tracing the affected versions of systems in the field once a fault has been discovered. Similarly, avoiding complete recertification of a system after a maintenance change requires sufficient evidence of the maximum possible impact of the change, so that only the affected areas need to be recertified.

Panelists discussed the artifacts produced as part of development (program code, specifications, designs, analyses, and the like). One panelist noted that it is the properties of these artifacts that should be measured: “I run your code, not your process.” For this reason, a key activity in certification should be measurement of the product and of the intermediate artifacts. Such measurements could include system performance, test coverage, the consistency and completeness of specifications, and/or verification that a design implements a specification. While there does not seem to be a single metric that can predict dependability, several participants said that measures such as these, when used in combination, are good predictors of dependability. It is important to measure properties that actually matter directly, and economic theory suggests that measurement skews incentives: “If you want A and measure B, you will get B.” All of this suggests the importance of good empirical research that relates the attributes under consideration and makes it possible to discern what the dependent and independent variables are.

There was some discussion of the phenomenon that software errors are not evenly distributed throughout a system—they tend to cluster in the more complex areas, creating black holes of software defects. These black holes can be located by looking at the past and recent history of a current release. However, there is an underlying assumption that there are resource constraints and never enough resources to analyze an entire system in this manner. Furthermore, some forms of analysis or testing may be impossible, requiring orders of magnitude more resources than could possibly be made available. One must focus the analysis on the areas one thinks deserve the most attention. These black holes in the resulting software can often be traced to black holes in the design or specification, so deep analysis of the quality of these artifacts, early in the development of software, can be very cost-effective.

Panelists and participants generally agreed that limitations on staffing and time are a major consideration in developing dependable software. Several participants commented that most problems in delivered systems turn out to have their root cause in specification errors, and one said (provocatively and perhaps only half-seriously) that “programming is a solved problem.” It was observed that security problems, in contrast, often flow from implementation errors such as buffer overflows. “No system is 100 percent secure. Proactive security is about doing it right. Security problems come from software flaws—we must do it right.” Doing it right was suggested to involve the following:

  • Producing the right artifacts and using the knowledge of computer scientists and software engineers to guide how those artifacts are produced so that they have the right properties and can be measured or analyzed, and

  • Measuring the artifacts to confirm those properties.

Panelists and participants also largely agreed that the world is imperfect and not all the software properties of interest can be measured directly. In these cases, secondary evidence must be used—perhaps including measurements of process quality and staff competence—although it is important not to lose sight of the limitations of such evidence.

3  

It was suggested that while somewhat too prescriptive, the Capability Maturity Model is correct in its assessment of which processes matter as well as in its insight that there is a necessary progression in process improvement (it is not possible to leap easily to Level 4 from Level 1). The committee will explore this and other process-oriented models in the second phase of the study.
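The resource-allocation heuristic from this session (locating likely defect “black holes” from release history) can be sketched in a few lines. This is an illustrative sketch only; the module names, defect counts, and sizes below are invented, and a real analysis would draw on an issue tracker and version-control history.

```python
# Hypothetical sketch: rank modules by historical defect density so that
# limited review and analysis effort goes where errors have tended to cluster.
# All module names and numbers here are invented for illustration.

def rank_by_defect_density(defects, sizes):
    """Return (module, defects-per-KLOC) pairs, highest density first."""
    density = {
        module: defects.get(module, 0) / (sizes[module] / 1000.0)
        for module in sizes
    }
    return sorted(density.items(), key=lambda kv: kv[1], reverse=True)

# Defect counts attributed to each module in past releases (hypothetical).
defects = {"flight_plan": 42, "display": 7, "logging": 3}
# Module sizes in lines of code (hypothetical).
sizes = {"flight_plan": 12000, "display": 9000, "logging": 4000}

ranking = rank_by_defect_density(defects, sizes)
for module, d in ranking:
    print(f"{module}: {d:.2f} defects/KLOC")
```

Even a crude ranking like this gives a defensible first answer to the question of where scarce analysis resources should be spent.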

PANEL B: LOOKING FORWARD: NEW CHALLENGES, NEW OPPORTUNITIES

Panelists: Robert Harper, Shriram Krishnamurthi, James Larus, and André van Tilborg

Moderators: John Rushby and Lui Sha

Panel B’s topic for discussion was what has changed in the past decades with respect to software certification and dependability. Several themes emerged during the discussion:

  • Over the past several decades society has become increasingly dependent upon software. While desktop systems are not generally regarded as safety-critical, these days they are often task-critical for their users. This is true not only at the individual level but also at the organizational level.

  • One increasingly sophisticated set of tools that can help in the software development process with respect to dependability is the set of tools related to programming languages, such as type checkers, static analyzers, and model checkers.

  • Systems integration is a growing and challenging problem. Additional tools and strategies are needed to cope with large-scale systems integration issues.

While society is increasingly dependent upon software, desktop systems are not generally regarded as safety-critical. However, with respect to what has changed about the use of software systems, one panelist pointed out examples of errors in desktop spreadsheet programs and Web-based travel reservation systems. The existence of such errors illustrates that for the people using such software, even seemingly mundane applications can be critically important for achieving a larger goal. Systems are increasingly interconnected and interdependent, which means that local faults attributable to poor design can lead to wider failures. The argument proposed was that systems are not task-critical because someone in authority says they are; rather, a system becomes task-critical when someone performing a critical task uses that system.4

Dependence on systems happens not only at an individual level, as in these examples, but also at an organizational level. For example, the majority of medical-related computing systems (including patient records and lists of appropriate dosages for drugs) are run on standard consumer platforms. When these systems fail, hospital operations can be placed in jeopardy. Thus, some argued that software dependability has become an increasingly important quality-of-life issue and should not be limited to safety or security applications such as aviation and nuclear power plants.

Some panelists and participants argued that programming languages and their associated tools are the key to dependability on the grounds that “code is the ultimate reality.” It was claimed that advances in this field are both broad and deep and that fundamental research on types and programming logics, theorem proving, model checking, and so on has advanced the state of the art. Process-oriented methods have limitations—for example, they do not tend to scale well in a world with thousands of disparate components, and they tend to rely heavily on continuity and corporate knowledge. Programming-language-based approaches may scale better, but only for establishing rather local and low-level properties. The challenge is to find a way to combine the best techniques that can be used at the system level with the best lower-level techniques, and to generate all the evidence required for certification as an integral part of these development processes.5

Others argued that programming is basically a solved problem and that all the significant issues arise at the system level. It was suggested that language choice matters much less than system-level considerations. However, there seemed to be general agreement that the dependability problem is larger than this seeming dichotomy would suggest and cannot be solved by any one technique, be it architecture, static analysis, process, language, or human factors. Software poses unique challenges but, as was noted, also unique opportunities. It is not possible to look at and work with the last 10 versions of a particular bridge, for example, in the way software systems can be examined. A rough consensus seemed to be that while the dependability problem cannot be solved by any single method alone, dependability can be seriously compromised by any single weak link. Nevertheless, while there are no silver bullets, there are many specific promising interventions that, taken together, could make a very big difference.

Panelists pointed to great progress in the development of formal methods and their application in industry. Specification languages, logics, and verification methods have matured and are increasingly applicable in the context of large and complex systems (although usually only in rather limited ways). Type systems represent the most elementary use of formal methods, but they are widely used with success (e.g., in languages such as Java and C#); applied type theory was cited as one of the greatest successes in formal methods. Others pointed out that many system failures are environment- and input-dependent. Current type systems do not address the role of the environment in which the system operates, since they can only express local properties of code modules. Stronger forms of static analysis have great promise, although some participants expressed skepticism about the effectiveness of many such forms that are currently in vogue, due to their incomplete and unsound nature.

It was noted that there has been great progress in the construction of stand-alone systems. For example, the desktop system has become more powerful, has more useful features, and has become much more reliable than it was 10 years ago. Unfortunately, the same cannot be said about large systems of systems. Systems integration has remained a vexing challenge. The development of large systems of systems often has the unfortunate characteristic that early phases experience relatively smooth subsystem development while later phases encounter serious integration problems. For example, one participant remarked that during systems integration and testing, complex avionics tend to fail repeatedly when the pilot attempts to use the radar, communication, navigation, identification, and electronic warfare systems concurrently. These integration problems are very costly to fix and are often traced back to complex and unanticipated interactions between subsystems. While there are usually voluminous documents of system interface definitions, they often deal only with normal and predictable functional interfaces. There are many integration problems caused by unanticipated interactions between different technologies developed in isolation, especially in aspects related to real-time properties, fault tolerance, security, and concurrency control—namely, those properties that cannot be established locally. How to control and manage interactive complexity is a key challenge for systems integration.

Before one can start solving the problem of interactive complexity, it will be important to have good instrumentation that allows one to observe the details of the system state and to trace system behavior. Failures that arise in the integration of systems are often highly sensitive to workload and timing, which makes them extremely difficult to track down. Despite the important role instrumentation plays, very few working approaches succeed at properly instrumenting a system.

There was rough agreement that dependable system development should be risk-driven. It is important to determine which of the various aspects of dependability must be determined at the outset and which can be deferred. These critical aspects are project-dependent, and it is the designers’ and developers’ responsibility to determine the key needs of the project. How to proceed, what aspects to certify, and to what levels of dependability remain unclear. Improved notations and modeling tools are needed to address dependability concerns in all phases and in all aspects of development, but it is important that all software artifacts—whether specifications, requirements, models, or code—be kept in sync with each other. How to achieve this remains a serious challenge.

4  

At the same time, it should be noted that there is disagreement about what should be considered task-critical. As the use and scope of software expand, the number of critical components expands. However, some argue that not everything should be labeled “critical” and that the degree of criticality can vary greatly. The committee will continue to explore this and related issues in the next phase of the project.

5  

This is not to suggest that appropriate languages and tool sets will guarantee dependability, but rather that incompletely defined languages can pose problems that even the most skilled programmers have difficulty avoiding. Modern languages with associated deep analysis tools can guarantee the absence of whole classes of defects and greatly assist the programmer in achieving dependable systems cost-effectively.
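A minimal sketch of the kind of instrumentation this panel called for might look like the following: a shared, timestamped event trace to which each subsystem appends records as it runs, so that the global interleaving of operations can be reconstructed after an integration failure. The subsystem names and events here are hypothetical, and a real system would need far more sophisticated, lower-overhead tracing.

```python
# Illustrative sketch (not a production design): a thread-safe, append-only
# trace of (timestamp, subsystem, event) records, used to reconstruct the
# interleaving of concurrent subsystem activity after the fact.
import threading
import time

class EventTrace:
    """Append-only trace of (timestamp, subsystem, event) records."""
    def __init__(self):
        self._lock = threading.Lock()
        self._events = []

    def record(self, subsystem, event):
        with self._lock:
            self._events.append((time.monotonic(), subsystem, event))

    def interleaving(self):
        """All events in global time order, across all subsystems."""
        with self._lock:
            return sorted(self._events)

trace = EventTrace()

def subsystem(name, steps):
    # Stand-in for a real subsystem; it just records its activity.
    for step in steps:
        trace.record(name, step)

# Two hypothetical subsystems running concurrently, loosely echoing the
# avionics example of radar and communications being used at the same time.
t1 = threading.Thread(target=subsystem, args=("radar", ["acquire", "track"]))
t2 = threading.Thread(target=subsystem, args=("comms", ["open", "transmit"]))
t1.start(); t2.start(); t1.join(); t2.join()

for ts, name, event in trace.interleaving():
    print(f"{ts:.6f} {name} {event}")
```

The design choice worth noting is the single global clock and ordered trace: timing-sensitive integration failures can only be diagnosed if events from different subsystems can be placed on one timeline.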

PANEL C: CERTIFICATION AND REGULATION: EXPERIENCE TO DATE

Panelists: Brent Goldfarb, Mats Heimdahl, Charles Howell, and Robert Noel

Moderators: Michael DeWalt and Scott Wallsten

In the Panel C session participants discussed what is meant by “certification” and “regulation,” how much has been done, and how much is possible. Panelists and audience members also discussed whether a well-functioning marketplace for software and software development services would be sufficient to achieve dependability. The panelists noted that these are increasingly important questions as software plays an ever larger role in our lives. Several themes became apparent during this discussion:

  • The process of certification may add value in a collateral fashion because attention must be paid to issues that might not receive it otherwise; given that software and its uses and contexts change over time, any value that certification has decays over time as well.

  • Market forces and the cost structure of the software industry may create incentives to release flawed software.

  • Validation—determining what the software should do—is often harder than verification, or determining whether the software does it correctly, and may be more important. Despite the difficulties of achieving validation systematically, however, many critical systems seem to function well.

The panelists started by focusing on two heavily government-regulated industries: avionics and medical software. They noted that, in general, the software in these industries seems to work fairly well. The Federal Aviation Administration (FAA) works closely with the avionics industry to ensure that regulations are adequate but not suffocating. The FAA also appoints people within these companies to provide software approvals on behalf of the government. This helps the process work more smoothly. This approach is also being discussed by the medical community for the medical device industry, but only in a preliminary way.

One panelist stressed the traditional distinction between verification and validation (verification is a check that software satisfies set requirements, and validation is a check that specifications match the actual need). The general consensus of the panel, and the focus of the ensuing discussion, was that validation is both harder than verification and much less well addressed by current processes. That is, deciding what the specific goals are is the most important step. Once the goals are clearly defined, it becomes easier (though by no means necessarily easy) to verify that those goals are met.6 Some claimed, in addition, that validation matters more than verification for critical systems, because few of the errors that have been critical in practice indicate failures of verification.

The value of certification itself was debated extensively by the panel and audience. There was, however, general consensus on three points about certification. First, even if certification is not directly effective, it may add value in the sense that it forces developers to pay closer attention than they might otherwise. Participants noted that despite the difficulties in achieving validation, many safety-critical systems seem to function well, especially in the arenas of avionics and (to a lesser extent) medicine. One panelist argued that processes such as those recommended in DO-178B have collateral impact, because even though they fail to address many important aspects of a critical development, they force attention to detail and self-reflection on the part of engineers, which results in the discovery and elimination of flaws beyond the purview of the process itself. Increasing automation and over-reliance on tools may actually jeopardize such collateral impact. As people are removed from the testing process (for example, in favor of automated procedures that specifically test a set number of items), they are less likely to notice gross omissions in testing or invalid environmental assumptions.

Second, any value that certification has decays over time. Certification, in some sense, is a legal status and is considered a Boolean value—either the software is certified or it is not. Worse, the certification applies to a particular system at a particular time. In reality, things are more complicated. For any change made to a system, it is important to know how far the system has moved away from the version that was certified. One panelist argued that at one extreme, the value of certification is zero as soon as the product is delivered. Even if that assertion exaggerates the problem, there was general agreement that the value of certification does decay rapidly because of changes in the way a product is used, the way people interact with the product, and ways (many unexpected) in which the product must interact with other systems (see the summary of the Panel D session for more on this point). Many system failure postmortems begin with the statement, “A small software change resulted in ….”

Third and finally, a potential benefit of certification is that it establishes a baseline from which changes in performance, dependability, and reliability can be measured later.

A related question about certification is what, exactly, should be certified. Participants debated whether people should be certified, rather than or in addition to software, but there was no agreement on this question.

There was a claim that markets are not efficient for safety-critical software, especially when used by the military, though this point was disputed. The economics of the software industry, including incentives for the firms involved and how those incentives affect reliability, were discussed. A major problem is the lack of information flow between groups, a problem that can prevent markets from functioning properly. Reference was made to George Akerlof’s seminal paper on the used-car market, where asymmetric information flows prevented that market from working efficiently.7 One person noted that for used cars, the Internet and, in particular, services that permit the prospective buyer to purchase a vehicle history report have helped solve that problem. Thus sometimes innovation and markets can help resolve the problem of asymmetric information. It was also noted that there is not a one-size-fits-all approach—market forces work well with some critical systems but not as well with others. Market forces might not work well with some military systems, for example.

It was suggested that the cost structure of the software industry may create a perverse incentive to release software with errors. In particular, software is a high-fixed-cost, low-marginal-cost industry where the cost of releasing corrective patches is low and becoming lower in most cases (although not for embedded control systems). This can create an incentive to release a product before it is ready in order to capture market share, knowing that a patch can be released later at almost zero cost. Although the practice of patching may apply more to the market for desktop consumer applications than to critical systems, this and other aspects of the cost structure of the industry merit serious consideration.

6  

It should be noted that there are many kinds of functional and so-called nonfunctional requirements, and that the relative importance of verification and validation varies among these (and also among the various stakeholders in the requirements).

7  

Akerlof, George A. 1970. “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism.” Quarterly Journal of Economics 84(3):488-500.
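The verification/validation distinction drawn in this session can be made concrete with a small, deliberately contrived example. Suppose (hypothetically) that the written specification for a dosage routine says "round the computed dose to the nearest whole unit," while the actual clinical need is "never exceed the computed dose." The code below verifies cleanly against its specification yet fails validation, because the specification itself does not match the need.

```python
# Hypothetical illustration of verification vs. validation.
# The function and its "spec" are invented for this example.

def dose_per_spec(computed):
    """Spec: return the computed dose rounded to the nearest whole unit."""
    return round(computed)

# Verification: does the implementation satisfy the written specification?
# Yes: 2.6 rounds to 3, exactly as the spec requires.
assert dose_per_spec(2.6) == 3

# Validation: does the specification match the actual need
# ("never exceed the computed dose")? No: the verified code
# returns 3 for a computed safe dose of 2.6.
actual_requirement_met = dose_per_spec(2.6) <= 2.6
print("validation passes:", actual_requirement_met)
```

The point of the sketch is the panel's: once a goal is written down precisely, checking code against it is the easier half of the problem; deciding whether the written goal is the right one is the harder half.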

PANEL D: ORGANIZATIONAL CONTEXT, INCENTIVES, SAFETY CULTURE, AND MANAGEMENT

Panelists: Richard Cook, Gene Rochlin, and William Scherlis

Moderators: Charles Perrow and David Woods

Panel D explored implications of certification and dependability requirements within an organizational context. Some of the themes that emerged during the discussion were the following:

  • Systems are certified only within a particular context and up to specified system boundaries; certification of a system may not guarantee the dependability and usefulness of that system over its lifetime.

  • As a system’s reliability increases or is demonstrated over long periods of time, dependence on that system may increase to an extent not anticipated in the original design.

  • Accountability, reporting, and communication are difficult issues that must be planned and managed in detail across an organization.

The panelists stressed the idea that system design must consider the end users of a system and how that system will be used in context. Certification likewise must address not only the process of creating dependable software, nor even just the final product, but also the many ways the product might be used. The participants noted a tension between the narrower and crisper notion of certification and the broader and blurrier notion of dependable usefulness. Although some certification processes take into account the ultimate context of use, many emphasize primarily the internal consistency, coherence, and completeness of the software itself. Dependable usefulness, in contrast, entails the extrinsic characteristics of the computer systems within the domain of interest, and different domains cope with certification demands differently.8

During the discussion, workshop participants explored the sorts of difficulties related to certification and context that the committee should consider during the rest of the project. In a sense, some suggested, complex software might just as well be acknowledged as a natural phenomenon: software-intensive systems have become too complex to be completely understood before the fact, and they can behave in surprising ways. Thus the tendency is to work back from the behavior of systems to mechanisms in order to understand the workings of these systems.

Although overt software failures do occur, it was observed that in many instances dependability failures are not the type that could have been mitigated by certification—many failing systems have internal consistency of the sort that certification tests. Instead, failures result from a mismatch between the device requirements defined by the developers and the requirements of the domain in which the devices are used. Indeed, some now believe that virtually all instances of “user error” with computer-based medical devices are examples of this mismatch. These mismatches would not easily be detected by certification processes, such as DO-178B, that pay only minimal attention to analysis of requirements in context. Instead, the risk is that the certification process itself could obscure the fact that the device, operating as designed, actually lends itself to uses for which it will prove undependable within the larger context. In general, it was argued that what often first appear to be user or operator errors turn out to be design errors resulting from a lack of recognition of the conditions under which the product is or will be used.

A complicating factor is that the introduction of a new system typically changes the user’s context as new uses are found, and thus new paths to failure must be anticipated and guarded against. As systems become more useful, they may no longer be backups or add-ons to older systems but instead become the primary mechanisms that users rely on. Moreover, systems and the threats against those systems coevolve. All these observations highlight the importance of clearly articulating the properties of the environment in which the system will be deployed and, in particular, the properties of the environment that the system users depend on.

Reliable software is desirable, of course, but operators and users must recognize that it may be counterproductive to depend too heavily on the reliability of the software. Instead, users should retain skills that might appear to have become superfluous so that they can learn to recognize unreliable behavior and cope with it appropriately. Unfortunately, as systems become more reliable, users grow to depend on them uncritically. In other words, depending on success can be dangerous.9

There was some discussion of the role of accountability. Some argued that the lack of accountability was a problem in the development and deployment of dependable software, and that blame should be assigned, with accompanying sanctions, to those whose actions resulted in failures. Others noted that locating a single point of failure or finding individuals to blame can be difficult. Moreover, assigning blame may actually make matters worse by retarding the flow of information. One might therefore reward information flow instead, even though it may expose weak points and blemishes. It was observed that positive reinforcements are more effective in changing behavior than negative reinforcements, and that decades of human performance literature and research demonstrate this fact.

It was also noted that care must be taken when the introduction of a new system is likely to increase workload burdens within an organization. In fact, any new system will impose some workload costs, and if those who bear that cost do not receive direct benefits or increased capabilities from the change, the introduction is likely to founder.10

The licensing of staff who have key roles in critical projects was discussed and compared with the licensing of professionals in other disciplines. There are, of course, advantages and disadvantages to licensing. However, some participants noted that licensing could be considered one possible way of imposing standards of competence, but urged that such licenses should have to be renewed relatively frequently.

There was some disagreement over the role of regulation and whether further regulation would help or hinder. Most participants believed that there was a positive role for regulation under the proper conditions and that, indeed, much more regulation of software was required. Who should do the regulating—industrial trade organizations, professional societies, or the government—was not discussed.

Field testing, while desirable, may be prohibitively expensive. Simulation is therefore an attractive alternative and might be acceptable for some kinds of certification. The aviation industry was singled out for having exploited simulation successfully to increase dependability. Simulations

8  

Many consider certification in avionics (e.g., DO-178B requirements) to be extremely good and wish such stringency could be applied to their own domains. Others within avionics acknowledge that the certification methods there are not perfect and do not solve every problem.

9  

An example was given of an accounting system that performed quite well over a period of time. Eventually, as individuals with experience before the system was adopted were replaced, the level of understanding of the underlying substantive accounting issues diminished. Users understood the operation of the system but did not understand what the system was doing deeply enough to manage well when a failure occurred.
It was also noted that care must be taken when the introduction of a new system is likely to increase workload burdens within an organization. In fact, any new system will impose some workload costs, and if those who bear that cost do not receive direct benefits or increased capabilities from the change, the introduction is likely to founder.10

The licensing of staff who have key roles in critical projects was discussed and compared with the licensing of professionals in other disciplines. There are, of course, advantages and disadvantages to licensing. However, some participants noted that licensing could be one possible way of imposing standards of competence, but urged that such licenses be renewed relatively frequently.

There was some disagreement over the role of regulation and whether further regulation would help or hinder. Most participants believed that there was a positive role for regulation under the proper conditions and that, indeed, much more regulation of software was required. Who should do the regulating (industrial trade organizations, professional societies, or the government) was not discussed.

Field testing, while desirable, may be prohibitively expensive. Simulation is therefore an attractive alternative and might be acceptable for some kinds of certification. The aviation industry was singled out for having exploited simulation successfully to increase dependability. Simulations in other software-rich environments might offer something similar to this kind of testing. Such simulation capabilities could be explored to help in the definition of requirements as well as for validation. However, it was also noted that good simulation imposes additional costs and can be very expensive for many kinds of applications, which relates to issues raised in Panel C.

9  

An example was given of an accounting system that performed quite well over a period of time. Eventually, as individuals with experience predating the system were replaced, the level of understanding of the underlying substantive accounting issues diminished. Users understood the operation of the system but did not understand what the system was doing deeply enough to manage well when a failure occurred. Another example along these lines was the difference in caution between a user of a somewhat unreliable operating system, who is therefore careful to save documents frequently, and a user of a generally reliable operating system, who is not so careful. In the latter case, an infrequent failure can be more damaging than frequent failure.

10  

Norman, D.A. 1988. The Psychology of Everyday Things. New York, NY: Basic Books; Cook, R.I., and D.D. Woods. 1996. "Adapting to New Technology in the Operating Room." Human Factors 38(4):593-613.

PANEL E: COST-EFFECTIVENESS OF SOFTWARE ENGINEERING TECHNIQUES

Panelists: Kent Beck, Matthias Felleisen, and Anthony Hall

Moderators: Peter Lee and Jon Pincus

The purpose of the Panel E session was to provide some basis for understanding the cost-effectiveness of software engineering techniques as they relate to dependability and certification. The general question put to the panel was whether there is evidence for the cost-effectiveness of various software engineering techniques, either today or looking toward the future. The panelists cited metrics such as frequency of warranty claims and delivered defect density. One panelist argued that the creation of a market for dependable software components would lead to stronger evidence for cost-effectiveness. All three panelists described promising directions that may lead to more cost-effective software engineering techniques for creating dependable software. Overall, several key themes emerged from this panel, among them:

  • There are interesting substantive overlaps in approaches to software development that seem philosophically opposed on the surface. In particular, agile methods such as "Extreme Programming" seem to share important elements with methods that employ formal notations for early modeling and analysis.

  • Understanding what is meant by "dependability" is critical; it was observed that, given its ubiquitous use and deployment, software is treated as though generally dependable.

  • Achieving dependable, certifiable software will require emphasis on process, people, and tools.

Perhaps most surprising to the majority of participants was the strong overlap in two software development approaches described by the panelists: "Extreme Programming" (XP) and "Correctness by Construction" (CbC). While on the surface these two approaches seem philosophically opposed and differ greatly in provenance, discussion revealed intriguing similarities.
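One practice associated with Extreme Programming is "test-first" development: a failing test is written before the code that satisfies it, so the accumulated tests double as an executable record of the requirements. The following minimal sketch illustrates the idea; the `shipping_cost` function and its pricing rules are hypothetical, chosen purely for illustration.

```python
# XP "test-first" in miniature: the requirement is captured as a test
# before any implementation exists. The test fails until code satisfying
# it is written, then remains as an executable record of the requirement.
# (The shipping example and its pricing rules are hypothetical.)

def test_shipping_cost():
    assert shipping_cost(2.0) == 8.00    # flat $5 fee plus $1.50 per kg
    assert shipping_cost(10.0) == 20.00

# Step 2: write the simplest code that makes the test pass.
def shipping_cost(weight_kg: float) -> float:
    return 5.00 + 1.50 * weight_kg

test_shipping_cost()   # passes silently once the implementation is in place
```

In this style, a reported defect is first reproduced as a new failing test, so the suite grows alongside the evolving requirements rather than being written after the fact.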
Specifically, proponents of each emphasized the importance of taking a risk-based approach, in which the highest-risk (often the least well understood) areas of a project are tackled first. When asked how their approaches would respond to the issue of integration discussed in Panel C, proponents of each approach noted that if integration were clearly the highest-risk issue in a project, the appropriate response would be to address it from the start and to integrate the subsystems "early and often," in the hope of mitigating this risk. Even though much of the functionality would not be available in the earliest integration phases, architectural-level mismatches would be discovered much earlier as a result. It was also noted that having an overall architecture in place, along with an empowered central architecture team, seemed important to addressing this problem.

Both XP and Correctness by Construction stress the importance of clear requirements. CbC follows a detailed requirements elucidation method (REVEAL11), whereas XP's "test-first" approach argues for the use of failed test cases as a record of the evolving requirements. Both approaches can be viewed as attempts to make requirements explicit as early as possible (although REVEAL, unlike XP, crucially addresses properties that are not observable at the system interface and is less susceptible to the risk of focusing on low-level issues). The XP community has apparently devised new techniques for improving the observability of nonfunctional properties by making them testable (e.g., by defining explicit trials to establish usability), although some properties, such as the absence of race conditions or buffer overflows, are not easily tested at all.

A number of questions regarding safety-critical (and in particular real-time) software were raised with respect to XP. It was claimed that while XP has not been applied to safety-critical software, the best XP teams record defect rates much lower than those typically found in commercial software, along with substantially higher productivity. It was also noted that while XP had originally been applied in small (8- to 10-person) teams, it has since scaled to an organization of approximately 250 engineers, and thus the principles could apply (perhaps with some modifications to the practices) to larger, cross-organization projects.

It was reported that a series of projects using the CbC approach achieved extremely low defect rates: from a 1992 project with a delivered defect rate of 0.75 defects per thousand lines of code (KLOC) to a 2002 project (100,000 lines of code (LOC), with a productivity of 28 LOC per person-day) with a delivered defect rate of 0.04 defects/KLOC. Improvements over that time frame were attributed to the introduction of the REVEAL requirements analysis method, the adoption of more powerful static analyses (in particular the use of the SPARK Ada subset and its Examiner tool), and a better engineering process, with earlier integration and more frequent builds.

It was pointed out that software is indeed dependable in one sense, inasmuch as all the attendees had depended on software to arrive at the workshop. However, given that far too many software development projects fail or overrun their cost and schedule, the unreliability of the software production process was decried.

11  

For more about REVEAL, see Jonathan Hammond, Rosamund Rawlings, and Anthony Hall, "Will It Work?," in Proceedings of RE'01, 5th IEEE International Symposium on Requirements Engineering, August 2001, available at <http://www.praxis-cs.co.uk/pdfs/Will_it_work.pdf> and <http://www.praxis-cs.co.uk/reveal/index.htm>.
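The delivered-defect figures reported for the 2002 CbC project are worth making concrete. Using only the numbers cited above (100,000 LOC, 0.04 defects/KLOC, 28 LOC per person-day), a quick back-of-the-envelope calculation gives the absolute defect count and the implied effort:

```python
# Back-of-the-envelope arithmetic for the 2002 CbC project cited above:
# 100,000 lines of code, 0.04 delivered defects per KLOC, and a
# productivity of 28 LOC per person-day.
loc = 100_000
kloc = loc / 1000               # 100 KLOC
defect_rate = 0.04              # delivered defects per KLOC
productivity = 28               # LOC per person-day

delivered_defects = defect_rate * kloc
effort_person_days = loc / productivity

print(round(delivered_defects))     # about 4 delivered defects in total
print(round(effort_person_days))    # roughly 3,571 person-days of effort
```

That is, the reported rate corresponds to about four delivered defects in the entire 100 KLOC system, which is what makes the claim so striking compared with typical commercial defect densities.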
A proposal to reinstate the goal of building a "software market" of interchangeable software components12 was based on the following rationales:

  • Market mechanisms provide concrete measures of utility and value; such measures are currently missing in software engineering, and thus comparisons are difficult to make.

  • Improvements in process, education, and technology often come as a result of the competition afforded by markets.

  • A market of interchangeable components might provide a reliable means of choosing specific software products for specific tasks.

The importance of process, people, and tools (in various orders of importance) was, not surprisingly, emphasized. Also discussed at some length was the importance of continuing education for programmers, by analogy to health care workers (specifically, surgeons).

A substantial amount of discussion related to the role of language, and not only at the level of programming. All the panelists agreed that the choice of language or notation is important, as it affects the thought processes of the people involved in the project. There was also a rough consensus that at least local properties should be kept "with the code" via strong interface specifications. Different languages might be appropriate for different purposes, again casting the choices in terms of risks.

A number of other topics were raised and discussed briefly. Panelists emphasized the importance of approaching software from an engineering perspective. Just as chemical engineers and mechanical engineers have process differences because of the different materials they work with, software engineers, with their very different "materials," likely have different processes as well.

12  

It was observed that this notion of a market of components has often been discussed but that there is little historical support for such an idea and considerable skepticism regarding its present feasibility.

PANEL F: CASE STUDY: ELECTRONIC VOTING

Presenters: David Dill, Douglas Jones, Avi Rubin, and Ted Selker

Moderators: Reed Gardner and Daniel Jackson

Panel F addressed the controversial dilemmas posed by current electronic voting systems.13 Most of the discussion focused on the flaws of these systems and the process by which they are produced, although it was noted that computing-related issues may be overemphasized, distracting attention from more serious flaws in the wider electoral system. The major themes of discussion were as follows:

  • Structural flaws in the voting system go beyond the absence of voter-verifiable paper trails.

  • The lack of detailed risk analysis, coupled with a lack of openness in the voting system certification process, poses serious challenges to achieving a dependable voting infrastructure.

  • The current certification process does not seem to have resulted in secure or dependable electronic voting systems.

While much attention has been focused on whether the use of paper (either in the form of paper ballots or a voter-verifiable paper trail) is an effective corrective for problems with electronic voting systems, it was noted that voter verifiability is not the only challenge facing the voting infrastructure today, and perhaps not even the most important one. Votes go "adrift" everywhere. In the 2000 U.S. presidential election, for example, 2 percent of votes were lost because of problems in registration databases. The panelists noted that although attention has been focused mostly on the voting machines themselves, voting is a complex system in a complex social setting, and any weak link can compromise the entire process. While there are problems with the current paper voting system, and electronic voting systems have the potential to prevent some of these errors, they also have the potential to introduce new errors.

The panelists focused on errors related to software flaws, inadequate security, distributed database problems, communications inadequacies, limited auditing capabilities, and poor user interfaces. There has apparently been no serious attempt to perform a detailed risk analysis of electronic voting systems, a standard procedure in the development of safety-critical systems. In fact, it is not clear that those involved with electronic voting systems, from vendors to election officials and certifiers, even appreciate these systems' critical nature, perhaps because the purely algorithmic aspect of voting (count the votes, and the candidate or initiative with the most votes wins) is so simple. Most of the panelists and other workshop participants were confident that such a risk analysis could be conducted and that it might lead to a simplification of the voting process and the identification of a small trusted computing base within the larger system that would correctly be the focus of attention.

The lack of openness in the voting system certification process is a serious problem. The certification of electronic voting systems is currently conducted by companies that report to local election officials in states and counties. Each jurisdiction makes its own decisions about which companies to use and interprets the results in its own way. Usually, the choice of certifier, the inputs to the certification process, the certification report, and even the criteria used are not made known to the public. A participant reported that, in some cases, certifiers have not been willing to talk to outside experts. As a result, confidence in the system has eroded, and insights that outsiders might bring have not been taken advantage of. Given these concerns, how can the certification process itself come to be trusted? (This is indeed a problem that certification will face in any domain.)

13  

For the purposes of this discussion, "electronic voting" is distinct from Internet voting.

Even for a given vendor's system, performance varies widely because of differences in ballot design and user interface. Even small differences in a user interface or ballot design can have dramatic effects on the dependability of the system as a whole. In the United States, ballots tend to be very complicated; one panelist noted that a typical California voter makes more choices in a single election than a typical British voter does in a lifetime. Ballots differ widely across counties and may present dozens of choices. As a consequence, the layout of the ballot is complex and highly variable. The user interface of an electronic voting machine may give some propositions or candidates an advantage, either by design or by accident. These issues have not been well studied.

An end-to-end check, which would ascertain that each person's vote was correctly acquired, recorded, transmitted, and counted, might protect against software flaws and malicious attacks. But designing such an audit within the constraints of ballot secrecy requirements is a difficult problem. The idea of using DREs (direct recording electronic systems) simply as intelligent printers for generating completed ballots that could then be optically scanned (a voter-verifiable paper trail) was suggested, although one panelist argued that the attention paid to this issue has diverted resources from more important issues. It was also noted that it may be infeasible to expect voters to check a paper audit trail consisting of dozens of choices and that audio feedback, for example, might be more useful in ensuring that entry errors are not made.

A further risk is the lack of a process to ensure that the software used in an election is the software that was certified.
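The bookkeeping behind an end-to-end check can be made concrete with a deliberately naive sketch: every ballot is given an opaque receipt, and each stage of the pipeline (acquired, recorded, transmitted, counted) is verified against the previous one before the tally is accepted. This is a toy illustration of the consistency checks only, not a voting protocol; in particular, publishing receipts paired with choices, as below, would violate ballot secrecy, which is exactly the design tension noted above.

```python
import secrets
from collections import Counter

# Toy end-to-end consistency check (illustration only, NOT a real or
# secure voting protocol; all names are hypothetical).

def acquire(choices):
    # Stage 1: pair each cast choice with a random opaque receipt.
    return [(secrets.token_hex(8), choice) for choice in choices]

def transmit(ballots):
    # Stage 2: in a real system this crosses a network; here it is a copy.
    return list(ballots)

def count(ballots):
    # Stage 3: tally the transmitted ballots.
    return Counter(choice for _, choice in ballots)

def end_to_end_check(cast, recorded, transmitted, tally):
    # Every cast ballot must appear, unaltered, at every stage, and the
    # announced tally must equal a recount of the transmitted ballots.
    assert len(recorded) == len(cast)
    assert sorted(transmitted) == sorted(recorded)
    assert tally == Counter(choice for _, choice in transmitted)

cast = ["A", "B", "A", "A", "B"]
recorded = acquire(cast)
transmitted = transmit(recorded)
tally = count(transmitted)
end_to_end_check(cast, recorded, transmitted, tally)
print(dict(tally))
```

Even this toy shows where the difficulty lies: the receipt that lets a voter confirm "my ballot was counted" is the same artifact that, if linkable to the voter, breaks ballot secrecy and enables vote selling or coercion.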
Electronic voting systems, highly dependent on software, have the advantage of flexibility; unfortunately, the same flexibility allows late, uncertified changes to the software, and participants noted reported cases in which voting machine software was modified at the last minute, long after certification had been completed. Lack of expertise with computers and software among local election officials exacerbates this problem: officials might be able to detect tampering with a traditional paper system, but they are not necessarily qualified to understand the issues involved in the loading and updating of software.

Lack of observability is also a serious problem. Critical systems sometimes suffer because their failures may not be immediately observable, but in the worst cases catastrophic failures are rarely invisible: a medical device that kills a patient, for example, will likely eventually be discovered. A voting system, however, could fail repeatedly, corrupting the results of large elections, without discovery. The secrecy requirement of voting is one of the root causes of this problem. Mechanisms for detecting electronic voting system failures must be developed and tested, and perhaps local policies and processes will need to be changed to allow such failure detection and prevention.

Panelists agreed that the certification process for electronic voting systems has failed demonstrably. Despite the procedures set up by local election officials and the work of third-party certifiers, voting systems have been found by outside experts to contain egregious flaws and to be susceptible to easy attacks. Some of the reasons for this failure include the following:

  • The closed nature of the process and the skewing of incentives. Local election officials are often involved in the procurement of systems and have little incentive to report problems with such procurements. Political incumbents are often involved in the choice of vendors and certifiers, compromising the independence of the selection.

  • The fact that purchasers are typically ignorant about complex computer systems and lack the ability to assess risks and evaluate certification efforts.

  • A mismatch in expectations. While certifiers may recognize how incomplete and limited their analyses are, their reports are often misinterpreted as implying that the system is unassailable.

In addition, there are core technical problems with current electronic voting systems, among them the inherent complexity of the software, the nature of collaborations between vendors and purchasers, and the infeasibility of meeting certain requirements. The software problem is a complex one, and it is not clear that a specific and evidently correct solution is possible. The end users, voters and local election officials, are not likely to be able to understand the code themselves or to appreciate the implications of its complexity for errors. Inappropriate collaboration between local politicians and the vendor, along with the ability to control the setup of the ballot, may be problematic not only because of potential malfeasance but also because neither the vendor nor the politicians are likely to be experts in designing usable ballots. Requirements specification may also be problematic if the requirements mandated in legislation are technically infeasible or unspecific.

It was observed that electronic voting systems pose a fundamentally harder challenge than many other safety-oriented critical systems because of the high risk of motivated, malicious attack. Because voting is so foundational to democracy, and because there are strong incentives for rogue states, terrorists, political parties, special-interest groups, and even individuals to influence election results, the threat of attack on such systems is high. One panelist estimated that in the current system, bribing only a handful of people could allow serious compromise. At the moment, large-scale attacks on medical and avionics software are relatively rare; there seems to be little motivation for such attacks because these systems tend to be highly distributed and physically inaccessible. Accordingly, certification in these domains has evolved without much attention to the kinds of adversaries that voting systems might face,14 although there is increasing concern that such systems may themselves become targets of attack.
The difficulty of identifying the user of the system creates additional challenges for building dependable and certifiable voting systems. In most system development environments, there is a user who can evaluate the delivered system for fitness of purpose, and the same party that evaluates the system has the primary vested interest in its quality. For voting systems, there is no single user. Local election officials play a key role in acquiring and managing these systems, but it is arguably the voter who is the true user. Unfortunately, the voter has little influence on the process of acquisition and certification and cannot easily assess even how likely it is that his or her vote was recorded correctly.

14  

This situation highlights the difference between security and safety. While each is needed to a greater or lesser degree in all systems, the techniques and lessons learned in an effort to achieve one are not necessarily applicable in achieving the other.