The preceding chapter outlined a comprehensive approach to the development of certifiably dependable software. The proposed approach has implications not only for how software is produced and evaluated but also for government policy, legislation, and regulation; education; and research. Each of these areas warrants in-depth studies of its own, and the committee recognizes that policy prescriptions in particular—especially in light of the limited data and evidence available in the arena of certifiably dependable software—can have complex and unpredictable ramifications. The committee has therefore chosen to refrain from making concrete and prescriptive recommendations aimed at particular agencies or specific domains. Nevertheless, it seemed useful to outline some of the relevant issues and note areas for further investigation and consideration.
Dependable systems need dependable components, tools, and software companies, so it is important that customers and users be able to make an informed judgment when choosing suppliers and products. This only becomes possible when the criteria and evidence underlying claims of dependability are transparent.
Economists have established that if consumers cannot reliably observe quality before they buy, then sellers may get little economic benefit from providing higher quality than their competitors, and overall quality can decline.1 Because their reputation will affect future sales, sellers strive to maintain some minimum level of quality. If consumers rely heavily on such branding, though, it becomes more difficult for new firms to enter the market. In this case, the software industry could lose out on quality or other improvements because new and innovative firms had limited means of proving their quality. Information asymmetries of this type can be mitigated if dependability claims are explicit and backed by evidence, as long as the evidence is available for inspection by potential buyers.
Such transparency, in which those claiming dependability for their software make available the details of their claims, criteria, and evidence, is thus essential for providing the correct market conditions under which informed choices can be made and the more dependable suppliers can prosper.
To assess the credibility of such details effectively, an evaluator should be able to calibrate not only the technical claims and evidence but also the organization that produced them, because the integrity of the evidence chain is vital and cannot easily be assessed without supporting data. This suggests that data of a more general nature should be made available, including the qualifications of the personnel involved in the development; the track record of the organization in providing dependable software, which might include, for example, defect rates on previous projects; and the process by which the software was developed and the dependability argument constructed, which might include process manuals and metrics, internal standards documents, applicable test suites and results, and tools used.
A company is likely to be reluctant to reveal data that might be of benefit to a competitor or that might tarnish the company’s reputation. It is also likely that demands to publish defect rates would result in careful redefinitions of what constitutes a defect. These concerns, however, should not deter users from demanding such information, but the demands should be reasonable, well-defined, and commensurate with the dependability claimed and the consequences of failure. The willingness of a supplier to provide such data, and the clarity and integrity of the data that it provides, will be a strong indication of its attitude toward dependability, since a supplier who truly understands the role of evidence in establishing dependability will be eager to provide such evidence, and a supplier who does not understand the need for evidence is unlikely to understand all the other attributes of dependability.
One would not expect the users of a commodity operating system for standard office purposes to press for such information, although it would be reasonable to expect lists of all known defects, details of the rate at which new defects were reported, and the rate of repair. In contrast, however, the public might reasonably demand that very detailed information about the construction and validation of an electronic voting system be made publicly available. Similarly, patients who receive treatment from a potentially lethal medical device should have access to information about its evaluation just as they have access to information about the side effects and risks of medications.
It should be noted that providing direct access to evidence is not the only way that a supplier can signal quality. More widespread use of warranties, for example, would help consumers select the more dependable products and suppliers, so long as the warranties are based on explicit claims about the properties of the software and are not simply a marketing gimmick. Industry practice with regard to warranties on commercial software varies widely, with some software developers continuing to disclaim all responsibility for the quality of their products and some routinely warranting turnkey systems against all defects.
At the same time, consumer confidence is not necessarily a good measure of quality. Research into the effects of report cards in the health industry has found mixed results. In one study, consumers were found to base their choice of HMO more on the subjective ratings reported on the cards (which are obtained from consumers themselves and are influenced by factors such as the comfort of waiting rooms and availability of parking) than on objective data (such as mammography rates and other data indicating conformance with best practices).2 Similar phenomena seem to apply to consumer choice of software, which may be guided more by superficial convenience factors than by inherent quality. This is not to deny consumers the right to weigh factors as they please in their selection of products, of course, but it does mean that popularity with consumers should not be taken as prima facie evidence of quality.
ACCOUNTABILITY AND LIABILITY
Where there is a need to deploy certifiably dependable software, it should always be explicit who is accountable, professionally and legally, for any failure to achieve the declared dependability. One benefit of making dependability claims explicit is that accountability becomes possible; without explicit claims, there cannot even be a clear determination of what constitutes failure. Such accountability can be made explicit in the purchase contract, as part of the certification of the software, as part of a professional licensing scheme, or in other ways. However, these are not true alternatives to one another because they interact—for example, a certification scheme might require the use of licensed staff as lead developers or as certifiers. No single solution will meet all the circumstances in which certifiably dependable software will be deployed, and accountability regimes should therefore be tailored to suit particular circumstances.
At present, it is common for software developers to disclaim, so far as possible, liability for defects in their products to a greater extent than customers and society expect from manufacturers in other industries. Clearly, no software should be considered dependable if it is supplied with a disclaimer that releases the manufacturer from providing a warranty or other remedies for software that fails to meet its dependability claims. Determining appropriate remedies, however, was beyond the scope of this study and would have required careful analysis of benefits and costs, taking into account not only the legal issues but also the state of software engineering, the various submarkets for software, economic impact, and the effect on innovation.
Establishing that software is dependable involves inspection and analysis of the dependability claim and the evidence that is offered in its support. Where the customers of the software are not able to carry out that work themselves (for lack of time or expertise), they will need to involve a third party whose judgment they can rely on to be independent of pressures from the vendor or other parties. Evaluating the dependability case is where certification regimes come into play.
Such independence must be demonstrated if third parties are to be successfully used in this role. Third-party assessors have been successful in other fields—the licensed engineers who carry out certificate-of-airworthiness inspections on aircraft, for example, and the “authorized bodies” who perform inspections in the European rail industry—and there is no fundamental reason that such assessment should not work in the software industry too.
Certification can take many forms, from self-certification to independent third-party certification by a licensed certification authority. No single certification regime is suitable for all circumstances, so a suitable scheme should be chosen by each customer and vendor to suit the circumstances of the particular requirement for certifiably dependable software. Industry groups and professional societies should consider developing model certification schemes for their industries, taking account of the detailed recommendations in this report. Any certification regime focusing on dependability should make use of a dependability case, as has been described throughout this report.
Certification should always explicitly allocate accountability for the failure of the software to meet the claimed dependability requirements. In general, such accountability should lie with the person making the claim for dependability—perhaps the software manufacturer, the system manufacturer (especially where COTS software has been incorporated in a system), or the certifier.
EVIDENCE AND OPENNESS
Evidence is the central theme of this report. In the arena of particular software products and systems, the committee has argued that confidence in the dependability of a system must rest on concrete evidence. And in the broader arena of technology advances, including finding better approaches and methodologies for developing software as well as developing innovative new tools, it has argued that evidence supporting or contradicting particular approaches is an essential enabler of progress. Determining whether to build and field a software system that could offer great benefits but also pose a potentially catastrophic risk calls for a plausible and transparent cost-benefit analysis that explicitly and carefully considers the evidence.
In both arenas—individual software products and technology advances—there is currently a dearth of evidence; this dearth seriously hampered the committee’s work, making it hard to resolve debate or reach an informed consensus on some issues. Obtaining and recording better evidence is crucial. A key obstacle is a lack of transparency: the inability to look into the system under consideration and see how it was developed. In some cases, the evidence simply does not exist; many software developers, for example, are not withholding data but have never seriously considered collecting evidence that could be used to evaluate the dependability of their products. In other cases, evidence exists but cannot be used effectively because no one who sees it has sufficient expertise.
Some software producers might be driven to hide evidence that could damage perceptions of their product. But others choose not to disclose evidence because they are reasonably concerned about revealing proprietary information that would aid competitors or because they have no incentive to pay the costs of organizing and disseminating the data. The committee is loath, therefore, to propose regulations or standards that might compel software producers to reveal proprietary information.
However, because such evidence would be valuable for the software industry and its consumers, it is important to rescue it from the shadows and make it more available. The committee encourages consumers to demand better information about the dependability of software products and to be suspicious of any dependability claims that are not allowed to be evaluated by an independent third party.
Likewise, the committee encourages those in government who procure and field critical systems to be skeptical of manufacturers’ claims and to recognize that public scrutiny can be a good thing. In some domains, secrecy will remain important; it would not be sensible, for example, to insist that the designs and dependability cases for defense systems be made public. Secrecy, however, is often overrated, and much of the research community has come to believe that secrecy prevents it from examining the mechanism in question, robbing society of the peer review that would otherwise take place. Furthermore, the confidence of the public might be seriously undermined if government officials withhold important information that bears on the decision to field a system. Electronic voting is a prime example. Despite accusations of serious failures and vulnerabilities in voting software, its manufacturers, along with the state officials who award the contracts and are responsible for assessing the dependability of the software, have in some cases been reluctant to give out information that would allow independent experts to make their own judgments. In doing so, they may have forfeited society’s chance to have better software and may even have damaged the credibility of the electoral process itself.3
Because the committee has argued that the same broad principles should apply to a variety of systems in different application domains, it has not made recommendations specific to any particular area. Security, however, demands special consideration, because although security concerns are greater for certain kinds of systems, almost all systems are vulnerable to malicious attack to some degree. Effort invested in building a dependability case for a system is much less useful if there are security vulnerabilities that bring into question the most basic assumptions made about the behavior of components and their independence. In short, security vulnerabilities can undermine the entire dependability case and therefore need to be addressed as an integral part of the case.
Most software systems are networked and therefore open to attack; these clearly need a security audit. For systems that have been isolated, an audit is also likely to be essential, because the inconvenience of isolation is usually a response to the perceived risk of malicious attack.

3. See, for example, National Research Council (NRC), 2006, “Letter report on electronic voting,” The National Academies Press, Washington, D.C.; and NRC, 2005, Asking the Right Questions About Electronic Voting, The National Academies Press, Washington, D.C. Available online at <http://books.nap.edu/catalog.php?record_id=11704> and <http://books.nap.edu/catalog.php?record_id=11449>, respectively.
The security of a product or system (and, consequently, certification thereof) involves two somewhat distinct facets of the product or system: (1) the presence of security features, such as access controls, that allow the owner of the product or system to define and enforce security policy and (2) the ability of the product or system to resist hostile attack.4 The mere presence of security features, however, is not sufficient in and of itself. Indeed, given the increasing complexity of systems and security features, the usability and complexity of security configuration are significant concerns as well. It is important that it be likely, not just possible, that a system’s administrators will configure its security features correctly. Due effort is needed to evaluate and show that feasible and expected configurations do not result in obvious vulnerabilities and to ensure that it is clear to those configuring the system what the appropriate configurations are.
The presence and correctness of security features can be certified by measures similar to those used to certify that other functional requirements are present and correct, and such certification is the domain of today’s Common Criteria (CC). However, certification of the ability to resist attack needs to begin by considering the kinds of attacks that might be directed at the product or system (sometimes referred to as a threat analysis) and then proceeding to review the measures that the developer took to prevent attacks from being successful. This review examines not only the developer’s process but also the effectiveness of the specific techniques that were applied to identify and remove vulnerabilities, and it rests on evidence that the developer in fact applied those techniques thoroughly and effectively.
While the CC assess security features, a new paradigm is needed to provide the owners of products and systems with a meaningful certification of resistance to attack. The approach to dependable software that this report proposes is germane to the development of such a certification paradigm. In particular, attention must be paid to articulating and evaluating assumptions about the environment in which the system operates and in which malicious attackers reside. The analysis is harder for security than for other properties, because the interface between the system and the environment is not easily described. This is true in general, of course. A system that controls a motor, for example, might need to account not only for the electrical load but also for the heating effect of the motor on nearby sensors. In the security realm, however, such concerns are central, since attackers aim to exploit hidden aspects of the interface that a security audit might have neglected. For example, attacks on smartcards have been devised that rely on monitoring fluctuations in the electrical load that the device presents.5 This means that security analysts should always be attentive to the risks of new kinds of attacks and that security cases should be revisited as new attacks are discovered.

4. For a brief overview of cybersecurity issues generally, see NRC, 2002, Cybersecurity Today and Tomorrow: Pay Now or Pay Later, The National Academies Press, Washington, D.C. Available online at <http://www.nap.edu/catalog.php?record_id=10274>.
A REPOSITORY OF SOFTWARE DEPENDABILITY DATA
Transparency and openness alone are not enough, however. Few people have the time and expertise to carefully examine and understand arcane data. Developing a substantial repository of credible evidence will require a concerted effort to record, analyze, and organize data. Such an effort would probably involve at least two distinct components, both aimed at involving software engineering experts more directly in accident analysis and reporting.
First, software experts should be actively involved in accident analysis. In many accidents software is either a contributing or a central factor, yet it is common for review panels not to examine the software at a level of detail commensurate with its role. Experts in other fields tend to minimize the role of software and underestimate the threats it poses. It is common, for example, to blame users for taking inappropriate actions despite egregious flaws in the design of the user interface.6
Second, reports of failures and accidents should, whenever possible, be accompanied by the software artifacts themselves so that experts can evaluate a report on the basis of the same evidence that was made available to the report’s authors. Concerns about proprietary material and the risk of exposing security vulnerabilities in existing systems should of course be taken into account, but the ease of publishing large artifacts in the era of the Web and the value of making the information widely available should make disclosure the default position and place the burden of proof on those who would argue against it.

5. See, for example, O. Kommerling and M. Kuhn, 1999, “Design principles for tamper-resistant smartcard processors,” Proceedings of the USENIX Workshop on Smartcard Technology (Smartcard ’99), Chicago, Ill., May 10-11, USENIX Association, pp. 9-20. Available online at <http://www.cl.cam.ac.uk/~mgk25/sc99-tamper.pdf>.

6. The Panama radiotherapy accidents are a good example of this phenomenon. See IAEA, 2001, “Investigation of an accidental exposure of radiotherapy patients in Panama: Report of a team of experts, 26 May-1 June 2001,” IAEA, Vienna, Austria. Available online at <http://www-pub.iaea.org/MTCD/publications/PDF/Pub1114_scr.pdf>. Also see M.H. Lützhöft and S.W.A. Dekker, 2002, “On your watch: Automation on the bridge,” Journal of Navigation 55(1):83-96.
How exactly these goals should be achieved in terms of policy prescriptions is beyond the purview of this report. A centralized approach, in which government agencies (the FDA, NTSB, FAA, and so on) maintain public databases and supervise the collection and dissemination of data, might make certain aspects of this process, such as cross-comparisons, easier. On the other hand, there is value in decentralized approaches, in which software specialists form local teams that oversee software in particular domains and locations, such as the software oversight committees proposed by Gardner and Miller for medical software systems.7
In many high school and indeed some college-level programming courses, students are introduced to programming as a mechanistic activity, in which programs are developed by trial and error. Such experimentation and exploration can be healthy; as in other fields of design and engineering, exploring new ideas is essential, especially for novices. However, as argued elsewhere in this report, the development of dependable software should ultimately be seen as an engineering activity. Thus a curricular emphasis on finding the essence of a problem and solving it convincingly is preferable to one on mastering the accidental intricacies of particular software systems. Moreover, the absence of exemplars and overexposure to software that is overly complicated or otherwise poorly designed can make it harder to teach students to appreciate the important qualities of good design, such as clarity, simplicity, and fitness to purpose. Introducing the notion of dependability in educational contexts would require (1) a recognition of the real-world factors that lead to complexity and (2) discussion of explicit examples of clarity and simplicity in the design of large systems and the trade-offs involved in their design.
In high school computer science education, giving students a foundation in the ideas of dependability would require greater emphasis on programming as a design activity, on the qualities of a good program, and on the process of constructing programs and reasoning about them. The intricacies of the particular programming language or platform, and low-level execution details, would receive less emphasis. Programming with an eye toward dependability and a rudimentary dependability case would be used to help develop a student’s ability to crystallize ideas and make them precise and to structure and dissect arguments.
Decisions on the curriculum are often motivated by the desire to make students “computer literate.” The goal is laudable, but it is important that such literacy not be construed merely as familiarity with the details of today’s software products. The ability to operate a computer and use standard desktop applications with basic competency is essential, but it is also important for students to have an understanding of how the computing infrastructure as a whole works and why it sometimes fails.8 Computer literacy should not be confused with computer science and software engineering, and it is important that students understand the difference. In addition, mathematics is important for the education of software engineers, especially combinatorics and discrete mathematics, including the theory of sets, relations, and graphs.
At the university level, an emphasis on dependability would mean that the software and computer science curriculum should address more explicitly the topics that are the foundation for dependable software. Students need to have a broader understanding of the role of software and computers in larger systems and need to be familiar with the basic principles of systems engineering. Topics that support dependability include a basic introduction to formal methods, with an emphasis on system modeling rather than proofs of correctness, along with usability and human factors. Security and dependability are usually treated as specialized topics, but they should be integrated into the curriculum more fully and encountered by students repeatedly, especially when learning how to program.
The mathematical background of students studying computer science and software engineering would need to be expanded to include not only discrete mathematics (set theory and logic) but also probability and statistics, whose importance in many fields of computing is growing and which are particularly important for understanding dependability issues. Because the mathematics courses offered to computer science students are often designed with mathematicians in mind rather than engineers, they tend to focus on meta results and proof. Most computer science students, especially those interested in software, would benefit more from mathematics courses that focus on using mathematical constructs to model and reason about systems.
In software projects, one way to encourage attention to dependability concerns would be to require students to build programs that respond gracefully to unanticipated input as a way of introducing them to the most fundamental principles of building secure software. More generally, students should be encouraged not merely to achieve a running program that passes a suite of tests but also to develop a deeper understanding of why the program works and to assess their confidence in its dependability by developing minidependability cases of their own based on an honest appraisal of their own abilities, on the strength of their argument that it works, and on the significance and likelihood of adverse events in the environment.
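In miniature, the habit being encouraged is the difference between a program that assumes well-formed input and one that validates its input and fails safely. The sketch below is purely illustrative (the dose-entry scenario, the function name, and the safety limit are all invented); its docstring doubles as the kind of minidependability case a student might be asked to write:

```python
MAX_DOSE_CGY = 300  # hypothetical safety limit, in centigray

def parse_dose(text: str) -> int:
    """Parse an operator-entered dose, rejecting rather than guessing.

    Minidependability case: every path through this function either
    returns a value already shown to lie within [1, MAX_DOSE_CGY] or
    raises ValueError, so no caller can receive an out-of-range dose.
    """
    try:
        dose = int(text.strip())
    except ValueError:
        raise ValueError(f"not a number: {text!r}")
    if not 1 <= dose <= MAX_DOSE_CGY:
        raise ValueError(f"dose {dose} outside safe range 1..{MAX_DOSE_CGY}")
    return dose

# Well-formed input is accepted; unanticipated input is rejected cleanly.
assert parse_dose(" 180 ") == 180
for bad in ["", "12; rm -rf /", "-5", "9999"]:
    try:
        parse_dose(bad)
        raise AssertionError("unanticipated input accepted")
    except ValueError:
        pass
```

The point of the exercise is not the twenty lines of code but the explicit argument, stated in the docstring, for why the program behaves acceptably on every input.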
Although the committee believes that the approach outlined in this report might substantially improve the dependability of software, it recognizes that these measures alone cannot overcome the ever-growing demands for software with more complex functionality, operating in more invasive and critical contexts. Major technological advances are therefore essential for the future of the industry. While such advances might be produced by the computer industry alone, its history to date (and the dramatic success of federal investment, for example, in networking) suggests that advances will come more quickly and at lower cost if significant investments are made in fundamental research. In the United States, the High Confidence Software and Systems Coordinating Group (HCSS CG) of the National Coordination Office for Networking and Information Technology Research and Development (NITRD) coordinates many research activities in areas relevant to this report, focusing on
scientific foundations and technologies for innovative systems design, systems and embedded application software, and assurance and verification to enable the routine production of reliable, robust, safe, scalable, secure, stable, and certifiably dependable IT-centric physical and engineered systems comprising new classes of advanced services and applications. These systems, often embedded in larger physical and IT systems, are essential for the operation of the country’s critical societal infrastructures, acceleration of U.S. capability in industrial competitiveness, and optimization of citizens’ quality of life.9
The importance of software dependability suggests that funding could be focused on areas that might lead to more dependable software.
9. For more information, see the NITRD HCSS CG home page online at <http://www.nitrd.gov/subcommittee/hcss.html>.
Some areas that seem to merit attention and that follow from the overarching recommendations and approach of this study are covered briefly below. They should not be construed as exhaustive but as an indication of the sorts of research questions the approach raises:
Testing as evidence. Testing is currently the most widely used technique for finding bugs in code, and when it is performed systematically and extensively, it can be an important element of a dependability case. As noted earlier in the report, however, it is hard to determine what level of dependability is assured when a system passes a given test suite. Clearly, an exhaustive test that covers every state and history that could possibly occur in the field would be tantamount to proof (and perhaps better). At the other end of the spectrum, passing a few dozen ad hoc tests provides little information about the flaws that might remain. The former approach is almost never feasible and the latter is insufficient. The gray area in the middle merits consideration. Can concrete dependability claims be based on limited testing? Can the absence of certain classes of error be assured by the successful execution of certain test cases? Could stronger claims be based on testing if novel forms of coverage (such as execution of all possible traces for a limited heap size or number of context switches) are used? Might testing with respect to a known operational profile be substantiated by online monitoring to ensure that the profile used for testing remains an accurate representation of actual operation? Although a considerable literature on testing exists, there is an opportunity for further research focused specifically on methods that create evidence, to a high degree of confidence, that a system has explicit dependability properties.
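One classical instance of such a claim, sketched here under strong assumptions (tests drawn independently from the true operational profile, all of them passing), is the standard reliability-demonstration bound: if a failure rate of p per demand were the truth, n passing tests would occur with probability (1-p)^n, which yields an upper bound on p at a chosen confidence level. The function name is invented for illustration:

```python
def failure_rate_bound(n_passing_tests: int, alpha: float) -> float:
    """Upper bound on the per-demand failure probability p, at
    confidence 1 - alpha, given that n independent tests drawn from
    the operational profile all passed.  If the true rate were p, all
    n tests would pass with probability (1 - p)**n; the bound solves
    (1 - p)**n = alpha for p."""
    return 1.0 - alpha ** (1.0 / n_passing_tests)

# About 4,700 passing random tests are needed to support a claim of
# p below 10^-3 at 99 percent confidence; each further factor-of-ten
# improvement in the claim requires roughly ten times as many tests.
bound = failure_rate_bound(4700, alpha=0.01)
assert bound < 1e-3
```

The steep growth in test counts is exactly why testing alone struggles to substantiate the very small failure rates demanded of critical systems, and why the research questions above, about combining testing with other forms of evidence, matter.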
Checking code against domain-specific properties. Recent years have seen many advances in techniques for automatic code checking, and there is renewed interest in program verification (witness the recent proposal of a Grand Challenge in this area10). These techniques will be essential to the construction of dependability cases, especially if they are capable of handling domain-specific properties rather than just local properties of the code that cannot be assembled into a systemwide argument for dependability.
Strong languages and tools for independence arguments. As discussed above, the cost of constructing a case for dependability with respect to a particular critical property would be reduced by restricting the code-level argument to a small proportion of the modules. Using unsafe languages compromises any modularity that would otherwise make such an independence argument plausible. For example, as noted previously, in a program written in a language such as C, an out-of-bounds array access can overwrite data structures that are not accessible by name, so that one cannot rely on the use of names to determine how one module might interact with another. Research is needed to understand whether using safe languages or other tools could justify independence and help structure dependability arguments, and how independence arguments might be made when there are good reasons to use an unsafe language.
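The contrast is visible even in a toy setting. In C, the out-of-bounds write below could silently land in whatever memory happens to sit next to the array, including another module's private state; in a memory-safe language the same mistake is trapped at the module boundary, which is what makes a name-based independence argument plausible. The module names are invented for illustration:

```python
class Logger:
    """A 'module' whose private state other modules should be unable
    to touch except through its interface."""
    def __init__(self):
        self.enabled = True

class Buffer:
    """A second, unrelated 'module' holding a fixed-size array."""
    def __init__(self, size: int):
        self.data = [0] * size

    def store(self, index: int, value: int):
        # In C, an out-of-bounds index here could overwrite memory
        # belonging to Logger, even though Logger's name appears
        # nowhere in this module.  A safe language raises instead.
        self.data[index] = value

logger = Logger()
buf = Buffer(4)
try:
    buf.store(7, 42)      # out-of-bounds write
except IndexError:
    pass                  # the fault is confined to Buffer
assert logger.enabled     # Logger's state is demonstrably untouched
assert buf.data == [0, 0, 0, 0]
```

Because the fault cannot escape the module that commits it, an argument that Logger's behavior is independent of Buffer's correctness can rest on inspection of the names each module uses, which is precisely what unsafe languages forfeit.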
Composing component dependability cases. Complex software components are seldom furnished with the information needed to support dependability arguments for the systems that use them. For use within a larger argument, the details of the dependability case of a component need not be known. Until recently, there has been little demand for components to be delivered with the claims, arguments, and evidence needed to support the dependability case for a system that uses the component. At lower levels of criticality, and in accidental systems, explicit dependability cases have seldom been constructed, so there has been no perceived need for component-level cases. At the other extreme, the dependability cases for systems with highly critical assurance goals (such as airplanes) have focused on the details of their components. In addition, there have been few regulatory mechanisms applicable to such systems to support the use of prequalified critical components that would allow the dependability case for the larger system to use the applicable cases for its components without inquiring into all the details of the components themselves. With greater reuse of components, and a concomitant awareness of the risks involved (especially of using commodity operating systems in critical settings), component-level assurance will become an essential activity throughout the industry, and it will be necessary to find ways to compose the dependability arguments of components into an argument for the system as a whole. The research challenges involve not only investigating how this might be done, but also how to account for, and mitigate, varying levels of confidence in the component arguments.
Modeling and reasoning about environments. As explained earlier in this report, the dependability of a system usually rests on assumptions about the behavior of operators and devices in the environment of the system and, more broadly, on the human organization in which the system is deployed. The dependability case should therefore involve reasoning about interactions between the system and its environment. The necessary formal foundations for such reasoning are perhaps already available, since an operator or physical device can be modeled along with the system, for example, as a state machine. It is not clear, however, how to model the environment and structure environmental assumptions; how to account for human behavior or larger organizational effects; how to handle normal and malicious users; or how to express crucial properties.
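As a miniature sketch of the state-machine approach, the system and an operator-controlled device in its environment can be composed and the resulting state space explored exhaustively, with a safety property checked in every reachable state. The scenario and transitions below are invented for illustration; note that the environmental assumption (a hardware interlock forces the beam off as the door opens) is stated explicitly in the model, which is exactly what a dependability case needs to surface:

```python
def reachable(initial, step):
    """Exhaustively enumerate every state the composed system can reach."""
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def step(state):
    """One machine for the operator's door, one for the beam, composed."""
    door, beam = state
    moves = []
    if door == "closed":
        # Environmental assumption: opening the door trips a hardware
        # interlock that forces the beam off in the same transition.
        moves.append(("open", "off"))
        if beam == "off":
            moves.append((door, "on"))    # operator fires the beam
    else:
        moves.append(("closed", beam))    # operator closes the door
    if beam == "on":
        moves.append((door, "off"))       # treatment ends normally
    return moves

states = reachable(("closed", "off"), step)
# Safety property, checked in every reachable state:
# the beam is never on while the door is open.
assert all(not (door == "open" and beam == "on") for door, beam in states)
```

Dropping the interlock assumption from `step` makes the unsafe state reachable and the check fail, which illustrates the report's point: the conclusion depends entirely on an explicit, auditable claim about the environment.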
Reasoning about fail-stop systems. The key dependability properties of many critical systems take the form “X should never happen, but if it does, then Y must happen.” For example, the essential property of a radiotherapy machine is that it not overdose the patient; yet overdoses do occur in practice, and any overdose that occurs must be detected and reported. Similarly, any fail-stop system is built in the hope that certain failures will never occur but is designed to fail in a safe way should they occur. It therefore seems likely that multiple dependability cases are needed, at different levels of assurance, each making different assumptions about which adverse events in the environment and failures in the system itself might occur. The structuring of these cases and their relationship to one another is an important topic of investigation.
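Such layered properties can be stated over execution traces: the primary case argues that the bad event never occurs, and the fallback case argues that if it nevertheless does, the required response follows. A minimal sketch (the event names are invented for illustration):

```python
def never(trace, bad):
    """Primary claim: the bad event does not occur at all."""
    return bad not in trace

def whenever_then(trace, bad, response):
    """Fallback claim: every occurrence of the bad event is eventually
    followed by the required response."""
    return all(response in trace[i + 1:]
               for i, event in enumerate(trace) if event == bad)

nominal = ["setup", "dose", "dose", "shutdown"]
faulty  = ["setup", "dose", "overdose", "alarm", "report", "shutdown"]

assert never(nominal, "overdose")                    # primary case holds
assert not never(faulty, "overdose")                 # primary case fails...
assert whenever_then(faulty, "overdose", "report")   # ...fallback holds
```

The two checks are deliberately distinct claims: the primary case might rest on strong assumptions about the environment, while the fallback case must hold under weaker ones, and how the two cases relate is precisely the open structuring question the text raises.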
Making stronger arguments from weaker ones. A chain can be stronger than even its strongest link if the links are joined in parallel rather than in series. Similarly, weaker arguments can be combined to form a single stronger argument. A dependability case will typically involve evidence of different sorts, each contributing some degree of confidence to the overall dependability claim. It would be valuable to investigate such combinations, to determine what additional credibility each argument brings, and under what conditions of independence such credibility can be maximized.
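The arithmetic behind the chain metaphor is simple, under the strong (and often questionable) assumption that the arguments fail independently. Legs joined in series, where every step must hold, multiply their confidences down; independent legs joined in parallel, where any one suffices to expose a flaw, multiply their residual doubts instead:

```python
from math import prod

def series(confidences):
    """A chain of reasoning: the claim holds only if every link holds."""
    return prod(confidences)

def parallel(confidences):
    """Independent arguments for the same claim: the claim is wrongly
    accepted only if every argument fails to expose the flaw."""
    return 1.0 - prod(1.0 - c for c in confidences)

# Two moderately convincing arguments, each at 90 percent confidence:
assert round(series([0.9, 0.9]), 6) == 0.81    # weaker than either link
assert round(parallel([0.9, 0.9]), 6) == 0.99  # stronger than the strongest
```

The interesting research questions begin exactly where this toy model ends: real arguments (a test suite and a static analysis, say) share assumptions and blind spots, so quantifying how much independent credibility each one actually contributes is the hard part.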