Critical Code: Software Producibility for Defense

4 Adopt a Strategic Approach to Software Assurance

SOFTWARE ASSURANCE AND EVIDENCE

One of the great challenges for both defense and civilian systems is software quality assurance. Software assurance encompasses reliability, security, robustness, safety, and other quality-related attributes. Diverse studies suggest that overall software assurance costs account for 30 to 50 percent of total project costs for most software projects.1 Despite this cost, current approaches to software assurance, primarily testing and inspection, are inadequate to provide the levels of assurance required for many categories of both routine and critical systems.2

In major defense systems, the assurance process is heavily complicated by the arm's-length relationship that exists between a contractor development team and government stakeholders. This relationship—in which sometimes even minor changes to up-front commitments may necessitate amendments to contracts and adjustments in costing—can create barriers to the effective and timely sharing of information that can assist the customer in efficiently reaching accurate assurance judgments. Additionally, it can be difficult to create incentives for the appropriate use of preventive measures such as those referenced in this chapter.

In this chapter the committee first considers the trends related to the challenges of software assurance. It then offers a concise conceptual framework for certain software assurance issues. Finally, it identifies significant technical opportunities and potential future challenges to improving our ability to provide assurance. (Some of these are elaborated in Chapter 5.)

Failures in software assurance can be of particularly high consequence for defense systems because of their roles in protecting human lives, in warfighting, in safeguarding national assets, and in other pivotal roles. The probability of failure can also be high, due to the frequent combination of scale, innovative character, and diversity of sourcing in defense systems. Unless exceptional attention is devoted to assurance, a high level of risk derives from this combination of high consequence and high likelihood.

Assurance considerations also relate to progress tracking, as discussed in Chapter 2—assessment of readiness for operational evaluation and release is based not just on progress in building a system, but also on progress in achieving developmental assurance. Additionally, the technologies and practices used to achieve assurance may contribute useful metrics to guide process decision making.

Assurance Is a Judgment

Software assurance is a human judgment of fitness for use.

1 In "Software Debugging, Testing, and Verification," IBM Systems Journal 41(1), 2002, B. Hailpern and P. Santhanam say, "In a typical commercial development organization, the cost of providing this assurance via appropriate debugging, testing, and verification activities can easily range from 50 to 75 percent of the total development cost." In Estimating Software Costs (McGraw-Hill, 1998), Capers Jones provides a table relating the percentage of defects removed to the percentage of development effort devoted to testing, with data points that include 90 percent to 39 percent, 96 percent to 48 percent, and 99.9 percent to 58 percent. In Software Cost Estimation with COCOMO II (Prentice Hall, 2000), Barry W. Boehm, Chris Abts, A. Winsor Brown, Sunita Chulani, Bradford K. Clark, Ellis Horowitz, Ray Madachy, Donald Reifer, and Bert Steece indicate that the cost of test planning and running tests is typically 20 to 30 percent, plus rework due to defects discovered. In Balancing Agility and Discipline (Addison-Wesley, 2004), Barry Boehm and Richard Turner provide an analysis of the COCOMO II Architecture and Risk Resolution scale factor, indicating that the increase in rework due to poor architecture and risk resolution is roughly 18 percent for typical 10-KSLOC (KSLOC stands for thousand software lines of code) projects and roughly 91 percent for typical 10,000-KSLOC projects. (COCOMO II, or Constructive Cost Model II, is a software cost, effort, and schedule estimation model.) This analysis suggests that improvements are needed in up-front areas as well as in testing, supporting the importance of architecture research, especially for ultra-large systems.
2 The challenges relating to assurance were highlighted by several briefers to the committee. In addition, this issue is a core concern in the Defense Science Board (DSB), September 2007, Report of the Defense Science Board Task Force on Mission Impact of Foreign Influence on DoD Software, Washington, DC: Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics, at pp. 30-38. Available online at http://stinet.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA473661. The 2007 NRC report Software for Dependable Systems also addressed the issue of testing and noted, "Testing … will not in general suffice, because even the largest test suites typically used will not exercise enough paths to provide evidence that the software is correct nor will it have sufficient statistical significance for the levels of confidence usually desired" (p. 13). See Daniel Jackson, Martyn Thomas, and Lynette I. Millett, eds., 2007, Software for Dependable Systems, Washington, DC: National Academies Press. Available online at http://www.nap.edu/catalog.php?record_id=11923. Last accessed August 20, 2010.
In practice, assurance judgments are based on the application of a broad range of techniques that include both preventive and evaluative methods and that are applied throughout a software engineering process. Indeed, for modern systems, and not just critical systems, the design of a software process is driven not only by issues related to engineering risk and uncertainty but also, in a fundamental way, by quality considerations.3 These, in turn, are driven by systems risks—hazards—as described in Chapter 2 and also in Box 4.1 (cybersecurity).

An important reality of defense software assurance is the need to achieve safety—in war, there are individual engagements where lives are at stake and where software is the deciding factor in the outcome. In many life-and-death situations, optimum performance may not be the proper overriding assurance criterion; rather, it may be the "minimization of maximum regret." This is exacerbated by the fact that, while a full-scale operational test of many capabilities may not be feasible, assurance must nonetheless be achieved. This applies, for example, to certain systems that support strategic defense and disaster mitigation. The committee notes, however, that there are great benefits in architecting systems and structuring requirements such that capabilities that would otherwise be reserved for rare "emergencies" are also used in an ongoing mode for more routine operations. This creates benefits from operational feedback and user familiarity. It also permits iterative development and deployment, such as is familiar to users of many evolving commercial online services.

Another reality of defense software that affects assurance is that it is developed by contractors working at arm's length from the DoD. This means, for example, that the information sharing necessary to assessing and achieving assurance must be negotiated explicitly.
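The "minimization of maximum regret" criterion mentioned above can be illustrated with a small decision-theoretic sketch. The candidate designs, scenarios, and payoff scores below are hypothetical, invented purely for illustration:

```python
# Minimax-regret choice among design options under uncertainty.
# Rows: candidate designs; columns: possible operational scenarios.
# Payoffs are hypothetical "mission outcome" scores (higher is better).

payoffs = {
    "optimized":    {"nominal": 10, "degraded": 2, "attack": 0},
    "conservative": {"nominal": 7,  "degraded": 5, "attack": 4},
}

scenarios = ["nominal", "degraded", "attack"]

# Regret = best achievable payoff in a scenario minus this design's payoff.
best = {s: max(p[s] for p in payoffs.values()) for s in scenarios}
max_regret = {
    d: max(best[s] - p[s] for s in scenarios) for d, p in payoffs.items()
}

# The minimax-regret design minimizes the worst-case regret across scenarios.
choice = min(max_regret, key=max_regret.get)
print(choice, max_regret)  # conservative {'optimized': 4, 'conservative': 3}
```

Note that the "optimized" design wins in the nominal scenario but carries the larger worst-case regret, which is exactly why a regret-minimizing criterion can prefer the more conservative design for life-and-death situations.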
There are many well-publicized examples of major defense systems exhibiting operational failures of various kinds that are, evidently, consequences of inadequate assurance practices. A recent example of this type of top-level systems engineering issue was the failure of an F-22 flight management system when the aircraft were flown across the international date line for the first time en route from Hawaii to Japan. In a CNN interview, Maj. Gen. Don Sheppard (ret.) said, "At the international date line, whoops, all systems dumped and when I say all systems, I mean all systems, their navigation, part of their communications,

3 Michael Howard and Steve Lipner, 2006, The Security Development Lifecycle, Redmond, WA: Microsoft Press. See also Box 2.3.
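The published accounts do not reveal the actual F-22 defect, so the following sketch is purely hypothetical; it shows the kind of longitude-wraparound fault that a date-line crossing can expose in naively written navigation code:

```python
# Hypothetical sketch of a longitude-wraparound fault of the kind the
# F-22 incident suggests (the actual flight-code defect is not public).
# A naive eastward-difference computation misbehaves across the +/-180
# degree antimeridian; the corrected version normalizes the result.

def delta_lon_naive(lon_from, lon_to):
    # Fault: assumes longitudes never wrap, so a date-line crossing
    # yields a wildly wrong (near +/-360 degree) heading input.
    return lon_to - lon_from

def delta_lon_safe(lon_from, lon_to):
    # Normalize the difference into (-180, 180], handling the wrap.
    d = (lon_to - lon_from) % 360.0
    return d - 360.0 if d > 180.0 else d

# Flying west from near Hawaii (~ -158 deg) to near Japan (~ +140 deg):
print(delta_lon_naive(-158.0, 140.0))  # 298.0 -- nonsense
print(delta_lon_safe(-158.0, 140.0))   # -62.0 -- short way around, westward
```

The fault is latent for every flight that stays on one side of the antimeridian, which is why conventional testing can miss it; it is precisely the kind of boundary condition that inspection, analysis, and deliberately chosen test coverage are meant to catch.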
BOX 4.1
Assurance and Cybersecurity—A Brief Consideration

Cybersecurity

Although it is not a principal focus of this report, cybersecurity is an unavoidable and critical dimension of software assurance. It is rarely possible to contemplate software assurance without also giving major attention to security considerations. This is particularly challenging because security, like assurance, must be addressed at every phase of development and across the software lifecycle overall.1

A system can be assured only if it is well understood. The main text elaborates the concept of a chain of evidence, which documents this understanding as traceability from intentions to outcomes, including functional requirements, quality attributes, and architectural constraints. Security adds the additional dimension of threats and attacks. For software, these can occur not only during operations but also at every stage of the lifecycle, from development through ongoing evolution and update during operations.

The crudest categorization of threats yields three avenues of attack: (1) external attackers—adversaries gaining access from points external to the system, typically via network connections; (2) operational insiders—adversaries gaining access to a DoD software system through inappropriate privileging, compromised physical access, or compromised personnel; and (3) engineering insiders—adversaries influencing or participating in the engineering process at some point in the supply chain for an overall system. Attacks can have different goals, typically characterized as "CIA"—breaching Confidentiality of data, damaging the Integrity of data, and disrupting Availability of a computational service. The analysis of possible threats and attacks is a key element of secure software development.
This analysis is strongly analogous to hazard analysis (as discussed elsewhere in this report), and it can lead to a host of security considerations to address in the development of systems, relating, for example, to identity and attribution, network situational awareness, secure mobility, policy models and usability, and forensics. From the standpoint of secure software development, the committee highlights two principal policy considerations, chosen because they are most likely to significantly influence both software architecture and development practice. The first relates to separation—minimizing and managing the coupling among components in a way that reduces both the overall extent of the most sensitive components in a system, those requiring the highest levels of assurance, and the "attack surface" of those components with respect to the various avenues of attack noted above. The second relates to configuration integrity—the assurance that any deviations or dynamic alterations to an operational system are consistent with architectural intent.

Separation

The first example of a security-related chain is the separation chain. Construction of this chain of evidence entails documenting relationships among critical shared resources and the software and system components that should, or should not, have access to or otherwise influence those resources.2 This chain documents the means by which access to resources is provided—or denied—to the components of a software system that need to rely on those resources. A less trusted component, for example, may be excluded by policy from observing, changing, or influencing access by others to a critical resource such as a private key. The ability to construct chains of this kind is determined by architectural decisions and implementation practices.
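The separation chain just described can be pictured as a small reference-monitor sketch. The component and resource names, and the policy table itself, are invented for illustration; real systems express such policy through architectural and platform mechanisms rather than a lookup table:

```python
# Minimal reference-monitor sketch of the "separation chain" idea:
# an explicit policy records which components may touch which critical
# resources, and every access is mediated against that policy.
# All names here are illustrative.

POLICY = {
    ("crypto_service", "private_key"): {"read"},
    ("logger", "audit_log"): {"append"},
    # "ui_plugin" deliberately has no rights to "private_key".
}

class AccessDenied(Exception):
    pass

def access(component, resource, operation):
    # Deny-by-default: anything not explicitly granted is refused.
    allowed = POLICY.get((component, resource), set())
    if operation not in allowed:
        raise AccessDenied(f"{component} may not {operation} {resource}")
    return f"{operation} on {resource} granted to {component}"

print(access("crypto_service", "private_key", "read"))
try:
    access("ui_plugin", "private_key", "read")
except AccessDenied as e:
    print("blocked:", e)
```

The point of the chain of evidence is that each entry in such a policy, and the mechanism enforcing it, can be documented and traced from architectural intent down to the enforcing code.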
Concepts from security architecture such as process separation, isolation, encapsulation, and secure communication architecture determine whether this kind of chain can be feasibly constructed, with minimal exposure of the most sensitive portions of a system. For example, modern commercial PC operating systems are designed to achieve security goals while offering tremendous generality and power in their underlying services and resource-management capabilities. Operating systems more focused on media delivery may offer less generality and flexibility, but they may do better in providing assurance relating to security because their architectures are designed to more tightly regulate access to resources. Research advances can expand the architectural options for which assurance of this kind can be achieved, both through enhancement of architectural sophistication and through the ability to model and assure policies.

Configuration

The second example of a security-related chain is the configuration chain. This chain documents the configuration integrity that is established when a system starts up and that is sustained through operations. The chain, in this case, typically links a known hardware configuration with the full complexity of an overall running system, including software code, firmware, and hardware operating within that configuration. Loss of integrity can occur, for example, when malware arrives over a network and embeds itself within a system. This chain (like the separation chain) is significant not only for networked systems but also for any system with a diverse supply chain, due to the differing trust levels conferred on system components.

The assurance enabled by this chain is that the assumptions that underlie the construction of other kinds of chains (and the architectural, functional, and other decisions that enable that construction) are reflected in the reality of the code that executes—and so the conclusions can be trusted. Put simply, this chain assures an absence of tampering. This has proven to be a singular challenge for commercial operating systems, as evidenced by the difficulty of detecting and eradicating rootkits, for example. Documentation of this second kind of chain is complicated by a diversity of factors.

1 Michael Howard and Steve Lipner, 2006, The Security Development Lifecycle, Redmond, WA: Microsoft Press. See also Gary McGraw, 2006, Software Security: Building Security In, Boston: Addison-Wesley.
2 This documentation should be formal wherever possible, such as might be derived from code analysis, verification, and modeling.
One factor is the dynamism of modern architectures, which afford the flexibility and convenience of dynamically loading software components such as device drivers and libraries. Another is the layered and modular structure that is the usual result of considerations related to development of the second kind of chain. A third factor is assuring configuration integrity of the hardware itself; including hardware in the chain can be much more challenging than the analogous process for software, because of the added need to "reverse engineer" physical hardware.3 A fourth factor derives from the "bootstrap" process through which initial software configurations are loaded onto bare hardware, generally layer by layer. This affords the opportunity for an iterative and ongoing process of loading and integrity checking, such as has been envisioned in the development of the TPM (Trusted Platform Module) chips that are present on the motherboards of most PCs and game platforms.4 In this model, the intent is to assure integrity by fingerprinting software components and monitoring their integrity as they are loaded and configured, both through the bootstrap process and during operations.

These four factors, combined with a highly competitive environment that discourages compromise on system functionality and performance, have made the adoption of commercial off-the-shelf operating systems highly challenging for the DoD, for example.5

A Note on Secrecy

Security-related faults lead to hazards just when attackers are able to exploit those faults to create errors and failures. It may be tempting, therefore, to think that full secrecy of the software code base would preclude such possibilities.
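The measured-boot idea sketched above (fingerprinting components and extending a measurement as each is loaded) can be illustrated in a few lines. This is a simplification of the concept only, not the actual TPM command interface or register semantics:

```python
# Simplified sketch of TPM-style "measured boot": each component is
# hashed as it is loaded, and a measurement register is extended as
# extend(reg, x) = H(reg || H(x)). Any tampered or reordered component
# changes the final value, so a remembered "golden" value detects the
# alteration. (An illustration of the idea, not the real TPM interface.)

import hashlib

def extend(register: bytes, component: bytes) -> bytes:
    return hashlib.sha256(register + hashlib.sha256(component).digest()).digest()

def measure_boot(components):
    register = b"\x00" * 32          # register starts zeroed at power-on
    for c in components:
        register = extend(register, c)
    return register

boot_chain = [b"firmware v1", b"bootloader v3", b"kernel 5.10"]
golden = measure_boot(boot_chain)

tampered = [b"firmware v1", b"bootloader v3 (implant)", b"kernel 5.10"]
print(measure_boot(tampered) == golden)   # False: tampering is detected
```

Because each measurement incorporates the previous register value, the final value commits to both the content and the order of everything loaded, which is what lets a single comparison stand in for the whole configuration chain.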
For defense systems there are many good reasons for secrecy, but, from the perspective of exploitation of vulnerabilities, over-reliance on secrecy ("security through obscurity") is a

3 DSB, February 2005, Report of the Defense Science Board Task Force on High Performance Microchip Supply, Washington, DC: Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics. Available online at http://stinet.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA473661. Last accessed August 20, 2010.
4 See http://www.trustedcomputinggroup.org/.
5 DSB, September 2007, Report of the Defense Science Board Task Force on Mission Impact of Foreign Influence on DoD Software, Washington, DC: Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics. Available online at http://stinet.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA473661. Last accessed August 20, 2010.
their fuel systems. They were—they could have been in real trouble. They were with their tankers…. The [F-22 crews] tried to reset their systems, couldn't get them reset. The tankers brought them back to Hawaii. This could have been real serious. It certainly could have been real serious if the weather had been bad. It turned out OK. It was fixed in 48 hours. It was a computer glitch in the millions of lines of code, somebody made an error in a couple lines of the code and everything goes." The contact with the tankers was visual: "Had they gotten separated from their tankers or had the weather been bad, they had no attitude reference. They had no communications or navigation. They would have turned around and probably could have found the Hawaiian Islands. But if the weather had been bad on approach, there could have been real trouble."4

There Are Diverse Quality Attributes and Methods

Software assurance encompasses a wide range of quality attributes. For defense systems, there is particular emphasis on addressing hazards related to security (primarily confidentiality, integrity, and availability of service; see Box 4.1), availability and responsiveness (uptime and speed of response), safety (life and property), adherence to policy (rules of engagement), and diverse other attributes. There is a very broad range of failures, errors, and faults that can lead to such hazards (Box 4.2). Software assurance practices must therefore encompass a correspondingly broad range of techniques and practices.

There is a false perception that assurance can be achieved entirely through acceptance evaluation such as operational and systems test. Systems test is certainly a necessary step in assuring functional properties and many performance properties. But it is by no means sufficient.
Assurance cannot readily be achieved from testing for many kinds of failures: those related to security, intermittent failures due to non-determinism and concurrency, readiness for likely future evolution and interoperation requirements, readiness for infrastructure upgrades, highly complex state spaces, and others. A comprehensive assurance practice requires attention to quality issues throughout the development and operations lifecycle, at virtually every stage of the process and at all links in the supply chain supporting the overall system. The latter point is a consequence of the observation above regarding the fallacy of relying entirely on acceptance evaluation and operational testing. Although the DoD relies extensively on vendor software and undertakes considerable testing of that software, it also implicitly relies on a relationship founded in trust (rather than "verify") to assure many of the quality attributes (listed above) that are not effectively supported through this kind of testing. This issue is explored at length in a report by the Defense Science Board on foreign software in defense systems.5

It is now increasingly well understood by software engineers and managers that quality, including security, is not "tested in" but rather must be "built in."6 But there are great challenges to succeeding both in "building in" quality, using preventive methods, and in assuring that it is there, using evaluative methods. The nature of the challenge is determined by a combination of factors, including the potential operational hazards, the system requirements, infrastructure choices, and many other factors.

4 "F-22 Squadron Shot Down by the International Date Line," Defense Industry Daily, March 1, 2007. Available online at http://www.defenseindustrydaily.com/f22-squadron-shot-down-by-the-international-date-line-03087/. Last accessed August 10, 2010. There are also numerous public accounts of software failures of diverse kinds and consequences, such as those cited in the Forum on Risks to the Public in Computers and Related Systems, available online at http://www.risks.org.
5 Defense Science Board (DSB), September 2007, Report of the Defense Science Board Task Force on Mission Impact of Foreign Influence on DoD Software, Washington, DC: Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics. Available online at http://stinet.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA473661. Last accessed August 20, 2010.
6 This is not a comment about test-driven development, which is an excellent way to transform the valuable evaluative practice of testing into a more valuable preventive practice—building test cases and code simultaneously, or even writing test cases before code is written. Note here that "test" should be broadly construed, encompassing quality techniques such as inspection, modeling, and analysis. There are benefits to writing code from the outset that more readily supports, for example, modeling, sound analysis, and structured inspection.
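The test-first practice described in note 6 can be made concrete with a minimal sketch; the checksum function and its expected values are invented purely for illustration:

```python
# Sketch of the test-first discipline: the expectations are written
# before the code they exercise, turning testing from an evaluative
# into a preventive practice. (The checksum example is illustrative.)

def checksum(values):
    # Implementation written to satisfy the pre-written test below.
    return sum(values) % 256

def test_checksum():
    # These expectations were (notionally) written before checksum().
    assert checksum([]) == 0
    assert checksum([200, 100]) == 44   # 300 wraps modulo 256
    assert checksum([1, 2, 3]) == 6

test_checksum()
print("tests pass")
```

Because the test exists before the code, it acts as an executable specification: a defect introduced later (say, forgetting the modulo) is caught the moment the suite runs, rather than during late acceptance evaluation.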
BOX 4.2
Faults, Errors, Failures, and Hazards

A failure is a manifestation of a system that is inconsistent with its functional or quality intent—it fails to perform to specification. A hazard is a consequence to an organization or its mission that is the result of a system manifesting a failure. That is, if a system has been placed in a critical role and a failure occurs, then the hazard is the consequence to that role. For example, if an aircraft navigation system delivers incorrect results, the hazard is the potential consequence to the aircraft, its occupants, its owner, and so on, of incorrect navigation.

An error, like a failure, is a manifestation when a system is running. But an error can be contained entirely within a system, not necessarily leading to failures. For example, some database systems can detect and remediate "local deadlocks" that involve perhaps a pair of threads, and they can do this in a generally transparent manner. Another example is an unexpected exception (such as might be raised when a null pointer is de-referenced) being handled locally within a component or subsystem. More broadly, architectures can be designed to detect errors, including security problems, within individual components and can reconfigure themselves to isolate or otherwise neutralize those errors.1

Errors, in turn, are enabled by local faults in code. A fault is a static flaw in the code at a particular place, region, or identifiable set of places. Examples of faults include points in code where integrity tests are not made (leading to robustness errors), where locks are not acquired (leading to potential race conditions), where data is incorrectly interpreted (leading to erroneous output values), where program logic is flawed (leading to incorrect results), and so on.
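The fault/error/failure distinction can be illustrated with a toy sketch (the lookup-table example is hypothetical): the fault is a missing validity check, the error is the exception that fault raises at run time, and containment keeps the error from ever becoming a system-level failure:

```python
# Sketch of the fault -> error -> failure distinction from Box 4.2.
# The fault is a static flaw (no check before indexing); at run time it
# produces an error; a containing component handles that error locally
# so it never becomes a visible failure. (Illustrative example only.)

def faulty_lookup(table, key):
    # Fault: assumes 'key' is always present (like an unchecked pointer).
    return table[key]

def contained_lookup(table, key, default="UNKNOWN"):
    # The error is detected and remediated inside this component, so
    # the flaw does not propagate to a system-level failure.
    try:
        return faulty_lookup(table, key)
    except KeyError:
        return default   # error contained; a real system would also log it

routes = {"HNL": "Hawaii", "KIX": "Japan"}
print(contained_lookup(routes, "KIX"))      # Japan
print(contained_lookup(routes, "XXX"))      # UNKNOWN -- contained error
```

As footnote 1 of the box warns, silent containment (the bare `except` returning a default with no record) can itself be dangerous, since it hides evidence that errors are occurring at all.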
In systems that include hardware, probabilistic models are used to make predictions regarding when errors or failures are likely to occur—for example, to compute mean time to failure or expected lifetimes of components. These models are the core of reliability theory, and they can involve complex relationships of conditional probability (i.e., faults that are more likely in the presence of other faults), coupled probability (e.g., when many faults are made more likely by adverse weather), and other complexities. With software, these probabilistic models are less useful, since failures are caused by intrinsic design flaws that require implementation changes for correction. Intermittent errors in software are thus "designed into the code" (albeit unintentionally), and repair means changing the flawed design. For embedded software, where the software includes fault-tolerance roles, hybrid models are often most helpful.

This model helps to highlight the challenges associated with effective software testing, inspection, and static analysis. The results of tests that fail are manifestations of errors (unit tests) or failures (system tests). Assuming the tests are valid, the engineer must then ascertain which faults may have led to the error or failure manifestations. This reverse-engineering puzzle can be challenging, or not, depending on the scope of the tests and the complexity of the code. Failures in system tests, for example, can derive from the full scope of the code, including incorporated vendor components and infrastructure. Test results are generally of moderate to high value, because they reflect the priorities implicit in the test coverage strategy that guided their creation.2

One of the pitfalls of late testing, as would be the case if unit testing were deferred, is that the faults identified may have become very expensive to repair, adding substantially to engineering risk. If a fault is fundamental to the design of a particular interface, then all clients and suppliers that share that interface may be affected as part of the repair process. If a fault is architectural, the costs may be greater, and there may be new engineering risks associated with exploration of alternative options. This suggests both that testing be done at the component level early in the process and that commitments related to architecture and interface design be evaluated through modeling, simulation, and analysis as early as possible in the lifecycle.

The results of inspections, on the other hand, generally point to specific places in code or in models where there are problems of one kind or another. This may be why inspections are sometimes measured as being more effective than testing by the measure of defects found per hour. Because inspections usually combine explicit targeting of issues and opportunistic exploration, the issues found are generally of high value.

Static analysis results, including both sound analyses and heuristic analyses, generally point to faults in the code. They thus share with inspections the productivity advantage of avoiding the puzzle-solving inherent in the handling of adverse test results. Additionally, static analysis can highlight low-probability intermittent errors that might routinely crash continuously operating servers but not be readily detectable using conventional testing. Unlike validated tests, analysis results can include false positives, which are indications of possible faults where there are actually no faults. (Unvalidated tests can also produce false positives in cases where the code is "correct" but the test case is not.) Sound static analysis (i.e., static analysis with no false negatives) is used in compiler type checkers and some free-standing analysis tools; its results are usually tightly targeted to very particular attributes and can lead fairly directly to repairs. Heuristic static analysis, such as that performed by the open-source tools PMD and FindBugs, has considerably broader coverage than targeted sound analysis. But the results are typically less exact and include false negatives (faults not found) as well as false positives. Additionally, there can be large numbers of results, ranging from serious issues to code-layout style suggestions, which necessitates an explicit process for setting priorities among the results. An analysis of the open-source Hadoop system, for example, can yield more than 10,000 findings.

1 Of course, it is possible that the mechanism by which errors are contained results in a loss of information regarding both the errors and the fact that they were contained. This information loss can create dangerous situations. The well-known case of the Therac-25 failures (Nancy G. Leveson and Clark S. Turner, 1993, "An Investigation of the Therac-25 Accidents," IEEE Computer 26(7):18-41) is a particularly compelling example of the consequences of inadequate information regarding actual error containment in operations. In this case, engineers acted on a false supposition regarding the extent of error containment by a hardware mechanism in operations, resulting in fatal x-ray doses being administered to cancer patients.
2 Test coverage metrics can be useful, but there are many kinds of coverage criteria. Pure "statement coverage" may be misleading, because it may indicate a prevalence of regression tests crafted in response to defects rather than tests motivated by more "proactive" criteria.

Underlying both preventive and evaluative methods are the two most critical broad influences on quality: judicious choice of process and practices, and the capability and training of the people involved in the process. Process and practices can include techniques for measurement and feedback in process execution in support of iteration, progress and earned-value tracking, and engineering-risk management. Indeed, a key feature of process design is the concept of feedback loops specifically creating opportunities for feedback at low cost and with high benefit in terms of reducing engineering risk.7 Practices can also include approaches to defect tracking, root cause analysis, and so on.

There is overlap between preventive and evaluative methods because evaluative methods are most effective when applied throughout a development process and not just as part of a systems-level acceptance evaluation activity. When used at the earliest stages in the process, evaluative methods shorten

7 These feedback loops may be conceptualized as "OODA loops"—Observe, Orient, Decide, Act. The OODA model for operational processes was articulated by COL John Boyd, USAF, and is widely used as a conceptual framework for iterative planning and replanning processes.
Critical Code: Software Producibility for Defense BOX 4.3 Examples of Preventive and Evaluative Methods Below are several illustrative examples of preventive methods. Underlying all of these particular methods is an emphasis on preventing the introduction of defects or finding them as soon as possible after they are introduced. Requirements analysis. Assess operational hazards derived from context of use, adjusting operational plans to the extent possible to minimize potential hazard. Assess goals and limits with respect to quality attributes. Architecture design. Adopt structural approaches that enhance reliability, robustness, and security while also providing flexibility in areas of anticipated change. Ecosystem choice. Affiliate with ecosystems based on quality assessments of components and infrastructure derived from the associated supply chain. Detail design. Adopt software structures and patterns that enhance localization of data and control over access. Specification and documentation. Capture explicit formal and informal representations of functional and quality-attribute requirements, architecture description, detail design commitments, rationale, etc. Modeling and simulation. Many software projects fail because the consequences of early decisions are not understood until late in the process, when the costs of revising those decisions appear to be prohibitively high, leading to costly workarounds and acceptance of additional engineering risk. It may be perceived by project managers that evaluation cannot be done before code is written and can be run. In fact, a range of techniques related to modeling and simulation can be employed to achieve “early validation” of critical up-front decisions. These techniques include prototyping, architectural simulation, model checking of specifications, and other kinds of analysis.1 Coding. Adopt secure coding practices and more transparent structured coding styles that facilitate the various evaluative methods. 
Programming language. Select languages that provide first-class encapsulation and controlled storage management.

Tooling. Support traceability and logging structures in tooling, providing direct (and ideally semantics-based) interlinking among related design artifacts such as architecture and design specifications, source code, functional specifications, quality-attribute specifications, test cases, etc.

1 Daniel Jackson's Alloy model checker is an example of an early-validation technique for specifications. Daniel Jackson and Martin Rinard, 2000, "Software Analysis: A Roadmap," in The Future of Software Engineering, Anthony Finkelstein, ed., New York: ACM, pp. 215-224.

Here are several illustrative examples of evaluative methods. These are applied throughout a lifecycle to assess various kinds of software artifacts.

Inspection of the full range of software-related artifacts, ranging from models and simulation results supporting requirements and architecture design to detailed design specifications, code, and test cases.

Testing of code with respect to function, performance, usability, integration, and other characteristics. Test cases can be developed to operate at the system level, for example, simulating web-browser clients in testing e-commerce or other web services systems, or they can operate on code "units" across software interfaces to test aspects of component behavior. Test cases are selected according to a combination of coverage strategies determined by architecture and ecosystem, software design, programming language choice, potential operational risks, secure coding practices, and other considerations.

Direct analysis of source, intermediate, or binary code, using sound tools that target particular quality attributes and heuristic tools that address a broader range of quality attributes.

Monitoring of operational code and dynamic analysis of running code, focused on particular quality attributes. As with testing, monitoring can operate at the system level, including logging and event capture, as well as at the unit level, such as for transaction and other internally focused event logs. Monitoring supports prevention, evaluation, and also forensics after failures occur. Infrastructure for monitoring can support a range from real-time to short-time-delayed to forensic analyses of the collected event data. In the absence of other feedback loops, this can assist in focusing attention on making repairs and doing rework.

Verification of code against specifications. A number of formal "positive verification" capabilities have become practical in recent years for two reasons. First, scalability and usability are more readily achievable when verification is targeted to particular quality attributes.2 Second, new techniques are emerging, based on model checking or sound analysis, that support this more targeted verification without excessive requirements for writing formal specifications and assertions in code.

Various process models have been proposed that provide a framework within which these various preventive and evaluative methods can be applied in a systematic fashion, structured, as it were, within Observe-Orient-Decide-Act (OODA) loops of various durations. Two of the most prominent are the Lipner-Howard method (the SDL, or Security Development Lifecycle) and the method proposed by McGraw.

2 An example is the Microsoft Static Driver Verifier tool developed by Tom Ball for verifying protocol compliance of Windows device driver code using model checking. See Steve Lipner and Michael Howard, 2006, The Security Development Lifecycle: A Process for Developing Demonstrably More Secure Software, Redmond, WA: Microsoft Press.

feedback loops and guide development choices, thus lessening engineering risk. (To illustrate the range of methods and interventions related to quality in software, a summary is presented in Box 4.3.)

Judgments Are Based on Chains of Evidence

The goal of assurance methods is to create connections, a set of "chains of evidence" that ultimately connect the code that executes with architectural, functional, and quality requirements. The creation of these chains is necessarily an incremental process, with "links" in the chains being created and adapted as the development process proceeds. An example of a link is a test case that connects code with a particular expectation regarding behavior at an internal software interface. Another link, perhaps created using model-based analysis techniques, would connect this specific interface expectation with a more global architectural property. Another link is the connection of a fragmentary program annotation ("not null") with the code it decorates. A further link would connect that global architectural property with a required system-level quality attribute. Validation of this small chain of links could come from system-level testing or monitoring that provides evidence to support the presence of the system-level quality attribute. This metaphor is useful in highlighting several significant features that influence assurance practice and the cost and potential to achieve high levels of assurance. Here are some examples of influences on the success of assurance practice:
There is a great diversity of the particular kinds of attributes that are to be assured. These range from functional behavior, performance, and availability, to security, usability, and interface compliance for service application programming interfaces (APIs) and frameworks. The Mitre Corporation maintains a catalog, the Common Weakness Enumeration (CWE),8 that illustrates this diversity in its identification of more than 800 specific kinds of "software weaknesses."

There is also a great diversity of kinds of artifacts that must be linked in the chains. These include code, design models, architectural models, and specifications of functional and quality requirements. They also include more focused artifacts such as individual test cases, inspection results, analysis results, annotations and comments in code, and performance test results. There is a range of formality among these artifacts—some have precise structure and meaning, and others are informal descriptions in natural language or presented as diagrams. (This issue is elaborated below.)

Components and services encompassed in a system may have diverse sources, with varying degrees of access to the artifacts and of support and cooperation in an overall assurance process. Identification and evaluation of sources in an overall supply chain is a significant issue for cybersecurity (see Box 4.1), for which both provenance (trust) and direct evidence (verification) are considerations that influence the cost and effectiveness of an assurance process.

Many different kinds of techniques must be employed to assess consistency among artifacts and to build links in the chain. The most widely used are testing and inspection. Other techniques that are increasing in importance include modeling and simulation (e.g., for potential architecture choices), static analysis, formal verification and model checking (for code, designs, specifications, and models), and dynamic analysis and monitoring (for code, designs, and models). Some techniques are based not on reasoning about an artifact or component, but on safely containing it to insulate system data and control flow from adverse actions of the component. Such techniques include sandboxing, process separation, virtual machines, etc.9

Different links in the chain may have different levels of "confidence," with some providing (contingent) verification results and others providing more probabilistic outcomes that may (or may not) increase confidence in consistency among artifacts. Test coverage analysis, for example, can be used to assess the degree to which a particular set of test results may be generalized to give confidence with respect to some broad assurance criterion.

Methods or their implementations may be flawed or implemented in a heuristic way that may lead to false positives and/or false negatives in the process of building chains.

Perhaps most importantly, the cost-effectiveness of activities related to software assurance is heavily influenced by particular choices made in development practice—factors that are in the control of developers, managers, and program managers. Here are examples of factors that influence the effectiveness and cost of both preventive and evaluative methods:

In assurance activities, access is provided not only to source code, but also to specifications, models, and other documentation. Without this information, evaluators must expend resources to "reverse engineer" design intent, even for code produced within their own organization, and to create these intermediate models.
In the 1980s, a study suggested that, in fact, the DoD spends almost half of its

8 The CWE inventory (available online at http://cwe.mitre.org/) focuses primarily on security-related attributes. See also, for example, Robert C. Seacord, 2005, Secure Coding in C and C++, Boston: Addison-Wesley, for an inventory of potential issues related to not only secure, but also safe and high-quality code. There is substantial overlap of attributes related to safe and quality coding, on the one hand, and security, on the other.

9 Use of these containment or isolation techniques may create benefits for components that are opaque (some vendor executables, for example) or that are difficult to assure intrinsically (mobile code and scripts in a web services environment, for example). But there are also potential hazards associated with the containment infrastructure itself (such as a virtual machine or a web-client sandbox), which must often also be assured to a high level of confidence.
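The idea of evidence links with differing confidence levels can be sketched in code. This is a toy illustration only, not a method from the report; the `Link` record and the weakest-link combination rule are invented assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Link:
    source: str        # artifact supplying evidence, e.g. "unit test T17"
    target: str        # artifact or property supported, e.g. "interface spec I3"
    kind: str          # "test", "inspection", "static analysis", "verification", ...
    confidence: float  # 0.0-1.0: verification links near 1.0, heuristic results lower

def chain_confidence(links):
    """Combine per-link confidence pessimistically: a chain of evidence is
    only as strong as its weakest link."""
    return min((link.confidence for link in links), default=0.0)

# A small chain from code-level evidence up to a system-level quality attribute.
chain = [
    Link("unit test T17", "interface spec I3", "test", 0.7),
    Link("interface spec I3", "architecture property P2", "model analysis", 0.9),
    Link("architecture property P2", "system quality attribute Q1", "verification", 0.95),
]
print(chain_confidence(chain))  # 0.7
```

The minimum rule is one deliberately simple choice; a real assurance case would weigh link kinds and coverage far more carefully.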
Modeling

From the standpoint of assurance, models of all kinds—architecture, design, performance, structural, and semantic—form the intermediate way-points that facilitate linking (in the chain of evidence) of executable code with requirements of various kinds. The way-points include "domain-oriented" models related to requirements.21 The UML family of design models includes models that are more formal, such as StateCharts, and others that are less formal, such as deployment diagrams. The advantage of the more formal models is that there is more that tools can do to support traceability and analysis. StateCharts has a precise semantics rooted in state machines, which enables creation of a range of tools for analysis, simulation, consistency checking with code, and the like. There are benefits, of course, when models can not only support the software development process and management of engineering risks (e.g., through simulation and analysis), but also facilitate the activities related to assurance. Many of the topics identified in Chapter 6 relate to modeling and the use of models for various purposes. Tools such as model checkers and static analysis tools are informed by formal specification fragments, which are a kind of model. These are sometimes expressed in self-contained specifications (e.g., linear temporal logic specifications or Alloy specifications for model checkers) and sometimes as fragmentary annotations associated with code or models. Some verification tools make use of highly expressive specification languages for functional properties. In general there is an advancing frontier from less formal to more formal models, and modern tooling is creating momentum to push this frontier more rapidly and effectively.
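Because StateCharts-style models have a semantics rooted in state machines, simple analyses over them can be automated. The following is a minimal, illustrative explicit-state safety check in the spirit of the model checkers discussed here; the state names and transition relation are invented for the example.

```python
from collections import deque

def reachable_states(initial, transitions):
    """Breadth-first exploration of every state reachable from `initial`,
    with the transition relation given as {state: iterable_of_successors}."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for succ in transitions.get(state, ()):
            if succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return seen

def holds_invariant(initial, transitions, bad_states):
    """A safety property holds iff no 'bad' state is reachable."""
    return reachable_states(initial, transitions).isdisjoint(bad_states)

# Invented interlock model: a "fault" state should be unreachable from "idle".
transitions = {
    "idle": {"authorized"},
    "authorized": {"armed", "idle"},
    "armed": {"idle"},
}
print(holds_invariant("idle", transitions, {"fault"}))  # True
```

Industrial model checkers add temporal-logic properties, counterexample traces, and state-space reduction, but the core reachability question is the same.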
In Chapter 5, there is discussion regarding research goals related both to advancing modeling and specification capability and to improving techniques and tools for reasoning and analysis. Examples include techniques ranging from theorem proving, model checking, and analysis to type modeling and checking, architectural and design analysis, and analyses related to concurrency and parallelism. Much of the recent progress in program analysis, which is particularly evident in certain leading vendor development practices, is built on these ideas.

Consistency

Information in a software development process is gathered incrementally over time. Almost always, systems are evolving, and so are detailed choices regarding architecture, requirements, and design. A seemingly unavoidable consequence is a loss of consistency within the database of information captured over time. Indeed, developers often set aside documents and model descriptions, and resort to interviewing colleagues and doing reverse engineering of code in order to develop confidence in the models they are building or evolving. Precision in models (formality) can be useful in achieving consistency when tools can be used to analyze consistency on an ongoing basis. Tool use ranges from maintenance of batteries of regression tests to the use of verification and analysis tools to compare code with models. With both formal and informal information, explicit hyperlinking can expose interrelationships to developers and enable them to more readily sustain consistency among design artifacts. Extensive hyperlinking is a feature of modern development tools, including team tools and developer tools. It is an essential feature, for example, of modern open-source development and build environments.22 With automated tools, a very fine granularity can be achieved without adding to developer effort.
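One small, illustrative form of such tool-supported consistency checking is flagging artifacts that no traceability link ties into the chain of evidence. The artifact names and the link store below are invented for the sketch.

```python
def orphan_artifacts(artifacts, links):
    """Return artifacts that appear in no traceability link. An orphaned
    requirement, design element, or test case is a consistency risk:
    nothing connects it to the rest of the evidence chain."""
    linked = {endpoint for link in links for endpoint in link}
    return set(artifacts) - linked

artifacts = {"REQ-7", "ARCH-3", "src/auth.py", "test_auth.py", "REQ-9"}
links = {
    ("REQ-7", "ARCH-3"),          # requirement -> architecture element
    ("ARCH-3", "src/auth.py"),    # architecture element -> code unit
    ("src/auth.py", "test_auth.py"),  # code unit -> test case
}
print(orphan_artifacts(artifacts, links))  # {'REQ-9'}
```

Run on every check-in, a check like this keeps the link database honest as the system evolves, rather than letting inconsistency accumulate.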
For example, an open-source developer can check in code by submitting a simple "patch" file,

21 Requirements always start with informal articulations that are made precise and potentially formal (in the sense of this chapter) through the development process. One of the great benefits of high-quality models for requirements and associated domain concepts is the opportunity for early validation. These models can include scenarios, use cases, mock-ups, etc.

22 Linking and other kinds of support for traceability are supported in most commercial development tools and in high-end open-source ecosystems. An example that can be readily explored is the Mozilla development ecosystem—see, for example, code and tools at https://hg.mozilla.org/mozilla-central.
and from this the tools can update the information database in a way that shows the identity of the developer who last changed every individual line of code, along with some informal and semi-formal rationale information, such as a reference to a file version number and an identifier from the issue/defect database.

Usability

Even the highest quality information does not add value if it is not readily accessible and applicable by the key stakeholders in the software development process—developers, managers, evaluators, and others. With respect to search, for example, there are enormous differences in efficiency between traditional paper documents and electronic records. Augmenting search with linking and with direct support for anticipated workflows is another large step in efficiency. Choice of representation for expressing design information and models can also make a significant difference—"developer-accessible" notations can reduce training requirements and lower barriers to entry for developers to capture information that otherwise might not be expressed at all. Indeed, we can contemplate a concept of "developer economics" that can be used as a guide for assessing the potential motivation of individual developers in using assurance-related tools. An example of bad developer economics is when a developer or team is asked to devote considerable time and effort to expressing design information when payback is uncertain, diffuse, or most likely far in the future. A goal in formulating incentive models that motivate developer effort (beyond management or contractual mandates) is to afford developers increments of value for increments of time invested in capturing design information, and to provide that value as soon as possible after the effort has been invested.
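As a toy illustration of this incremental payoff (the `parse_version` unit and its test are invented for the sketch): a developer who writes a small unit test can run it the moment it is written, receiving value immediately rather than at some diffuse future point.

```python
def parse_version(text):
    """Small unit under test: split a dotted version string into integers."""
    return tuple(int(part) for part in text.split("."))

def test_parse_version():
    # The test runs as soon as it is written, and it also serves as a link
    # in the chain of evidence, tying code to an interface expectation.
    assert parse_version("4.2.1") == (4, 2, 1)

test_parse_version()  # immediate feedback, no mandate or heavyweight tooling
```

The same "increment of value for increment of effort" economics is what assurance tooling more broadly is trying to achieve.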
Thus, when a developer writes a single unit test case, it becomes possible both to execute that test case right away on an existing small unit, and to validate the test case against other design information (and to capture links with that design information to support consistency). This "early gratification incrementality" can be a challenge to achieve for certain kinds of tools and formal documentation, however. Success in achieving this "early gratification" is one of the reasons why unit testing has caught on, and model checking and analysis are also emerging into practice.23

Finding 4-2: Assurance is facilitated by advances in diverse aspects of software engineering practice and technology, including modeling, analysis, tools and environments, traceability, programming languages, and process support. Advances focused on simultaneous creation of assurance-related evidence with ongoing development effort have high potential to improve the overall assurance of systems.

CHALLENGES FOR DEFENSE AND SIMILAR COMPLEX SYSTEMS

Hazards

The extent and rigor adopted for an evaluation process is most directly influenced by the potential hazards associated with the intended operational environment. Missile launch control, cryptographic tools, infusion pumps for medication administration, automobile brake systems, and fly-by-wire avionics are all "critical systems" whose design and construction are profoundly influenced by considerations of evaluation and assurance. For many critical systems, standards have been established that regulate various aspects of process, supply-chain decisions, developer training and certification, and evaluation. These standards are ultimately directed toward assurances regarding quality attributes in running code.
From the particular perspective of assurance, any focus on aspects other than the intended delivered

23 Difficulty in achieving this kind of incrementality has been a challenge to the adoption of emerging prototype functional verification systems.
code (and its associated chains of evidence) is intended either as a predictor of ultimate code quality or, often, as a surrogate for direct evaluation of some critical quality of that running code. The latter approach is often used as a "work-around" when direct evaluation is thwarted by the raw complexity of the system or the inadequacy of methods and tools available for direct evaluation. Indeed, system managers often feel that they face an uncomfortable tradeoff between enhancing the capability of a system and delivering a high level of assurance. This folkloric "quality-capability tradeoff" is particularly challenging because it may be difficult to know exactly where on the quality axis a particular design is likely to reside. Greater incentives for quality have had the effect of "pushing outward" this tradeoff curve for both preventive and evaluative methods. This observation explains, for example, why vendors such as Microsoft have made such a strong commitment to advancing in all areas of prevention and evaluation: it enables them to offer simultaneous increases in quality and capability.

Capability and Complexity

A major complicating factor in software assurance for defense is the rapid growth in the scale, complexity, and criticality of software in systems of all kinds. (This is elaborated in Chapter 1.) This growth adds to both factors in the risk product: extent of consequence (hazard, due to the growing criticality of software systems, and cost of repair, due to the growing significance of early commitments) and potential for consequence (due to complexity and interlinking with other systems). The transition to fly-by-wire aircraft, which was for many years loudly debated, is an example of the growing consequence of software.
In the commercial world, we are now analogously moving to "drive-by-wire" vehicles, where the connections between brake and accelerator pedals and the respective mechanical actuators are increasingly computer mediated. The benefits are significant, in the form of anti-lock braking, cruise control, fuel economy, gas/electric hybrid designs, and other factors. But so are the risks, as documented in recent cases regarding software upgrades for the brake mechanisms of certain Toyota and Ford vehicles. An example of the risks of fly-by-wire was demonstrated when an F-22 pilot had to eject from his aircraft (which eventually crashed) when he realized that, due to an unexpected gyro shutdown, he had no ability to control the aircraft from the cockpit. He realized this only after takeoff, when the aircraft initiated a series of uncommanded maneuvers. In modern fighters, if the Vehicle Management System (VMS) computers are lost, so is the aircraft. As noted in National Research Council reports, more constrained domains such as medical devices and avionics benefit from rigorous standards of quality and practice such as DO-178B.24 These standards prescribe specific documents, process choices (including iterative models), consistency management and traceability practices, and assurance arguments ("verification") that include various links of the chain, as described earlier in this chapter. These approaches are extremely valuable, but they also appear to be more effective in domains with less diversity and scale than is experienced in DoD critical systems.

Complexity and Supply Chains

An additional complicating factor in software assurance for defense is the changing character of the architecture and supply structure for software systems generally, including defense software systems.
The changes, which are enabled by advances in the underlying software technologies, particularly related to languages, tools, and runtime architectures, allow for more complex architectures and richer and more diverse supply chains. Even routine software for infrastructure users such as banks, for example, can involve dozens of major modules from a similar number of vendor and developer

24 NRC, Daniel Jackson, Martyn Thomas, and Lynette I. Millett, eds., 2007, Software for Dependable Systems, Washington, DC: National Academies Press. Available online at http://www.nap.edu/catalog.php?record_id=11923. Accessed August 20, 2010.
organizations, as well as custom software components developed by multiple independent in-house development teams. This is in addition to the defense and government challenges of the customer and key stakeholders working at arm's length from the development teams. When systems are modular and component-based, there are sometimes opportunities to structure the assurance task in an analogously modular fashion. Unfortunately, many critical software attributes do not "compose" in this fashion, but there are some that do. For example, type correctness of software in modern languages such as Java, C#, and Ada is composable, which permits separate compilation of distinct modules. But without composability, the problem of creating "links" in the assurance chain can rapidly become intractable. Composability is therefore an important goal in the design of models, languages, and analysis capabilities. Additionally, modern systems make greater use of complex software frameworks and libraries. This is a great success in reuse, but there is also great complexity. Frameworks provide aggregate functionalities such as graphical user interaction, application server capability, web services support, mobile device capabilities, software development environments, enterprise resource planning (ERP), and the like. These frameworks embody many of the technical commitments associated with the ecosystems described in Chapter 1, and they now appear ubiquitously in larger-scale commercial applications. A framework is different from a library, roughly, because it embodies greater architectural commitment, including the structure of its associated subsystems, patterns for the flow of control, and representations for key data structures. This approach, which is enabled by modern object-oriented technology and languages, greatly reduces engineering risk for framework users, because the established frameworks embody proven architectures.
But it does create some assurance challenges, due to the complexity of the relationships among the framework, its client code, and potentially framework add-ins that augment capability in various ways.

Frameworks and Components

The success of component-based architectures, libraries, and frameworks has led to larger and more capable software applications that draw from a much greater diversity of sources for code. This is a mixed blessing. On the one hand, highly capable and innovative applications can be created largely by selecting ecosystems and assembling components, with a relatively small proportion of new custom design and code development. Often the overall architecture can be highly innovative, even when it incorporates subsystems and components drawn from established ecosystems. This approach is particularly well suited to incremental methods that facilitate accommodation of the refresh cycles for the various constituent components. It also facilitates prototyping, because functional capabilities can often be approximated through the assembly process, with additional custom code added in later iterations to tailor to more detailed functional needs as they become better understood.

Trust

This model, while attractive in many respects, poses significant challenges for assurance. Because there are diverse components from diverse sources, there will necessarily be differences in the levels of trust conferred on both components and suppliers. This means that, in the parlance of cybersecurity, there are potential attack surfaces inside as well as outside the software application and that we must support rigorous defense at the interfaces within the application. In other words, the new perimeter is within the application rather than around it or its platform.
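Treating an internal interface as a perimeter can be sketched as follows. This is a toy illustration under invented names (`guarded_call`, the validator), not a prescribed mechanism; real defenses at internal interfaces combine validation with isolation techniques such as sandboxing and process separation.

```python
def guarded_call(component_fn, payload, validator):
    """Defense at an interface *within* the application: everything crossing
    to a less trusted component is validated first, rather than relying only
    on a perimeter around the whole application."""
    if not validator(payload):
        raise PermissionError("payload rejected at internal interface")
    return component_fn(payload)

# Invented example: only alphanumeric queries may reach a third-party
# search component of uncertain provenance.
result = guarded_call(lambda query: f"results for {query}",
                      "radar",
                      validator=str.isalnum)
print(result)  # results for radar
```

The point of the sketch is architectural: the trust boundary sits between components of one application, so the check lives at the component interface, not at the network edge.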
This can imply, for example, that the kinds of architecture analyses alluded to in Chapter 3 that relate to modularity and coupling may also be useful in assuring that there is no "connectivity" among components in a system (e.g., involving access to data or control of resources) other than that intended by the architects. This new reality for large systems poses great challenges for assurance, because of the potentially reduced ability to influence the many sources in the supply chain and also because of the technical
challenges of composing assessment results for individual components and subsystems into aggregate conclusions that can support an assurance case. Vendor components are very often accepted on the basis of trust and expectations rather than direct analysis. There are both technical and legal barriers to direct analysis that often thwart the ability of the DoD to make sound assessments that can lead to reliable conclusions regarding assurance. There are several options in these cases. One is to employ a formal third-party assessment process such as the Common Criteria (ISO 15408), which is in fact derived from the old "Orange Book" process defined in the early 1980s. These processes can be expensive and can create delay.25 Additionally, results can be invalidated when components must be configured, plug-ins are added, or other small changes are made, such as adding device drivers to an operating system configuration. There has been much consideration of alternate approaches to such assessments. (Detailed consideration of this issue is beyond the scope of this report, but consideration is given in the referenced DSB report.26)

TWO SCENARIOS FOR SOFTWARE ASSURANCE

To illustrate evaluative techniques and the value of preventive techniques when software is developed at arm's length, the committee presents two speculative scenarios for software assurance. In the first scenario, evaluators are given full access to an already existing software system that is proposed for operational release. The access includes source code for all custom development as well as all associated development documents. The evaluators also have access to threat experts, and they may have the opportunity to interview members of the development team.
In the second scenario, a similar system is developed, but evaluators have access to the development team from the outset of the project, and the development team leaders have specific contractual incentives to obtain favorable judgments of high assurance. The first scenario, which is fully after the fact, may be read as a strawman for the second and more desirable scenario. Unfortunately, an after-the-fact response such as is sketched in the first scenario is all too often called for in practice—and indeed in some cases may be optimistic, due to the opacity of many code and service components.

First Scenario—After the Fact

In the informal narrative below, the committee starts with the first scenario and then (under the same paragraph headings) explores the potential benefits of the greater access in the second scenario.

Hazard and requirements analysis. The first step for the evaluators is to engage with the threat experts and the operational stakeholders for the purpose of identifying the key hazards. These could include hazards related to quality attributes: security hazards (e.g., confidentiality, integrity, and access in some combination), safety hazards (e.g., related to weapons release), and reliability and performance hazards. This will include identification of the principal hazards relating to functional attributes—correctness of operation, usability and ergonomic considerations, and compliance with interoperation requirements

25 The Common Criteria standard (ISO 15408) is generally considered to be more successful for well-scoped categories of products, such as firewalls and other self-contained devices—as contrasted with general-purpose operating systems, for example. Success with Common Criteria is also challenged by dynamic reconfiguration, such as through dynamically loaded libraries, device driver additions, and reconfiguration of system settings by users and administrators.
Additionally, much of the evaluation undertaken through the Common Criteria process is focused on design documents rather than on the code to be executed. There may be no full traceability of executing code corresponding to the evaluated design documents.

26 DSB, 2007, Report of the Defense Science Board Task Force on Mission Impact of Foreign Influence on DoD Software, Washington, DC: Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics. Available online at http://stinet.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA473661. Last accessed August 20, 2010.
and, more generally, with standards associated with interlinked systems (ultra-scale, net-centric, system of systems, etc.).

Architecture and component identification. The system and its associated documents are then analyzed to determine the overall intended and as-built system architectures. The intended architecture may not correspond exactly to the as-built, but it should be as close as possible, with deviations plausibly explainable as design or coding defects. As part of this process, the internal component structure of the system is modeled, including the adoption of off-the-shelf components and frameworks from established ecosystems. For example, if the system uses web capabilities, then there will likely be major subsystems implemented as configured vendor frameworks. The result of this step is an architectural model, an identification of the principal internal interfaces that mediate interactions among components (frameworks, libraries, local services, network-accessed services, custom components, etc.), and an identification of significant semantic invariants regarding shared data, critical process flows, timing and performance constraints, and other significant architectural features.27

Component-level error and failure modeling. If successful, the architectural analysis yields an understanding of principal constraints on the components of the system that relate to attributes such as timing, resource usage, data flows and access, user interaction constraints, and potentially many other attributes depending on the kind of system. This process, and also the architecture analysis process, is informed by documents and developer interviews.

Supply-chain and development history appraisal. Based on information regarding component sourcing and supply-chain management practices, levels of trust are assigned to system components. This will inform priority setting in assessment of the individual components.
Custom components from less-trusted sources may merit greater attention, for example, than off-the-shelf commercial components from more trusted sources. A similar analysis should apply to services (e.g., cloud services, software-as-a-service capabilities, etc.). Open-source components afford visibility into code, rationale, and history. They may also afford access to test cases, performance analyses, and other pertinent artifacts. It is also helpful, from the standpoint of security threats (see Box 4.1), to assess detailed historical development data. This can include not only data regarding producer/consumer interfaces within the supply chain, but also, when possible, code check-in records from modern development databases (such as captured in open-source systems such as SVN and CVS and similar commercial products and services). Analysis of architecture and component models. Proceeding on the (as yet unverified) assumption that component implementations are consistent with their constraints, the models at the granularity of architecture and component interactions can be subject to analysis. Because of the diversity of attributes of the models that can trace to the identified failures and hazards, multiple modeling exercises are likely to be undertaken, each focusing on particular attributes. When the models can be rendered formally, then tools for semi-automated analysis can be used for model checking, theorem proving, static analysis (at model level), simulation, and other kinds of mathematically based analysis. If certain models can be formalized only partially or not at all, then a more manual approach must be adopted to undertake the analysis. Identify high-interest components. 
Component analyses can be prioritized on the basis of a combination of trust level (from the supply-chain analysis) and potential role with respect to hazards, or "architectural criticality." Greater attention, for example, would be devoted to a component that handles sensitive information and that is custom developed by an unknown or less trusted supplier.

Develop a component evaluation plan. The evaluation plan involves allocating resources, setting priorities, identifying assurance requirements, and establishing progress measures on the basis of the analyses above.

Assess individual components. This can involve a combination of evaluative techniques. "Static" techniques, which do not involve executing the code, include inspection (with design documents), sound static analysis, and heuristic static analysis. These analyses may involve the construction of various kinds of abstract models that can themselves be analyzed to assess various functional and quality attributes. This activity is facilitated when models can be made more formal—informally expressed models necessarily require people to make interpretations and assessments. The analyses may also involve "dynamic" techniques, which involve execution of the code, either in situ in the running system (analogous to in vivo testing in the life sciences) or in test scaffolds (analogous to in vitro testing in the life sciences). If the project had used unit testing, then scaffold code would be included in the corpus, and this could be adapted and reused. Dynamic methods also include dynamic analysis and monitoring, which can be used to inform the development of static models to provide assurance in cases where this is significant—particularly for concurrent and performance-sensitive code. The results of this assessment take the form of an identification of areas of confidence and areas of remaining assessment risk with respect to the component interface specifications derived from the architecture analysis.

Select courses of action for custom components. On the basis of the identification of high-interest components and the component assessment results, specific options are identified for mitigation of the remaining assessment risks. These options could range from acceptance of the component (a positive assurance judgment) to wholesale replacement of the component.

27 This documentation, focused on succinct renderings of traceability and technical attributes, should not be confused with the "for the record" documentation often required with development contracts—which may be of limited value in an assurance exercise that relies on efficient tool-assisted evaluation.
Intermediate options include, for example, containment ("sandboxing" the component behind a façade that monitors and regulates control and data flows, either within the process or in a separate process or virtual machine), refactoring, and other kinds of rework that might lead to more definitive assessment results. For example, simplification of code control structure and localization of state (data) can greatly facilitate analyses of all kinds. On the other hand, if there are major issues that afflict multiple components and the value is deemed sufficient, then this kind of refactoring and rework could be done at the architectural level, facilitating assessment for multiple components.

Select courses of action for opaque components and services. For opaque components (typically products from vendors), the options are more constrained. In these cases, the extent of the intervention may be influenced by the extent of trust vested in the particular vendor in its supply-chain role. When trust is relatively low, potential interventions include sandboxing (as noted above) and architectural intervention to ensure that the untrusted component does not have access to the most sensitive data and control flows. Outsourced services, for example, can also be sandboxed and monitored. An alternative is to replace the component or to rework the arm's-length contractual arrangements to facilitate access and evaluation.

Refine system-level assessment. On the basis of the results of the component assessments and interventions (where appropriate and practical), architecture-level refactoring can sometimes be considered as a means to improve modularity, isolating components for which high levels of assurance cannot be achieved. Most importantly, the architectural-level models should be reconsidered in the light of the information acquired and verified in the foregoing steps. This reconsideration should focus on the hazards, quality attributes, and functional requirements identified in the initial steps. If the component- and architecture-level assurances do not combine to yield sufficient assurance for the identified hazards, then more drastic options need to be contemplated, including canceling the project, redefining the mission context to reduce the unaddressed hazards, revising initial thresholds regarding system risks, or undertaking a more intensive reengineering process on the offending components of the system and/or its overall architecture. As noted in Chapter 3, reworking architecture commitments at this late stage can be very costly, because there can be considerable consequent rework in many components.

This scenario is intended to illustrate not only the potential challenges in an evaluation process, but also some of the added costs and risks that arise from insufficient effort in the "preventive" category or insufficient evaluator involvement during development. In the second scenario, the committee briefly considers how these steps might differ were the evaluators and developers to work in partnership during the development process rather than after the fact.
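The prioritization described in the "Identify high-interest components" step—combining supply-chain trust with architectural criticality—can be sketched in code. This is an illustrative sketch only; the component names, the numeric scales, and the scoring formula are assumptions chosen for illustration, not anything prescribed by the report.

```python
# Illustrative sketch: ranking components for assessment priority by
# combining supply-chain trust level with architectural criticality.
# The 1-5 scales and the scoring formula are assumptions.
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    trust: int        # supply-chain trust, 1 (low) to 5 (high)
    criticality: int  # architectural criticality, 1 (low) to 5 (high)

def assessment_priority(c: Component) -> int:
    # Lower trust and higher criticality both raise priority; a custom
    # component handling sensitive data from an unknown supplier scores
    # highest, matching the report's example.
    return (6 - c.trust) * c.criticality

components = [
    Component("custom-crypto-module", trust=1, criticality=5),
    Component("vendor-web-framework", trust=4, criticality=3),
    Component("open-source-logger",   trust=3, criticality=1),
]

for c in sorted(components, key=assessment_priority, reverse=True):
    print(f"{c.name}: priority {assessment_priority(c)}")
# custom-crypto-module ranks first with priority 25
```

In practice such a score would only seed the evaluation plan; the report emphasizes that human judgment about hazards and mission context drives the final allocation of assessment effort.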
Second Scenario—Preventive Practices

The steps are the same as those for the first scenario, but the descriptions focus on the essential differences from the after-the-fact scenario above. This scenario should make evident the value of incentives in the development process for "design for assurability."

Hazard and requirements analysis. This step is similar, but it is performed as part of the overall scoping of the system. Because architecture is such a primary driver of quality attributes and assurance (as illustrated above), in this preventive scenario a savvy manager would couple the architecture definition with the hazard analysis and, if possible, limit early commitment regarding specific functional characteristics to broad definitions of the "scope" of the system (see Chapter 2). At this stage, the first set of overall progress metrics is defined; these could include credit to be allocated for resolving engineering risks associated with assurance. These metrics can also relate to compliance with standards associated with interlinked systems, as noted in the first scenario.

Architecture and component identification. As noted earlier, the architecture definition is coupled with hazard identification and scope definition. The exceedingly high engineering risk for assurance and architecture in the after-the-fact scenario (assuming innovative architectural elements are required) is replaced with an up-front process of architecture modeling, supported by various early-validation techniques such as simulation, prototyping, and direct analysis (such as model checking). Certain detail-level architectural commitments can be made incrementally. Progress metrics related to assurance-related engineering risk are refined and elaborated.

Component-level error and failure modeling.
A key difference is that the component-level modeling, combined with the supply-chain appraisal, provides an early feedback mechanism regarding engineering risks in the evolving architecture design. Risks can be assessed related not only to quality attributes and technical feasibility, but also to sourcing costs and risks. For example, choices might be made among opaque commercial components from a trusted source, custom components, wrapped untrusted components, and open-source components that afford stakeholders both visibility and the possibility of useful intervention (e.g., adding test cases, adapting APIs, adding features, etc.). This process can also lead to the early creation of unit test cases, analysis and instrumentation strategies, and other quality-related interventions in the component engineering process. Process metrics defined in earlier stages can inform the allocation of resources in this stage of the process. The metrics are also refined as part of the incremental development process.

Supply-chain and development history appraisal. See above. The committee notes that it is sometimes asserted that offshore development is intrinsically too dangerous. However, one could argue that badly managed onshore development by cleared individuals may be more dangerous than offshore development with best practices and evidence creation alongside coding. A well-managed offshore approach may be feasible for many kinds of components when elements of the evolving best practice are adopted, such as (1) highly modular architectures enabling simplicity in interface specifications and concurrent development; (2) unit testing, regression testing, and code analysis, with results (and tests) delivered as evidence along with the code; (3) frequent builds; (4) best-practice configuration control; and (5) agile-style gating and process management.28 Metrics can relate to a combination of adoption of best practices and production of separately verifiable evidence to support any assurance claims.

As noted above, full line-by-line historical tracking of changes to a code base is now commonplace for development projects of all sizes. A key benefit of such tracking is that it provides full traceability not only among artifacts, but also to individual developers, which is useful for security and for assuring that individual developers are fully up-to-date with best practices.

28 Michael A. Cusumano, Alan MacCormack, Chris F. Kemerer, and William Crandall, 2009, "Critical Decisions in Software Development: Updating the State of the Practice," IEEE Software 26(5):84-87. See also Alan MacCormack, Chris F. Kemerer, Michael Cusumano, and Bill Crandall, 2003, "Trade-offs Between Productivity and Quality in Selecting Software Development Practices," IEEE Software 20(5):78-85.
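The line-by-line change tracking described above supports traceability to individual developers. A minimal sketch of such an index is below; the check-in records and field layout are hypothetical, standing in for what a system such as SVN or CVS would actually store.

```python
# Illustrative sketch: building a per-developer traceability index from
# line-level check-in records. The record format (developer, file,
# lines changed) is an assumption for illustration.
from collections import defaultdict

# Hypothetical check-in records from a version-control history.
checkins = [
    ("alice", "auth/login.c", 40),
    ("bob",   "auth/login.c", 12),
    ("alice", "net/proto.c",  75),
]

def changes_by_developer(records):
    # Map each developer to the files they touched and the total lines
    # changed, so every change is attributable to an individual.
    index = defaultdict(lambda: defaultdict(int))
    for dev, path, lines in records:
        index[dev][path] += lines
    return index

index = changes_by_developer(checkins)
print(dict(index["alice"]))
# {'auth/login.c': 40, 'net/proto.c': 75}
```

Such an index is the kind of artifact a security review could consult when appraising supply-chain and development history, as the scenario describes.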
Analysis of architecture and component models. This becomes part of the iterative early-stage process of refining architecture, quality attribute goals, functional scoping, and sourcing. If there are portions of the configuration that may create downstream challenges for evaluators, this is the opportunity to revisit design decisions to facilitate evaluation. For example, an engineer might suggest a change in programming language for a component in order to get a 5 percent speedup. At this stage of the process, that proposal can be considered in light of how it might influence assurance with respect to quality attributes, interface compliance, correct functionality, and other factors. The decision could be made not to change the programming language, but rather to incentivize the vendor to make the next set of improvements in its compiler technology. These decisions are made using a multi-criteria metric approach, with criteria and weightings informed by the earlier stages.

Identify high-interest components. Regardless of the front end of the process, there will be a set of high-interest components. Ideally, however, as a result of architecture decisions, the components in this category are not also opaque and untrusted. Regardless, components are prioritized on the basis of measured assurance-related engineering risk, with metrics as set forth in the earlier stages. This assessment will account for ongoing improvements in development technologies (e.g., languages, environments, traceability and knowledge management), assurance tools (e.g., test, inspection, analysis, and monitoring support), and modeling (for various quality attributes, including usability).

Develop a component evaluation plan. Allocate resources, set priorities, and identify assurance requirements on the basis of the analyses above. In this preventive scenario, this plan is largely a consequence of the early decisions regarding architecture, sourcing, hazards, and functional scope. Metrics are defined for resolution of engineering risk in all components (but particularly high-interest components), so that progress can be assessed and credit assigned.

Assess individual components. As above, this involves a combination of many different kinds of techniques. In the preventive scenario, component development can be done in a way that delivers not only code, but also a body of evidence including test cases, analysis results, in-place instrumentation and probes, and possibly also proofs of the most critical properties. (These proofs are analogous to what is now possible for type safety and encapsulation integrity, for which analysis is now ubiquitous, composable, and scalable.) This supporting body of evidence, delivered with the code, enables acceptance evaluators to verify claims very efficiently regarding quality attributes, functionality, or other properties critical to assurance. Metrics are developed to support co-production of component code and supporting evidence.

Select courses of action for custom components. See above.

Select courses of action for opaque components and services. For existing vendor components, the same considerations apply as in the previous scenario. If new code is to be developed in a proprietary environment, then there is the challenge of how to make an objective case (not based purely on trust) that the critical properties hold. Existing approaches rely on mutually trusted third parties (as in the Common Criteria), but there may be other approaches whereby proof information is delivered in a semi-opaque fashion with the code.29 Additionally, the proprietary developer could develop the code in a way that is designed to operate within a sandbox, in a separate process, or in another container—in this approach, the design is influenced by the need to tightly regulate control and data flows in and out of the contained component. Metrics would weight various criteria, with a long-term goal of diminishing the extent of reliance on trust vested in commercial vendors in favor of evidence production in support of explicit "assurability" claims.

Refine system-level assessment. Given the high risks and costs of architectural change, in a preventive scenario any adjustments to architecture are made incrementally as part of the overall process. Metrics would relate to the extent of architectural revision necessary at each stage of the process.

29 There is a wealth of literature on proof-carrying code and related techniques.
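The containment option discussed above—running an untrusted component behind a façade that tightly regulates control and data flows—can be illustrated with a small sketch. All class and method names here are hypothetical; a real sandbox would additionally use OS-level isolation (a separate process, container, or virtual machine) rather than an in-process wrapper alone.

```python
# Illustrative sketch: a facade that "sandboxes" an untrusted component
# by exposing only whitelisted operations and logging every call across
# the boundary. Names are hypothetical; in-process wrapping is only one
# layer of the containment the report describes.
class SandboxFacade:
    ALLOWED = {"parse", "format"}  # operations the component may expose

    def __init__(self, component):
        self._component = component
        self.audit_log = []  # record of control flow across the boundary

    def call(self, operation, payload):
        if operation not in self.ALLOWED:
            self.audit_log.append(("denied", operation))
            raise PermissionError(f"operation {operation!r} not permitted")
        self.audit_log.append(("allowed", operation))
        return getattr(self._component, operation)(payload)

class UntrustedParser:
    def parse(self, text):
        return text.split(",")

    def delete_files(self, path):  # must never be reachable via the facade
        raise RuntimeError("unexpected side effect")

sandbox = SandboxFacade(UntrustedParser())
print(sandbox.call("parse", "a,b"))
# ['a', 'b'] — and any call to delete_files raises PermissionError
```

The audit log makes the monitored flows available as evidence, consistent with the scenario's emphasis on producing verifiable assurance artifacts.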
Conclusion

A key conclusion from these scenarios is the high importance of three factors:

(1) The extremely high value of incorporating assurance considerations (including security considerations—see Box 4.1) into the full systems lifecycle, starting with conceptualization, throughout development and acceptance evaluation, and into operations and evolution.

(2) The strong influence of technology choices on the potential to succeed with assurance practices.

(3) As a consequence, the value to DoD software producibility that comes from enhancements to critical technologies related to assurance, including both what is delivered (programming languages, infrastructure) and what is used during development (models and analytics, measurement and process support, tools and environments).

Recommendation 4-1: Effective incentives for preventive software assurance practices and production of evidence across the lifecycle should be instituted for prime contractors and throughout the supply chain. This includes consideration of incentives regarding assurance for commercial vendor components, services, and infrastructure included in a system.

As illustrated in the scenario, when incentives are in place, there are emerging practices that can make significant differences in the outcomes, cost, and risk of assurance. The experience at Microsoft with the Lipner-Howard Security Development Lifecycle (SDL)30 reinforces this—the lifecycle not only leads to better software but also incentivizes continuous improvement in assurance technologies and practices. When ecosystems, vendor components, open-source components, and other commercial off-the-shelf (COTS) elements are employed, assurance practices usually require the DoD to continually revisit selection criteria and particular choices.

The relative weighting among the various sourcing options, from an assurance standpoint, will differ from project to project, based on factors including the transparency of the development process and of the product itself, either to the government or to third parties. This affords an opportunity to create incentives for commercial vendor components to include packaged assurance-related evidence somewhere between the two poles of "as is" and "fully Common Criteria certified." Advancement in research and practice could build on ideas already nascent in the research community regarding ways that evidence could be packaged to support quality claims while protecting trade secrets or other proprietary technology embodied in the components.

Recommendation 4-2: The DoD should expand its research focus on and its investment in both fundamental and incremental advances in assurance-related software engineering technologies and practices. This investment, if well managed, could have broad impact throughout the DoD supply chain. When both recommendations are implemented, a demand-pull is created for improved assurance practices and technologies.

Recommendation 4-3: The DoD should examine commercial best practices for more rapidly transitioning assurance-related best practices into development projects, including contracted custom development, supply-chain practice, and in-house development practice.

30 Steve Lipner and Michael Howard, 2006, The Security Development Lifecycle: A Process for Developing Demonstrably More Secure Software, Redmond, WA: Microsoft Press.
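The idea of packaging assurance-related evidence with a delivered component can be sketched concretely. The manifest format below is entirely hypothetical—one possible point between the "as is" and "fully certified" poles described above—but the underlying mechanism (binding evidence artifacts to code by cryptographic hash so an acceptance evaluator can verify the pairing) uses only standard hashing.

```python
# Illustrative sketch: an evidence manifest binding assurance artifacts
# (here, a test report) to delivered code via SHA-256 hashes. The
# manifest schema and component name are assumptions for illustration.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical deliverables from a vendor.
code = b"int main(void) { return 0; }"
test_report = b"128 tests run, 128 passed"

manifest = {
    "component": "example-module",
    "code_sha256": sha256(code),
    "evidence": [
        {"kind": "test-report", "sha256": sha256(test_report)},
    ],
}

def verify(manifest, code, artifacts):
    # An acceptance evaluator re-hashes the deliverables and checks them
    # against the vendor-supplied manifest before crediting the evidence.
    if manifest["code_sha256"] != sha256(code):
        return False
    if len(artifacts) != len(manifest["evidence"]):
        return False
    return all(e["sha256"] == sha256(a)
               for e, a in zip(manifest["evidence"], artifacts))

print(verify(manifest, code, [test_report]))  # True
```

Hashing alone shows only that the evidence and code were packaged together; protecting proprietary detail while supporting verification—for example, via semi-opaque proof information—remains the open research direction the text identifies.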
Several leading vendors have developed explicit management models to accelerate the development of assurance-related technologies and practices, to validate them on selected projects, and to transition them rapidly into broader use.31

31 Microsoft is well known for its aggressive use of development practices, including process (the Security Development Lifecycle (SDL) noted earlier—see http://msdn.microsoft.com/en-us/library/ms995349.aspx) and analysis tools (such as SLAM, PreFast, and others—see, for example, Thomas Ball, 2008, "The Verified Software Challenge: A Call for a Holistic Approach to Reliability," pp. 42-48 in Verified Software: Theories, Tools, Experiments, Bertrand Meyer and Jim Woodcock, eds., Berlin: Springer-Verlag).