Unifying Disparate Tools in Software Security
How much can you trust software? When you install a piece of code, such as a video game or a device driver for a new camera, how can you be sure the code won’t delete all of your files or install a key-stroke logger that captures your passwords? How can you ensure that the software doesn’t contain coding bugs or logic errors that might leave a security hole?
Traditional approaches to software security have assumed that users could easily determine when they were installing code and whether or not software was trustworthy in a particular context. This assumption was reasonable when computers controlled few things of real value, when only a small number of people (typically experts) installed software, and when the software itself was relatively small and simple. But the security landscape has changed drastically with advances in technology (e.g., the explosive growth of the Internet, the increasing size and complexity of software) and new business practices (e.g., outsourcing and the development of open-source architecture). Furthermore, the more we rely on software to control critical systems, from phones and airplanes to banks and armies, the more desperately we need mechanisms to ensure that software is truly trustworthy.
Malicious hackers have serious financial incentives to break into and control personal and business computers. Once they break into a machine, they can gather personal information (e.g., passwords, credit card information, and social security numbers) and use it to clean out bank accounts, apply for credit cards, or gain access to other machines on the same network. In addition, if attackers break into and control machines, they can use them to send “spam” (unsolicited e-mail advertisements), to store illicit digital goods (e.g., pirated music or pornography), or to launch denial-of-service attacks against other computing systems.
One technique attackers use to gain a foothold on a machine is to trick the user into thinking a piece of code is trustworthy. For example, an attacker may send an e-mail that contains a malicious program, but with a forged return address from someone the user trusts. Alternatively, an attacker may set up a website with a name that appears to be trustworthy, such as “www.micr0s0ft.com” (note that the letter “o” has been replaced by the numeral “0”), and send an e-mail with a link to the site suggesting that the user download and install a security update.
Even sophisticated users can be fooled. A clever attacker might contribute “Trojan horse” code to a popular open-source project, turning an otherwise useful program (e.g., a music player) into undetected malware (cf. Thompson, 1984). In short, it is relatively easy for an attacker to gain the trust of a user and get him or her to install code.
Today we use a combination of digital signatures and virus scanners to try to stop these attacks. But digital signatures only tell us who produced the code, not whether it is trustworthy. And virus scanners, which look for snippets of previously identified malware at the syntactic level, can be easily defeated through automated obfuscation techniques. Furthermore, virus scanners cannot detect new attacks. These relatively weak tools can tell us something about the provenance of software and its syntax, but they cannot validate the software’s semantics.
EVEN GOOD GUYS CAN’T BE TRUSTED
A second technique attackers use to gain control of a machine is to find a bug in an already installed program that communicates with the outside world. The classic example is a buffer overrun in a service, such as a log-in daemon, or a network-based application, such as a web browser. A buffer overrun occurs when a program allocates n bytes of memory to hold an input (e.g., a password), but the attacker provides more than n bytes. A properly written program will check and reject any input that is too long. Unfortunately, many programmers fail to insert appropriate checks, partly because commonly used programming languages (C and C++) make it easy to forget those checks, and partly because programmers often think that “no one will have a password of more than a thousand characters.”
When programmers fail to put in a check and the input is too long, the extra bytes overwrite whatever data happen to be stored next to the input buffer. In many cases, the buffer is allocated on the control stack near a return address for a procedure (the location where the program transfers control once the input routine is completed).
A clever attacker will enter an input long enough to overwrite the return address with a new value, thereby controlling which code is executed when the input routine ends. In an ideal attack, the rest of the input contains executable instructions, and the attacker causes control to transfer to this newly injected code. In this way, the attacker can cause the program to execute arbitrary code (for a detailed walk-through of this technique, see <http://www.phrack.org/archives/49/P49-14/>).
Attacks based on buffer overruns are surprisingly common; at one point, they accounted for more than half of the security vulnerabilities reported by the Computer Emergency Response Team (Wagner et al., 2000). But the failure to check input lengths is only one of a large number of coding errors attackers use to gain a foothold on a machine or extract private information. Other examples include integer overflows; format-string attacks; script-injection attacks; race conditions; bad pseudo-random number generators; and improper use of cryptographic primitives.
Thus, even well-meaning software vendors cannot seem to produce trustworthy code. Part of the problem is that vendors must keep producing new features to sell new versions of software, which simply adds complexity to the code base. And part of the problem is that current development practices, including design, coding, review, and testing, cannot rule out simple errors, such as buffer overruns, much less deeper problems, such as the inappropriate use of cryptographic primitives or covert-information channels.
PREVENTING BUGS THROUGH REWRITING
One way to defend against attacks is to build tools that automatically insert checks at program locations where a policy violation might occur. For example, we might insert extra code at each buffer update to ensure that we do not write beyond its bounds.
Compilers for high-level, type-safe languages, such as Java and C#, already insert checks to prevent a large category of language-level errors, including buffer overruns. However, the vast majority of systems and application software, including security-critical operating-systems code, is still written in C or C++, so the compiler does not have enough information to insert appropriate checks.
Unfortunately, the cost of rewriting the millions of lines of code that make up big software systems (e.g., Windows or Oracle) is prohibitive, and doing so is likely to introduce as many bugs as it eliminates. Furthermore, because certain services, such as device drivers and real-time embedded software, need control over memory layout and timing, they are not suited to high-level languages.
Some effective tools for rewriting low-level legacy code are emerging. For example, the StackGuard tool (Cowan et al., 1998) and the Microsoft “/gs” compiler option rewrite C/C++ code to insert a secret “cookie” next to buffers allocated on the control-flow stack and check that the cookie has been preserved when a procedure is about to jump to a return address. In this way a buffer overrun can often be detected and stopped with relatively low overhead. Unfortunately, this technique does not protect against other forms of attack, such as overflowing a buffer allocated on the heap.
Another example of automated rewriting is the CCured compiler (Necula et al., 2002). CCured provides a strong type-safety guarantee by inserting extra metadata and run-time checks into C code. Metadata make it possible to determine, for instance, the size of a buffer at run time, and the checks ensure that all buffer boundaries and typing constraints are satisfied.
Although CCured can entail higher overhead costs than StackGuard, the security guarantees are much stronger. Nevertheless, the advantage of StackGuard is that it can be applied to almost any C program without change, whereas CCured requires a number of changes to the source code. In practice, the tool that is easier to use is almost always used first.
Many other tools also try to enforce security policies on legacy code through rewriting. These include software-based fault isolation (McCamant and Morrisett, 2006; Wahbe et al., 1993), control-flow isolation (Abadi et al., 2005), in-lined reference monitors (Schneider, 2000), and others. Each of these tools strikes a different balance among the expressiveness of the policies that can be enforced, the code base to which the techniques are applicable, and run-time overhead.
PREVENTING BUGS THROUGH STATIC ANALYSIS
Security-conscious companies are beginning to use tools that either prevent or detect a wide range of coding errors at compile time. Microsoft, for example, uses a static analysis tool called Prefast (based on an earlier tool called Prefix) that scans C and C++ code, looking for common errors, such as buffer overruns (Bush et al., 2000). Other companies, such as Fortify and Coverity, produce similar tools that can analyze software written in a variety of languages and support customizable rule sets for finding new classes of bugs as they arise.
A static analysis tool attempts to symbolically execute all paths in the code, looking for potential bugs. Of course, a program with n conditional statements can have 2^n distinct paths, and a program with loops or recursion can have an unbounded number of paths. Thus naively enumerating every path is infeasible. Furthermore, the analysis does not know the actual values of inputs to the program, so it cannot always determine the value of variables or data structures.
Consequently, static analysis tools construct models of the program that abstract details and reduce the reasoning to finite domains. For example, instead of tracking the actual values that integer variables might take on, an analysis might track only upper and lower bounds. To ensure termination, the analysis might only consider a few iterations of a loop. More sophisticated techniques based on abstract interpretation (Cousot and Cousot, 1977) make it possible to determine an approximation of behavior for all iterations.
Although automated static analysis has improved tremendously, especially in the past few years, a number of problems remain to be solved. One problem, introduced by the necessary approximation described above, is false positives (i.e., potential errors may be indicated where there are none) or false negatives (i.e., errors that are not reported).
If the approximations are constructed so that the analysis is sound (i.e., no false negatives), programmers tend to be flooded with false positives, making it difficult to find and fix real bugs in the program. Therefore, few of these analyses are sound, and bugs can potentially sneak through. For example, a buffer overrun recently found in the Windows Vista custom-cursor animation code was not detected by Prefast.
LOOKING TO THE FUTURE
One problem with static analysis tools is that, like optimizing compilers, they tend to be large, complicated programs. As a result, a bug in the analysis can mean that a security-relevant error goes undetected. Furthermore, because most tools operate on source code (or byte code), they must make assumptions about how the compiler will translate the program to machine code. Consequently, an inconsistent assumption or a bug in the compiler can result in a security hole.
An ideal static analysis would fit the following description:
The analysis can operate at the machine-code level (to avoid having to reason about the compiler).
The analysis is small, simple, and formally verified.
The analysis is sound (i.e., no false negatives).
The analysis is complete (i.e., no false positives).
Although these goals may seem unattainable, the proof-carrying code (PCC) architecture suggested by Necula (1997) comes remarkably close to satisfying them. The idea behind PCC is to require that executable programs come equipped with a formal, machine-checkable “proof” that the code does not contain bugs. By “proof” we mean a formal, logical argument that the code, when executed, will not violate a pre-specified (and formalized) policy on behavior.
The key insight of PCC is that constructing a proof is hard (generally undecidable), but verifying a proof is easy. After all, a proof is meant to be mechanically checkable evidence. Furthermore, assuming we have a sound, relatively complete logic for reasoning about program behavior, we can, in principle, accept precisely those programs that are provably safe.
The really nice thing about PCC is that, as a framework, it benefits both the user who is worried about the safety of installing a program downloaded from the Internet and the software producer who has outsourced the development of modules. With PCC, we are no longer dependent on who produces the code but on what the code actually does.
In practice, the problem with PCC is that someone, namely the code producer, still has to come up with proof that the code satisfies a given policy. Thus none of the difficult problems has really been eliminated. We still need other techniques, such as static analysis and rewriting, to find an effective way to construct the proof. However, with PCC we no longer have to trust the tools or compiler. Therefore PCC helps minimize the size of the trusted computing base.
One way to construct a proof that machine code satisfies a policy is by using a proof-preserving compiler. For example, if we start with source code and a proof that the source satisfies a policy, then a proof-preserving compiler can simultaneously transform the code and proof to the machine-code level. Type-preserving compilers are a particular instance in which the policy is restricted to a form of type safety (Colby et al., 2000; Morrisett et al., 1999; Necula and Lee, 1998).
Next-generation programming languages are incorporating increasingly sophisticated type systems that allow programmers to enforce stronger safety and security properties through automated type checking. For example, the Jif dialect of Java lets programmers specify secrecy requirements for data, and the type system ensures that secret inputs (i.e., private fields) cannot flow to public outputs (Myers, 1999). Thus, when combined with a type-preserving compiler, the Jif type system makes it possible for a third party to verify that a program will not disclose information that is intended to be private.
At an extreme, arbitrary policies can be captured with a dependent type system, as found in emerging languages, such as ATS (Xi, 2004), Concoqtion (Fogarty et al., 2007), Epigram (McBride, 2004), and HTT (Nanevski et al., 2006). These languages make it possible to specify everything from simple typing properties to full correctness. The trade-off, of course, is that type checking is only semiautomatic. Simple properties can be discharged using a combination of static analysis, constraint solving, and automated theorem proving, but ultimately, programmers must construct explicit proofs for deeper properties.
Thus the success of these languages will be determined largely by how much can truly be automated. Nevertheless, for safety- and security-critical software systems, these languages, in conjunction with proof-preserving compilers and the proof-carrying code architecture, provide a compelling research vision.

REFERENCES
Abadi, M., M. Budiu, U. Erlingsson, and J. Ligatti. 2005. Control-flow integrity: Principles, implementations, and applications. Pp. 340-353 in Proceedings of the 12th ACM Conference on Computer and Communications Security (CCS’05). New York: ACM Press.
Bush, W., J. Pincus, and D. Sielaff. 2000. A static analyzer for finding dynamic programming errors. Software—Practice and Experience 30(7):775-802.
Colby, C., P. Lee, G. C. Necula, F. Blau, K. Cline, and M. Plesko. 2000. A certifying compiler for Java. Pp. 95-107 in Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation. New York: ACM Press.
Cousot, P., and R. Cousot. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. Pp. 238-252 in Proceedings of the Symposium on Principles of Programming Languages. New York: ACM Press.
Cowan, C., C. Pu, D. Maier, H. Hinton, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, and Q. Zhang. 1998. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. Pp. 63-78 in Proceedings of the 7th USENIX Security Symposium. Berkeley, CA: USENIX Association.
Fogarty, S., E. Pasalic, J. Siek, and W. Taha. 2007. Concoqtion: Indexed types now! Pp. 112-121 in Proceedings of the 2007 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation. New York: ACM Press.
McBride, C. 2004. Practical programming with dependent types. Pp. 130-170 in Lecture Notes in Computer Science 3622, V. Vene and T. Uustalu, eds. New York: Springer-Verlag.
McCamant, S., and G. Morrisett. 2006. Evaluating SFI for a CISC architecture. Pp. 209-224 in Proceedings of the 15th USENIX Security Symposium. Berkeley, CA: USENIX Association.
Morrisett, G., D. Walker, K. Crary, and N. Glew. 1999. From system F to typed assembly language. ACM Transactions on Programming Languages and Systems 21(3):527-568.
Myers, A. C. 1999. Practical mostly-static information flow control. Pp. 228-241 in Proceedings of the ACM Symposium on Principles of Programming Languages. New York: ACM Press.
Nanevski, A., G. Morrisett, and L. Birkedal. 2006. Polymorphism and separation in Hoare type theory. Pp. 62-73 in Proceedings of the ACM International Conference on Functional Programming. New York: ACM Press.
Necula, G. C. 1997. Proof-carrying code. Pp. 106-119 in Proceedings of the ACM International Symposium on Principles of Programming Languages. New York: ACM Press.
Necula, G. C., and P. Lee. 1998. The design and implementation of a certifying compiler. Pp. 333-344 in Proceedings of the ACM Conference on Programming Language Design and Implementation. New York: ACM Press.
Necula, G. C., S. McPeak, and W. Weimer. 2002. CCured: Type-safe retrofitting of legacy code. Pp. 128-139 in Proceedings of the ACM Symposium on Principles of Programming Languages. New York: ACM Press.
Schneider, F. B. 2000. Enforceable security policies. ACM Transactions on Information Systems Security 3(1):30-50.
Thompson, K. 1984. Reflections on trusting trust. Communications of the ACM 27(8):761-763.
Wagner, D., J. S. Foster, E. A. Brewer, and A. Aiken. 2000. A first step towards automated detection of buffer overrun vulnerabilities. Pp. 3-17 in Network and Distributed System Security Symposium. Reston, VA: Internet Society.
Wahbe, R., S. Lucco, T. E. Anderson, and S. L. Graham. 1993. Software-based fault isolation. ACM SIGOPS Operating Systems Review 27(5):203-216.
Xi, H. 2004. Applied type systems (extended abstract). Pp. 394-408 in Lecture Notes in Computer Science 3085. New York: Springer-Verlag.