Toward a Safer and More Secure Cyberspace

5 Category 2—Enabling Accountability

The goal of requirements in Category 2 of the committee’s illustrative research agenda is to ensure that anyone or anything that has access to a system component—a computing device, a sensor, an actuator, a network—can be held accountable for the results of such access. Enabling accountability refers to the ability to hold a party responsible for the consequences of its actions and, in particular, to associate a consequence with the appropriate parties if those actions cause harm. This broad category includes matters such as remote authentication, access control and policy management, auditing and traceability, maintenance of provenance, secure associations between system components, and so on.

5.1 ATTRIBUTION

Computer operations are inherently anonymous, a fact that presents many problems in cybersecurity. When a system is under remote attack, the attacker is generally unknown to the targeted system. When an attack has occurred, anonymous individuals cannot subsequently be held responsible and do not suffer any consequences for the harmful actions that they initiated. And, if all users of a system are anonymous, there is no way to differentiate between authorized and unauthorized actions on a system.

Attribution is the ability to associate an actor with an action. (By contrast, authentication refers to establishing the truth of some claim of identity.)



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




The actor is characterized by some attribute(s), such as the name of a user, the serial number of a machine on a network, or some other distinguishing property of the actor. Attribution requires technology that is less inherently anonymous so that the association between action and actor is easily ascertained, captured, and preserved.

Attribution should be conceptualized with respect to five important characteristics:

Precision. A single attribute may uniquely characterize an actor, as might be the case with the complete genome sequence corresponding to a specific human being or the manufacturer’s serial number on a given machine. But such attributes are by far the exception. Individuals may have the same name; the Media Access Control (MAC) address of a specific network device may not be unique; and even a human being may have an identical twin (whose genomic sequence will be identical in all respects to that of the first human being).

Accuracy. A characteristic related to precision is accuracy, a measure of the quality of attribution, such as the probability that the attribution is correct (i.e., that the value of the attribute is indeed associated with the actor in question). Accuracy is a key issue in legal standards for evidence and in the extent to which it is reasonable to develop linkages and inferences based on those attributes.

Lifetime/duration. As a rule, an association (which generally consists of the actor’s attribute, the action, the object acted on, and other relevant data such as the time of the action) need not be preserved forever. For example, a statute of limitations applies to many associations, after which the association can often be discarded. But this example also points out that the duration of preservation depends on the purpose being served. From a legal standpoint, it may be safe to discard the association. But what may be safe from a legal standpoint may not make sense for business reasons (e.g., a business may need to reconstruct what happened in a project long ago), and conversely as well.

Granularity. As a general rule, an action consists of a number of components in a certain sequence. For some purposes, it may be sufficient to make attributions about the action at the highest level (that is, at the level of the complete transaction). For example, it may be necessary to determine that an operating system patch came from the operating system manufacturer. However, there may be times when an entity contemplating accepting or executing an action may want to make attributions on individual components of a transaction. Perhaps, in a financial transaction, a gross total would be attributed to a valid counterparty, but the tax implications might be attributed to a tax lawyer. For instance, one could research the possibility of having different attributions associated with the various results of network service invocations. While complex, this is related to the large body of work on transitive, or delegated, trust. In the first instance, the operating system manufacturer trusts its employees and the operating system patch installer trusts the manufacturer. In the example of the financial transaction, the trust relationship is explicitly broken out among the individual components of the transaction.

Security (specifically, resistance of an attribution to attack and spoofing). Attribution depends on the inability to break the association between action and actor, because in its absence, impersonation can easily occur.

These five characteristics vary depending on the application. For example, for operational defense, duration may be very short, measured in seconds or minutes; for a forensic investigation, duration may be measured in years.

There are also a number of systems-level issues for the implementers and/or the operators of attribution-capable systems. For example, where should the locus of responsibility for the implementation of attribution mechanisms lie? An operator of a system or network may expect that attribution will be built into system or network actions. But in a decentralized environment in which many vendors are responsible for providing one component service or another, the party responsible for implementing attribution mechanisms may be difficult to identify (or to hold accountable for such implementation). Note that attribution may be an issue at all levels of a system design (the individual and organization at high levels, the computers or applications at low levels).

Another systems-level issue is the privacy of attribution information. Attribution information can be very sensitive, and thus must be protected against unauthorized or improper disclosure. Similar considerations apply to parties that are allowed to request that attribution be obtained in the first place.[1]

[1] This point raises the issue of how attribution is designed into a system. Under some designs and for some applications, all actions might routinely be attributed and the information stored in a secure database, to be divulged only to parties that provide proper authorization. Under other applications (perhaps applications that are more privacy-sensitive), actions might be attributed only under explicit authorization.

The most important cybersecurity issue associated with attribution is a problem that attribution mechanisms cannot solve—the unwittingly

compromised or duped user. As the existence of botnets illustrates, a cyberattacker has many incentives to compromise others into doing his or her dirty work. Even in the instances when attribution mechanisms operate perfectly, they may well identify a cyberattack as originating from a computer belonging to an innocent little old lady from Pasadena. Put differently, there is a big difference between identifying the source or sources of a cyberattack and associating with that attack the name of a human being or beings responsible for launching it. This is not to say that making such an identification is useless—indeed, it may be an essential step in a forensic investigation—and it is worthwhile to make such steps as easy as possible. And, the widespread deployment of attribution mechanisms may increase the likelihood that the perpetrator of any given attack can be identified.

Assuming that identifying the launch point of an attack is possible, such identification could be used in operational defense to identify the source of a remote attack. Such identification is a necessary (though not sufficient) condition for being able to shut off or block the attack in real time at the source. Two such attacks are a distributed denial-of-service attack and the theft—while it is happening—of a large proprietary (or “trade-secret”) digital object. In this case, the objective is to block the compromise in real time, and false positives (that misidentify the attacker) are of less consequence than failure to identify the attack at all.

An area related to attribution that warrants further exploration is the automated capture, maintenance, and use of “information provenance.” Provenance is a sequence of attributes that in some way specifies trustworthy information relating to the initial creation and every subsequent modification of some information unit or collection of information units (e.g., a file, an e-mail, and so on). An important characteristic of provenance is that it would be maintained on information across distributed systems; for example, it would flow with an object.

There are many possible uses of provenance. For example:

A computer program may possess a provenance that in some way specifies who was involved in its creation. This could solve many problems—for example, finding out which programs may have been written or modified by an individual who is later found out to be untrustworthy. Today, some aspects of provenance may be maintained in a source control system, but usually not in a highly trustworthy fashion.

Just as with antiques, provenance would tend to provide a greater ability to interpret where information came from, and this may shed light on the value of the information. With the proliferation of information of all types, including images, it is increasingly difficult to separate fact from fiction. For example, a picture with provenance indicating that there has been no modification beyond its initial imaging and also its association with the New York Times newsroom might well be more trustworthy than a picture that has been postprocessed and associated with a tabloid.

E-mail with provenance may enable increased trust of that e-mail. While provenance will by no means prevent the transmission of spam or viruses, the knowledge of the provenance of a forwarded e-mail note (some attributes of the author, his or her computer, any modifiers in a forwarding path, and so on) would provide some confidence to the recipient and would certainly provide forensic benefits in tracking down cyberattackers. Provenance for e-mail could also help to address today’s problems of anonymous harassing e-mails, since a sender could be more readily identified.

Databases implementing provenance could provide a user with the ability to easily determine the data elements that contributed to a given result. This ability might well contribute to the confidence that the user has in that result or might suggest new and fruitful lines of inquiry.

There would seem to be significant research related to utilizing provenance to make systems and information more secure, as one element of security (or more precisely, confidence in security) is knowing the detailed lineage of any given system and its components.

There are also complex and highly interesting questions relating to the implementation of provenance. For example, there are questions as to how one can provide systems support for an extensible set of attributes, how those attributes can be associated reliably and immutably with their corresponding information, how performance issues associated with a large list of attributes can be contained, how to surface provenance information via programmatic interfaces, and how one can handle the coalescing of attributes so that the attribute lists do not grow without bound. It seems likely that storage of attributes would benefit from the existence of a trusted computing base that would use virtualization to ensure sufficient isolation.

Finally, there are fascinating questions as to how to make provenance valuable to users. Given the massive increase in the amount of attribute data available, there are interesting questions as to how to surface it in ways so that the valuable provenance stands out. There is the possibility that significantly useful, application-specific heuristics will be created that can monitor provenance and detect potential problems. Analysis must also be done on the impact of provenance on privacy.
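One of the implementation questions above (how attributes can be associated reliably and immutably with their corresponding information as it flows across systems) can be made concrete with a small sketch. The class and field names below are illustrative assumptions, not part of any system described in this chapter: each creation or modification of an information unit appends a record binding an actor attribute and an action to the digest of the previous record, so that tampering with any earlier record invalidates every later digest.

```python
import hashlib
import json
import time


def _digest(payload):
    # Deterministic hash over the record's canonical JSON form.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


class ProvenanceChain:
    """Append-only provenance for one information unit (illustrative sketch).

    Each record binds an actor attribute and an action to the digest of
    the previous record, so altering any earlier record invalidates the
    digests of all subsequent records.
    """

    def __init__(self, unit_id):
        self.unit_id = unit_id
        self.records = []

    def append(self, actor, action, content_hash, ts=None):
        prev = self.records[-1]["digest"] if self.records else "0" * 64
        payload = {
            "unit_id": self.unit_id,
            "actor": actor,                # attribute identifying the actor
            "action": action,              # e.g., "create", "modify", "forward"
            "content_hash": content_hash,  # hash of the unit after the action
            "time": time.time() if ts is None else ts,
            "prev": prev,
        }
        self.records.append({**payload, "digest": _digest(payload)})

    def verify(self):
        """Recompute every digest; return False if any record was altered."""
        prev = "0" * 64
        for rec in self.records:
            payload = {k: v for k, v in rec.items() if k != "digest"}
            if rec["prev"] != prev or _digest(payload) != rec["digest"]:
                return False
            prev = rec["digest"]
        return True
```

Such a chain would flow with the object across systems; as the text notes, protecting the chain itself against wholesale replacement would require support from a trusted computing base with sufficient isolation.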

As an example of research in data provenance, Margo Seltzer has undertaken work on provenance-aware storage systems (PASS).[2] Seltzer points out that although the chain of ownership and the transformations that a document has undergone can be important, most computer systems today implement provenance-related features as an afterthought, usually through an auxiliary indexing structure parallel to the actual data. She argues that provenance is merely a particular type of metadata, and thus that the operating system itself should be responsible for the automatic collection and management of provenance-relevant metadata just as it maintains conventional file system metadata. And, it should support queries about that metadata. An extension of a provenance-aware system, more difficult to implement, would enable queries to be made about entities smaller than a file, such as the individual cells of a spreadsheet or particular paragraphs in a document.

[2] See http://www.eecs.harvard.edu/~margo/research.html.

Progress in attribution research increases the ability to provide provenance for electronic information or events (Cybersecurity Bill of Rights Provision V), is an integral element of expunging information (Provision IV), inhibits an attacker’s ability to perform denial-of-service attacks (Provision I), and improves the ability to audit systems performing certain critical functions (Provision VII).

5.2 MISUSE AND ANOMALY DETECTION SYSTEMS

Misuse and anomaly detection (MAD) systems refer to a fairly wide range of systems and techniques for detecting suspicious or anomalous activity on (or intrusion into) computers, servers, or networks.[3] Intrusions are most often classified either as misuse (i.e., an attack) or as an anomaly.

[3] For more detailed information on ID systems and related issues, see Rebecca Bace, undated, “An Introduction to Intrusion Detection and Assessment for System and Network Security Management,” ICSA Labs white paper, available at http://www.icsa.net/icsa/docs/html/communities/ids/whitepaper/Intrusion1.pdf; and Karen Kent and Peter Mell, 2006, “Guide to Intrusion Detection and Prevention (IDP) Systems (Draft), Recommendations of the National Institute of Standards and Technology” (NIST Special Publication 800-94), National Institute of Standards and Technology, Gaithersburg, Md., available at http://csrc.nist.gov/publications/drafts/Draft-SP800-94.pdf.

In general, there are two primary types of MAD systems in use today in organizations large and small:

Host-based MAD systems. These systems operate on a specific host or computer to detect suspicious activity on that particular host—for example, malicious connection attempts or applications doing things that they should not be doing (e.g., a word processor
trying to modify key operating system or configuration files); and

Network-based MAD systems. These systems focus on network data flows, looking for suspicious packets or traffic.

Often these two types of systems are used together to create a hybrid solution for misuse or anomaly detection. Indeed, each by itself is quite limited.

MAD systems are potentially valuable in that they seek to detect the early stages of an attack (e.g., an attacker’s probing of a machine or network for specific vulnerabilities) and can then aid in protecting a machine from (or even preventing) the subsequent stages of the attack. MAD systems also seek to detect telltale signs of suspicious activity or patterns of behavior (whether by a user, an application, or a piece of malicious code) that firewalls or other tools might miss or ignore.

MAD systems are generally quite complex and require significant effort to manage properly. They are not a fix-all solution for computer or network security; MAD systems cannot compensate for weaknesses such as design flaws and software bugs, nor for weaknesses in organizational authentication policies, data management practices, or the network protocols themselves.

From a technical standpoint, one of the most significant difficulties of developing usable MAD systems is the fact that the behavior of an intruder may be nearly indistinguishable from that of a legitimate user; intruders often take great care to make their behavior look innocuous. For instance, MAD systems are “trainable” by attackers. A patient attacker can gradually increase the incidence of events to be later associated with an attack to the point where the MAD system ranks them as “normal,” whereas springing the specific events on the system would cause it to alarm.

As a result, when MAD systems are made very sensitive, they are notorious for generating many false positives (sounding alarms when none are warranted) and thereby inconveniencing legitimate users; when they are made less sensitive in order to avoid inconveniencing legitimate users, they are notorious for failing to sound alarms when intrusion or misuse is in fact present. An aggravating factor is that attackers are constantly at work devising and refining ways to elude known MAD systems—for example, using so-called “stealthy” scans to avoid the notice of some MAD systems. Reconciling the tension between false positives and false negatives is thus a central area of MAD system research.

Another challenge in the development of MAD systems is that of finding methods that function efficiently in large systems. Many approaches to misuse and anomaly detection generate enormous amounts of data, which must subsequently be analyzed. (In the extreme case, an audit
log that allows the reconstruction of a user’s activities is a MAD system that only collects data; automated tools for log analysis that search for suspicious patterns of behavior can then be regarded as a kind of post hoc MAD system.) Moreover, the collection and analysis of such large amounts of data may degrade performance to unacceptable levels, suggesting that a hierarchical abstraction process may be needed for more efficient performance.[4] Related is the challenge of integrating MAD systems with network infrastructure itself, making MAD a standard feature in some deployments.

[4] P.A. Porras and P.G. Neumann, “EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances,” in Proceedings of the Nineteenth National Computer Security Conference, NIST/NCSC, Baltimore, Md., pp. 353-365, October 22-25, 1997; and P.G. Neumann and P.A. Porras, “Experience with EMERALD to Date,” in Proceedings of the First USENIX Workshop on Intrusion Detection and Network Monitoring, USENIX, Santa Clara, Calif., pp. 73-80, April 1999, available at http://www.csl.sri.com/neumann/det99.html.

In addition, MAD systems must address the very difficult problem of uncovering possible patterns of misuse or anomalies that may occur in a distributed manner across the systems of a large network. That is, certain behavior may not be suspicious if and when it occurs in isolation, but the identical behavior may well be suspicious if it occurs on multiple systems at the same time. Today, understanding how to correlate behavior that is non-anomalous in the small to infer an indication of anomalous behavior in the large is quite problematic. The problems are even more severe in an environment in which qualitatively different exploitations might be occurring in different systems orchestrated by a single hostile actor.

Despite more than two decades of research in this area, significant problems remain concerning the interpretation of the audit and network packet data, in particular, involving the early recognition of patterns of multiple simultaneous attacks or outages, identifying the sources and identities of attackers, and discerning the intent of the attacks.[5] Privacy problems must also be addressed, because the audit and network packet data can contain sensitive information.[6]

[5] Ibid.

[6] Phillip A. Porras, “Privacy-Enabled Global Threat Monitoring,” IEEE Security and Privacy, November-December 2006, pp. 60-63.

Progress in MAD system research supports Provision I, Provision III, Provision IX, and Provision X of the Cybersecurity Bill of Rights.
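The false-positive/false-negative tension and the "training" attack on adaptive MAD systems can be illustrated with a deliberately simplified sketch. Nothing here corresponds to a real MAD product; the rolling baseline and the threshold are assumptions chosen for illustration. The toy detector alarms when an observation deviates from a moving average by more than k standard deviations, so a sudden spike alarms while a patient attacker who ramps up slowly drags the baseline along and never triggers it.

```python
import statistics


class ToyAnomalyDetector:
    """Alarm when a value strays from a rolling baseline (illustrative only)."""

    def __init__(self, window=20, k=3.0):
        self.window = window
        self.k = k            # sensitivity: lower k means more false positives
        self.history = []

    def observe(self, value):
        """Return True if `value` looks anomalous, then absorb it."""
        alarm = False
        if len(self.history) >= self.window:
            recent = self.history[-self.window:]
            mean = statistics.fmean(recent)
            spread = statistics.pstdev(recent) or 1.0
            alarm = abs(value - mean) > self.k * spread
        self.history.append(value)     # the baseline drifts toward new values
        return alarm


# Normal traffic fluctuates around 10 events per interval.
baseline = [8, 9, 10, 11, 12] * 6

sudden = ToyAnomalyDetector()
for v in baseline:
    sudden.observe(v)
spike_detected = sudden.observe(100)   # abrupt jump: alarms

patient = ToyAnomalyDetector()
for v in baseline:
    patient.observe(v)
drift_detected, level = False, 12
while level < 100:                     # attacker raises the rate one step at a time
    level += 1
    drift_detected = patient.observe(level) or drift_detected
# The same end point, reached gradually, never alarms: the attacker has
# "trained" the detector to rank the elevated rate as "normal."
```

Raising k suppresses false positives but widens the window the attacker can exploit; lowering it catches the drift sooner at the cost of alarming on ordinary fluctuations, which is exactly the tension described above.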

5.3 DIGITAL RIGHTS MANAGEMENT

Digital rights management (DRM) refers to the granting of various privileges depending on the identity of the party who will use those privileges. A common example is the management of privileges for protected content—a publisher may choose to sell the right for an individual to watch a (digital) movie once or an unlimited number of times, but in any case, only to watch it and not to forward or copy it.

Bits are unlike physical objects: if a computer can read some bits (as would be necessary to convert those bits into a human-sensible form like music or pictures), then that computer will also be able to copy those bits an unlimited number of times. Providers want recipients to abide by certain terms of use specified in a contract and want technical assurances that the contract will be enforced. Moreover, permission to copy the bits of a protected work is unlikely to be part of a contract that restricts the use of those bits, since the copies can be used or further distributed for use in ways that do not comply with the contract terms. Thus, a means of enforcement is needed to constrain what is done with the bits.

Such enforcement requires software that can be trusted by the provider even though it is executed on a machine that is not similarly trusted. Since computers are universal—and therefore a computer can simulate any other—the trusted software could well be running in a software simulator rather than directly on the hardware (unless special-purpose hardware is being used). Universality of digital computers is thus problematic, because when the trusted software is run in a simulator, the simulator could make illicit copies of an electronic copy without the trusted software’s knowledge of this copying; the illicit copies can then be subsequently used to violate the terms of the use agreement.

Thus, solving the DRM problem is more than a problem of ensuring confidentiality for the content in question—the problem is bigger than how to transmit the electronic content from the owner to the customer in a way that prevents interception by third parties. It is also a problem of (what has come to be known as) trusted computing: how to build a computing environment in which the user is not trusted to control certain aspects of its configuration and operation but rather a programmer is trusted to do this. Recent hardware extensions, such as the Trusted Platform Module (TPM) (see Section 4.1.2.1), can be seen as providing support for exactly this trust and execution model.

But TPM and such solutions are not a panacea—many consumers would find it unacceptable to own a general-purpose computer over which they themselves do not have complete control (having ceded some control to the programmers of certain trusted software). So there is a tension between computer owners who feel that
they have lost control over their computers and the desire of content providers to enforce their content-usage contracts.

Moreover, DRM schemes may enforce the rights of content owners at the expense of eroding the rights of content users. The most salient example of such erosion is the impact of DRM on fair use, which is the principle allowing the small-scale use of limited amounts of copyrighted materials for certain limited purposes.[7] Some DRM implementations eliminate fair use because of the difficulty in algorithmically distinguishing between fair use and illegal copying. Another problem is that DRM schemes often force the user into using the content on only one device—use on a second device requires a second copy. The overall effect of such implementations, especially in the long run, has important public policy implications that are as yet poorly understood.[8]

[7] Fair use is defined by statute in Sections 107 through 118 of Title 17 of the U.S. Code. See http://www.copyright.gov/title17/92chap1.html.

[8] A hardware-based approach is not the only possible approach to digital rights management. Another approach is based on accountability. A content-usage contract could be enforced legally by embedding into every legitimate copy of electronic content a unique identifier (known as a watermark). If an illegal copy is discovered, the embedded identifier can be used to identify the original owner, who has presumably allowed the original version to be copied in violation of the content-usage contract. However, this approach fails if the user can remove the watermark before copying occurs, and there is no reason to believe that it is possible to develop an unremovable watermark. In addition, the identified user could claim that the content was stolen. Finally, this approach requires individual prosecution for every illegal copy found—a major disadvantage when widespread copying is at issue.

The economic model for DRM rests on the premise that illegal copies of a work deprive the content owner of the revenues that would be associated with the legal sale of those copies. There is some merit to this claim, and yet it is not the only factor in play. For example, estimating lost revenues in this fashion surely overstates the revenue loss, since some of the copies distributed illegally would be acquired by parties who would not have paid for legal copies in the absence of the illegal copies. Also, by some accounts, unprotected digital content can spur rather than impede sales of that content. These points suggest that the net outcome of widespread DRM implementation is uncertain, and thus the long-term economic rationale for these DRM schemes is poorly understood.

Still another issue with DRM is that DRM technology is usually designed with a failure mode that defaults to “deny access.” That is, because DRM technology generally serves the interests of content owners rather than of content users, the operating principle for DRM is to deny access to the content unless the user can provide appropriate authorization for access. Thus, DRM itself introduces a potential security vulnerability to a denial-of-service attack that can be exploited by an adversary clever enough to interfere with the authorization mechanisms involved.
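The "deny access" default described above can be sketched in a few lines. The class and function names here are hypothetical, not drawn from any real DRM product: every failure along the authorization path, including a path an adversary has disrupted, collapses to a denial.

```python
class LicenseServerUnavailable(Exception):
    """Raised when the authorization service cannot be reached."""


def check_access(user, work, license_server):
    """Hypothetical DRM gate that fails closed.

    Any error while consulting the license server is treated as "deny
    access," so an adversary who can disrupt the authorization path
    denies service to perfectly legitimate users.
    """
    try:
        return bool(license_server.authorize(user, work))
    except Exception:
        return False   # fail closed: the content owner's interest prevails


class WorkingServer:
    def authorize(self, user, work):
        return True    # user holds a valid license


class DisruptedServer:
    """Stand-in for a license server an attacker has cut off."""
    def authorize(self, user, work):
        raise LicenseServerUnavailable("no route to license server")
```

A fail-open design would avoid the denial-of-service exposure but would release content whenever the check could be disturbed, which is why DRM designs generally accept the availability risk instead.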

Although the most common use today of DRM is the protection of copyrighted works that are sold for profit, the philosophy underlying DRM—that content providers should have the ability to exercise fine-grained control over how their content is used—can be used to support individuals in protecting their own documents and other intellectual property in precisely the same ways. For example, A may wish to send a sensitive e-mail to B, but also to insist that B not print it or forward it to anyone else. Some DRM systems are available today that seek to provide controls of this nature within the boundaries of an enterprise.

This kind of DRM application operates in an environment very different from a copyright-enforcement regime. In a copyright-enforcement regime, the primary concern is preventing the improper large-scale distribution of copyrighted works, whereas the concerns in an enterprise DRM regime are more varied (e.g., individuals may have more concerns about the time periods during which content may be available). Because the particular set of rights relevant to any given recipient is more varied, users must specify in detail the rights they wish to grant to content recipients. Although default settings ease the burden, many users still find enterprise DRM systems cumbersome and clumsy from a usability standpoint. In addition, because the scale of rights enforcement is necessarily much more fine-grained (one improperly forwarded e-mail can become very problematic), there are higher premiums on, and greater needs for, protections against actions such as “screen scraping” as a way of obtaining machine-readable content in violation of the rights mechanism. Finally, both sender and recipient must generally operate within the same enterprise—usually, a sender who wants to engage a recipient outside the enterprise does not have the functionality afforded by the DRM system.
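The per-recipient rights specification that makes enterprise DRM cumbersome can be sketched as a small policy object. The field names and the default one-week expiry below are assumptions for illustration only, not the schema of any actual enterprise DRM system.

```python
from dataclasses import dataclass, field
import time


@dataclass
class RightsGrant:
    """Hypothetical rights for one recipient of one protected document."""
    recipient: str
    may_view: bool = True
    may_print: bool = False
    may_forward: bool = False
    # The enterprise concern noted in the text: how long content stays available.
    expires_at: float = field(default_factory=lambda: time.time() + 7 * 86400)

    def permits(self, action, now=None):
        now = time.time() if now is None else now
        if now >= self.expires_at:
            return False               # all rights lapse on expiry
        # Any action without an explicit may_<action> flag is denied.
        return getattr(self, "may_" + action, False)


# A sends B a sensitive e-mail that B may read but not print or forward.
grant = RightsGrant(recipient="B")
```

Specifying such a grant for every recipient and every document is exactly the burden the text describes; defaults ease it, but actions the mechanism never anticipated (such as screen scraping outside the viewer) fall outside the policy altogether and must be countered by other protections.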