This report of the Committee on Responding to Section 5(d) of Presidential Policy Directive 28: The Feasibility of Software to Provide Alternatives to Bulk Signals Intelligence Collection responds to a request to the National Academies from the Office of the Director of National Intelligence (ODNI). That request, in turn, was occasioned by Presidential Policy Directive 28 (PPD-28) Section 5(d), which had asked the Director of National Intelligence for “a report assessing the feasibility of creating software that would allow the Intelligence Community (IC) more easily to conduct targeted information acquisition rather than bulk collection [of signals intelligence].”1 This study is among several of the administration’s responses to heightened public concern about U.S. intelligence agency surveillance programs that followed Edward Snowden’s disclosure of numerous internal National Security Agency (NSA) documents beginning in mid-2013. These responses include other activities called for in PPD-28 as well as a study of big data and privacy by the President’s Council of Advisors on Science and Technology that is largely focused on civilian applications.2
1 The White House, Presidential Policy Directive/PPD-28, “Signals Intelligence Activities,” Office of the Press Secretary, January 17, 2014, http://www.whitehouse.gov/sites/default/files/docs/2014sigint_mem_ppd_rel.pdf.
2 President’s Council of Advisors on Science and Technology, Big Data and Privacy: A Technological Perspective, Executive Office of the President, May 2014, http://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf.
CONTEXT AND DEFINITIONS
PPD-28 defines bulk collection as “the authorized collection of large quantities of signals intelligence (SIGINT) data which, due to technical or operational considerations, is acquired without the use of discriminants (e.g., specific identifiers, selection terms, etc.)”3 and implies that collection is targeted if it is not bulk. But PPD-28 defines “discriminant” only by example, so it does not provide a precise definition of either bulk or targeted collection. Nor are these terms defined precisely elsewhere in law or policy. Moreover, the PPD-28 description of bulk collection is problematic because it says that (1) with a broad discriminant, such as “Syria,” collection is targeted, even though it captures a large volume of information and covers vast numbers of people who are not of intelligence value; and (2) if the signal itself contains only the traffic of a single individual, collection is bulk if there is no discriminant. Both of these results are inconsistent with the plain meaning of the words bulk and targeted.
Based in part on briefings from the IC, the committee adopted a definition better suited to understanding the trade-off between civil liberties and effective intelligence: If a significant portion of the data collected is not associated with current targets,4 it is bulk collection; otherwise, it is targeted. There is no precise definition of bulk collection, but rather a continuum, with no bright line separating bulk from targeted. The committee acknowledges that use of the word “significant” makes its definition imprecise as well. The IC prefers targeted collection because it narrows its attention as much as possible during collection to use its limited resources efficiently, to comply with rules about what is allowed, and to limit intrusions on privacy.
This report, like PPD-28, focuses on a subset of SIGINT, a broad subset termed “communications or information about communications.”5 This includes electronic communications between people and those between people and services such as Internet search providers, message services, and banks. It also includes “business records” about communications. Intercepting these signals is of concern because it may intrude on the privacy and civil liberties of the communicators. However, this is only one ingredient among many that are used to meet the country’s foreign intelligence needs. Understanding the nature of groups, individuals, organizations, or events that may threaten national security and predicting their behavior requires complex analysis that pieces together many facts from many sources. Studying this whole system was far beyond the scope of this study.
3 Presidential Policy Directive/PPD-28, footnote 5.
4 The term “target” and other key terms used in this report are defined in Section 2.3.
5 Presidential Policy Directive/PPD-28, footnote 3.
The committee paid particular attention to collection of “information about communications,” or metadata,6 a focus of the briefings provided by the IC. NSA has been collecting metadata in bulk for domestic telephone calls since 2006; it has done so under the authority of Section 215 of the Foreign Intelligence Surveillance Act (FISA), enacted as part of the USA Patriot Act in 2001. This study applies not only to this practice but also to a broader set of activities, including the collection of metadata and contents of foreign telephone calls, emails, and other communications. This report addresses the question of alternatives to bulk collection, without regard to the specific authorities and restrictions that control the various types of bulk collection.7
This study, while focused on a technical question and on technological responses, inevitably encounters policy and privacy concerns; policy is bound to be affected by what is technically possible or impossible. Indeed, PPD-28 is itself a policy directive formed partly in response to privacy issues amplified by the Snowden disclosures. The committee did not study these policy questions and tried to avoid making judgments about them.8 The committee tried to answer the technical question in general, rather than only in the context of current policy, because technology and policy can change rapidly.9
The next section provides a brief description of the SIGINT collection model used by the committee.
6 In the case of telephone communications, “metadata” include the calling and called telephone numbers, the time and duration of a call, but not its content. For email, metadata have been interpreted to exclude the subject line. Other types of communications have different metadata elements.
7 For example, FISA and Foreign Intelligence Surveillance Court (FISC) orders restrict queries against bulk-collected domestic telephony records to targets for which there is reasonable and articulable suspicion (RAS) that they belong to a foreign terrorist organization. For another example, PPD-28 restricts collection to six specific purposes.
8 For recent reports that deal with policy associated with signals collection, see two reports from the Privacy and Civil Liberties Oversight Board: Report on the Telephone Records Program Conducted under Section 215 of the USA Patriot Act and on the Operations of the Foreign Intelligence Surveillance Court, January 23, 2014, http://www.pclob.gov/library/215-Report_on_the_Telephone_Records_Program.pdf, and Report on the Surveillance Program Operated Pursuant to Section 702 of the Foreign Intelligence Surveillance Act, July 2, 2014, http://www.pclob.gov/library/702-Report.pdf. See also President’s Review Group on Intelligence and Communications Technologies, Liberty and Security in a Changing World, December 12, 2013, http://www.whitehouse.gov/sites/default/files/docs/2013-12-12_rg_final_report.pdf.
9 Indeed, as this study was under way, the President announced he would seek legislation to end bulk collection of domestic telephony metadata (The White House, “The Administration’s Proposal for Ending the Section 215 Bulk Telephony Metadata Program,” Fact Sheet, March 27, 2014, Office of the Press Secretary, Washington, D.C.), and legislation was proposed.
A CONCEPTUAL MODEL OF SIGNALS INTELLIGENCE
In response to intelligence requirements determined by policy makers, NSA takes in signals,10 extracts data about events, filters data according to one or more discriminants, stores the resulting data, analyzes it by querying the store, and disseminates the derived intelligence to other analysts and policy makers (Figure S.1). The first three steps are what the committee calls collection. The “extract” process decodes communications protocols to extract items for further inspection. A discriminant may be chosen to limit the collection to a set of targets determined at the time of collection; this is targeted collection. If a discriminant is chosen to collect a significant quantity of data not relevant to any current target, the collection is bulk. In either case, analysts query the data stored from multiple SIGINT collections and combine them with data from many other sources in order to formulate and disseminate intelligence useful to others. Privacy protections of different sorts are applied at various points throughout the process. These include choices about where to extract signals and what discriminants to use, minimization procedures used to protect information about U.S. persons, and controls on how collected information can be used.
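The collection steps of this model can be illustrated with a minimal Python sketch. All names here (`Event`, `extract`, `targeted`, `collect`, `query`) and the comma-separated record layout are hypothetical, chosen only for illustration; nothing below describes an actual NSA system. The sketch treats a discriminant as a predicate over extracted events and counts data as collected only once it has been stored.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Set


@dataclass(frozen=True)
class Event:
    """One extracted item of call metadata (hypothetical fields)."""
    caller: str
    callee: str
    timestamp: int


# A discriminant is modeled as a predicate over extracted events.
Discriminant = Callable[[Event], bool]


def extract(signal: Iterable[str]) -> Iterator[Event]:
    """Decode raw records of the assumed form 'caller,callee,timestamp'."""
    for record in signal:
        caller, callee, ts = record.split(",")
        yield Event(caller, callee, int(ts))


def targeted(identifiers: Set[str]) -> Discriminant:
    """Discriminant that keeps only events involving a known target."""
    return lambda e: e.caller in identifiers or e.callee in identifiers


def collect(signal: Iterable[str], discriminant: Discriminant) -> List[Event]:
    """Extract, filter, and store: the three steps the model calls collection."""
    return [e for e in extract(signal) if discriminant(e)]


def query(store: List[Event], identifier: str) -> List[Event]:
    """Analysis step: look up stored events that involve one identifier."""
    return [e for e in store if identifier in (e.caller, e.callee)]
```

In this sketch, passing `targeted({"b"})` to `collect` models targeted collection, while passing a discriminant that accepts everything (for example `lambda e: True`) models bulk collection. Only the bulk store can later answer a query about an identifier that was not a target at collection time.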
Much of the data in the signal inevitably will not be of interest. This is because modern communication technology aggregates traffic between many sources and destinations onto a single channel—such as the fiber carrying Internet Protocol packets between two routers. With rare exceptions, there is no longer a single physical point, like the central office connection of a landline telephone, at which to observe exactly the items of interest. Thus, this definition of collection says that data is deemed collected only when it is stored for more than a few hours, not when it is extracted.
The distinction between bulk and targeted collection is not precise. When collection is very broad and it is expected that most of the information stored is not relevant to current targets, it is bulk. In contrast, if collection is about a person of interest, it is clearly targeted. There are, however, many cases in between. Throughout the intelligence process, agencies narrow their attention as much as possible, both to comply with rules about what is allowed and to use their limited resources efficiently. Narrowing applies to choosing signals from which to extract data, filtering the extracted data, querying collected data, and disseminating the results. For example, for domestic telephony metadata collected in bulk under the authority of FISA Section 215, a query is allowed only when there is a reasonable and articulable suspicion that the target is associated with a foreign terrorist organization. Often, queries on bulk collections are sufficiently constrained that very little of the collected data is ever examined. Additional rules usually require collected data to be destroyed after a certain time.
10 The sources of the signals are a separate topic that the committee did not consider, although some examples are given later in the report.
FIGURE S.1 A conceptual model for the signals intelligence process.
CATEGORIES OF USE CASES
Use cases demonstrate how the results of intelligence analysis are used and make the process of intelligence more concrete for outsiders. Use cases that cover the full range of intelligence practice can provide confidence that the consequences of restricting bulk collection are understood and guide a search for alternatives. Although the committee was given unclassified use cases in three categories, it was told that this was not a complete set, so its search for collection alternatives was limited. The use case categories it was given, all of which concern communications between people who are designated by identifiers such as telephone numbers or email addresses, were the following:
• Contact chaining, which traces the network of people associated with a target by following links of the form “A communicated with B” starting at the target and traversing chains of one or more links.
• Alternate identifier techniques that seek to keep current the set of identifiers that a target person is known to be using, when the target is changing identifiers to avoid being tracked.
• Triage, which starts with a list of identifiers of interest and categorizes the urgency of the threat to national security from the party associated with each one.
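Contact chaining as described in the first item above amounts to a bounded breadth-first traversal of a communication graph. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that each "A communicated with B" link is undirected and that a hop limit bounds the chain length.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_contact_graph(events: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph from (A, B) pairs meaning 'A communicated with B'."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in events:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def contact_chain(graph: Dict[str, Set[str]], target: str, max_hops: int) -> Dict[str, int]:
    """Return every identifier within max_hops links of target, with its hop count."""
    hops = {target: 0}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(current, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    del hops[target]  # report contacts only, not the starting target
    return hops
```

For the events A-B, B-C, and C-D, chaining from A with a limit of two hops reaches B and C but not D. The traversal can only follow links that are present in the stored events, which is why the use case depends on what was collected before the target became known.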
A broader set of use cases, such as ones involving collection of communications content or detection of suspicious foreign communications patterns and suspicious queries to Internet search engines, might point to other possibilities for alternatives to bulk collection.
BULK COLLECTION AND INFORMATION ABOUT PAST EVENTS
A common aspect of the categories of use cases above is that they rely in part on information from the past to link or connect identifiers. If past events become interesting in the present—because of new circumstances such as identifying a new target, a nonnuclear nation that is now pursuing the development of nuclear weapons, an individual who is found to be a terrorist, or new intelligence-gathering priorities—then historical events and the data they provide will be available for analysis only if they were previously collected. If it is possible to do targeted collection of similar events in the future, and if they happen soon enough, then the past events might not be needed. If the past events are unique or if delay in obtaining results is unacceptable (because of an imminent threat or perhaps because of press coverage or public demand), then the intelligence will not be as complete. So restricting bulk collection will make intelligence less effective, and technology cannot do anything about this; whether the gain in privacy is worth the loss of information is a policy question that the committee does not address.
Controls on usage can help reduce the conflicts between collection and privacy. There are other entities that collect highly sensitive data and use it for purposes that the people who provide it might not like, such as companies that provide cloud services such as email and social media and “data brokers” that collect and correlate data from a wide variety of public and proprietary sources and sell it to help with decisions about extending credit or for marketing purposes. It is worth comparing how society controls these activities with how it controls the IC. The accepted control paradigm is “notice and consent,” the terms of service that almost no one reads. Although today people are more tolerant of private data collection than of government data collection, this may change as the collection of private data grows. The 2014 report on privacy and big data from the President’s Council of Advisors on Science and Technology proposes instead that people should have control over how their data are used.11 Controls on use thus offer an alternative to controls on collection as a way of protecting privacy.
There are two ways to control usage: manually and automatically. NSA already has both automated and strong manual controls in place. Despite rigorous auditing and oversight processes, however, it is hard to convince outside parties of their strength, because necessary secrecy prevents the public from observing the controls in action, and because popular descriptions of the controls are imprecise and sometimes wrong. Technical means can isolate collected data and automatically restrict queries that analysts make, and the way these means work can be public without revealing sensitive sources and methods. Then people outside the IC concerned about privacy and civil liberties would have new ways to verify that the IC has adequate procedures and follows them. Enhanced automated controls also offer the promise of reduced burdens on analysts because they can be more efficient than manual controls. Some manual controls would still be necessary to ensure that the automatic controls are actually imposed and that they are configured according to the rules, and to decide cases that are too complex to be automated.
Automated controls and audits require expressing, in software, the rules embodied in laws, policies, regulations, and directives that constrain how intelligence is collected, analyzed, and disseminated. The current rules form a complex network that has grown with changes in technology and in the national security environment. They contain conflicting definitions and inconsistencies. Deriving a concise, consistent, machine-processable expression of the rules from their legislative and administrative forms would not only simplify automation software but also make the rules more understandable to the public.
The next section outlines the key technical elements required to control and automate usage.
TECHNICAL ELEMENTS OF AUTOMATED CONTROLS
An automated system for controlling usage of bulk data with high assurance has three parts: isolating bulk data so that it can be accessed only in specific ways, restricting the queries that can be made against it, and auditing the queries that have been done. In each of these areas, there are opportunities for automated control; some of them are already deployed in the IC or in private companies, some have been demonstrated in research laboratories, and some are promising research directions.
11 President’s Council of Advisors on Science and Technology (PCAST), Big Data and Privacy: A Technological Perspective, Executive Office of the President, May 2014, http://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf.
Isolating bulk data is one technical method for controlling usage. Figure S.2 shows the elements of this method. Bulk data are cut off from the outside world by an isolation boundary. The only way to cross this boundary is to submit a query to the guard, which enforces the policy that controls what queries and results are allowed. The guard logs all queries and results for later auditing, and the audit log itself is isolated to protect it from tampering. The isolated domain is hosted by a mechanism that guarantees the isolation. The guard, the isolation boundary, and bulk data processing are the critical parts of this system. The simpler and clearer their tasks are, and the shorter and clearer the software programs that implement them, the more likely they are to be trustworthy.
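The guard pattern described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any deployed system: the `Guard` class, the record layout, and the policy signature are all hypothetical, and Python name mangling merely stands in for the hosting mechanism that would actually guarantee isolation.

```python
import time
from typing import Callable, List, Tuple

Record = Tuple[str, str, int]        # assumed layout: (caller, callee, timestamp)
Policy = Callable[[str, str], bool]  # assumed signature: (analyst, identifier) -> allowed?


class Guard:
    """Single entry point to the bulk data; every query is checked and logged."""

    def __init__(self, bulk_data: List[Record], policy: Policy) -> None:
        # Name-mangled attributes are a convention only; real isolation would
        # come from the hosting mechanism, not from the language.
        self.__data = list(bulk_data)
        self.__policy = policy
        self.__audit_log: List[dict] = []

    def query(self, analyst: str, identifier: str) -> List[Record]:
        """Return records involving identifier if the policy allows the query."""
        allowed = self.__policy(analyst, identifier)
        results = [r for r in self.__data if identifier in r[:2]] if allowed else []
        # Log every attempt, including refused ones, before returning anything.
        self.__audit_log.append({
            "time": time.time(),
            "analyst": analyst,
            "identifier": identifier,
            "allowed": allowed,
            "result_count": len(results),
        })
        if not allowed:
            raise PermissionError(f"query on {identifier!r} is not permitted by policy")
        return results

    def audit_log(self) -> Tuple[dict, ...]:
        """Hand auditors copies of the entries so the log itself cannot be edited."""
        return tuple(dict(entry) for entry in self.__audit_log)
```

The sketch keeps the guard deliberately small, in line with the observation that short, clear programs in the critical path are more likely to be trustworthy.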
Restricting queries automatically in the guard is another aspect of controlling usage automatically. The goal is to do this well enough that software can decide which queries are allowed by the policy, or at least drastically reduce the number of queries that require manual, human approval. This is certainly feasible for limited classes of queries such as, “Find all the phone numbers that have connected in the last month to this list of numbers belonging to a known target.” Indeed, NSA already has pre-approved queries.
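One way to picture automatic restriction is a check against a list of pre-approved query classes, with everything else routed to a human. The Python sketch below is hypothetical throughout: the `Query` and `PreApprovedClass` structures, the field names, and the matching rule are assumptions made for illustration, not NSA's actual pre-approval scheme.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet, FrozenSet, Sequence


class Decision(Enum):
    ALLOW = "allow"                  # software can approve without a human
    MANUAL_REVIEW = "manual_review"  # falls outside every pre-approved class


@dataclass(frozen=True)
class Query:
    kind: str                # e.g. "contacts_of" (hypothetical query type)
    seeds: FrozenSet[str]    # identifiers the query starts from
    lookback_days: int       # how far into the past the query reaches


@dataclass(frozen=True)
class PreApprovedClass:
    kind: str
    max_lookback_days: int


def decide(query: Query,
           approved_targets: AbstractSet[str],
           classes: Sequence[PreApprovedClass]) -> Decision:
    """Allow a query only if it fits a pre-approved class and every seed is an approved target."""
    for cls in classes:
        if (query.kind == cls.kind
                and query.lookback_days <= cls.max_lookback_days
                and query.seeds
                and query.seeds <= approved_targets):
            return Decision.ALLOW
    return Decision.MANUAL_REVIEW
```

With a class such as `PreApprovedClass("contacts_of", 30)`, the example query in the text (numbers connected in the last month to a known target's numbers) would be allowed automatically, while a longer lookback or an unapproved seed would go to manual review.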
Auditing usage of bulk data is essential to enforce privacy protections. Isolation provides confidence that every query is permanently logged and that the log cannot be altered. Then the log must be reviewed for compliance with the rules. Doing this manually is feasible and is, indeed, NSA’s current practice. Although it is thorough, it is expensive and not transparent; outsiders must rely on the agency’s assurance that it is being done properly because the queries are usually highly classified. Automation of auditing, a direction NSA is pursuing, could not only streamline audits but also provide assurance to outside inspectors, who can examine the auditing technology. Automation of auditing is an area that has been neglected by government, industry, and academia.
FIGURE S.2 Controlling usage by isolating bulk data.
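As an illustration of what automated auditing might involve, the Python sketch below combines a hash-chained log, which makes after-the-fact edits detectable, with a simple automated compliance scan. Hash chaining is a technique chosen here for illustration; the report does not say that NSA uses it. The entry fields and function names are likewise hypothetical.

```python
import hashlib
import json
from typing import List, Set

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, entry: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: List[dict], entry: dict) -> None:
    """Append an entry whose hash commits to the entire preceding log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "hash": _digest(prev_hash, entry)})


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash; an altered entry makes verification fail."""
    prev_hash = GENESIS
    for record in log:
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


def flag_violations(log: List[dict], approved_targets: Set[str]) -> List[int]:
    """Automated review: indexes of queries whose identifier was not an approved target."""
    return [i for i, record in enumerate(log)
            if record["entry"]["identifier"] not in approved_targets]
```

A chain like this detects tampering only if the most recent hash is also held somewhere the log's custodian cannot change. Because the code and the rule it checks contain no classified data, they are the kind of artifact that outside inspectors could examine.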
Automated controls and auditing of SIGINT data held and accessed securely may allow sufficiently thorough unclassified inspection of the privacy-protecting mechanisms of the SIGINT process to allay privacy and civil liberty concerns. The inspection would focus on the automation software and the usage rules it enforces rather than on the data, which must remain classified.
Although no software can fully replace bulk with targeted information collection, software can be developed to more effectively target collection and to control the usage of collected data.
Conclusion 1. There is no software technique that will fully substitute for bulk collection where it is relied on to answer queries about the past after new targets become known.
A key value of bulk collection is its record of past SIGINT that may be relevant to subsequent investigations. If past events become interesting in the present, because intelligence-gathering priorities change to include detection of new kinds of threats or because of new events such as the discovery that an individual is a terrorist, historical events and the context they provide will be available for analysis only if they were previously collected.
The committee was not asked to and did not consider whether the loss of effectiveness from reducing bulk collection would be too great or whether the potential gain in privacy from adopting an alternative is worth the potential loss of intelligence information. Nor was it able to identify broad categories of use where substitution of alternatives might be possible or to detect broadly useful metrics that would inform such decisions. ODNI may wish to study these questions further.
Other groups, such as the President’s Review Group on Intelligence and Communications Technologies and the Privacy and Civil Liberties Oversight Board (in its Section 215 report), have said that bulk collection of telephone metadata is not justified.12 These were policy and legal judgments that are not in conflict with the committee’s conclusion that there is no software technique that will fully substitute for bulk collection; there is no technological magic.
12 See footnote 8.
Conclusion 1.1. Other sources of information might provide a partial substitute for bulk collection in some circumstances.
Data retained from targeted SIGINT collection is a partial substitute if the needed information was in fact collected. Bulk data held by other parties, such as communications service providers, might substitute to some extent, but this relies on those parties retaining the information until it is needed, as well as the ability of intelligence agencies to collect or access it in an efficient and timely fashion. Other intelligence sources and methods might also be able to supply some of the lost information, but the committee was not charged to and did not investigate the full range of such alternatives. Note that these alternatives may introduce their own privacy and civil liberties concerns.
Conclusion 1.2. New approaches to targeting might improve the relevance of the collected information to future use and would rely on capabilities such as creating and using profiles of potentially relevant targets, possibly by using other sources of information.
Because bulk collection cannot for practical reasons be truly comprehensive, it is itself inherently selective and unable to capture all relevant history.13 It may be possible to improve targeted collection to the point where it provides a viable substitute for bulk collection in at least some cases, using profiles of potential targets that are compiled from a wide range of information. This might reduce collection against persons who are not targets, but it might also introduce new privacy and civil liberties concerns about how such profiles are developed and used.
Rapidly updating discriminants of ongoing collections to include new targets as they are discovered will collect data that would otherwise be lost. If targeted collection can be done quickly and well enough, bulk information about past events may not be needed. Targeted collection cannot be a substitute if the past events were unique or if the delay incurred to collect new information would be unacceptable.
Conclusion 2. Automatic controls on the usage of data collected in bulk can help to enforce privacy protections.
13 The FISA Section 215 program collects “only a small percentage of the total telephony metadata held by service providers” (President’s Review Group on Intelligence and Communications Technologies, Liberty and Security in a Changing World, 2013, p. 97).
Automation of usage controls may simultaneously allow a more nuanced set of usage rules, facilitate compliance auditing, and reduce the burden of controls on analysts. Similarly, there are opportunities to automate the various audit mechanisms to verify that rules are followed. Such capabilities could be enhanced as the information technology systems for collection and analysis are refreshed and modernized. These techniques may permit more of the use controls and audit mechanisms to be explained clearly to the public. It may be possible to express a large fraction of the rules required by law and policy in a machine-processable form and thus apply them rapidly and consistently during collection, analysis, and dissemination.
Conclusion 2.1. It will be easier to automate controls if the rules governing collection and use are technology-neutral (i.e., not tied to specific, rapidly changing information and communications technologies or historical artifacts of particular technologies) and if they are based on a consistent set of definitions.14
Conclusion 2.2. Automated controls can provide new opportunities to make the controls more transparent by giving the public and oversight bodies the opportunity to inspect the software artifacts that describe and implement the controls. Increased transparency can give people outside the IC more confidence that the controls are appropriate, although the need for secrecy about some of the details makes complete confidence unlikely.
Conclusion 3. Research and development can help in developing software intended to (1) enhance the effectiveness of targeted collection and (2) improve automated usage controls.15
Conclusion 3.1. The use of targeted collection can be improved by enriching and streamlining methods for determining and deploying new targets rapidly, using automated processing and/or streamlined approval procedures.16
14 This conclusion is consistent with Recommendation 2 in PCAST, Big Data and Privacy: A Technological Perspective, 2014.
15 See also Ibid., Recommendation 3.
16 Examples of manual procedures for target approval are in National Security Agency, NSA’s Civil Liberties and Privacy Protections for Targeted SIGINT Activities Under Executive Order 12333, NSA Director of Civil Liberties and Privacy Office Report, October 7, 2014, https://www.nsa.gov/civil_liberties/_files/nsa_clpo_report_targeted_EO12333.pdf.
Analytics, such as “big data analytics,” may help narrow collection, even if they are not sufficiently precise to identify individual targets. If the government is constrained by privacy concerns to collect less data, it may nevertheless be able to use the power of large private-sector databases, analytics, and machine learning to shape the constraints to collect only data predicted to have high value. New uses by the government of private-sector databases would also raise new privacy and civil liberties questions.
Advanced targeting methods may require a great deal of computing, so that filters should be cascaded to first apply cheap tests, followed by more expensive filters only if earlier filters warrant. For example, if metadata indicate a civilian telephone call to a military unit under surveillance, speech recognition and subsequent semantic analysis might be applied to the voice signal, resulting in an ultimate collection decision. Richer targeting may require enhancing the ability of collection hardware and software to apply complex discriminants to real-time signals feeds.
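The cascading idea can be made concrete with a short Python sketch. The `cascade` function, the stage names, and the item fields are hypothetical and serve only to show the control flow: each item passes through filters ordered from cheapest to most expensive, and a later stage runs only if every earlier stage accepted the item.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Filter = Callable[[dict], bool]


def cascade(items: Iterable[dict],
            stages: Sequence[Tuple[str, Filter]]) -> Tuple[List[dict], Dict[str, int]]:
    """Apply named filters in order and count how often each stage actually runs."""
    kept: List[dict] = []
    calls = {name: 0 for name, _ in stages}
    for item in items:
        for name, stage in stages:
            calls[name] += 1
            if not stage(item):
                break  # rejected: later, costlier stages never see this item
        else:
            kept.append(item)  # every stage accepted the item
    return kept, calls
```

The returned call counts show the saving: if a cheap metadata test rejects most items, the expensive stage (standing in for speech recognition and semantic analysis in the example above) runs on only the small remainder.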
Conclusion 3.2. More powerful automation could improve the precision, robustness, efficiency, and transparency of the controls, while also reducing the burden of controls on analysts.
Some of the necessary technologies exist today, although they may need further development for use in intelligence applications; others will require research and development work. This approach and others for privacy protection of data held by the private sector can be exploited by the IC.17 Research could also advance the ability to systematically encode laws, regulations, and policies in a machine-processable form that would directly configure the rule automation.
It does not necessarily follow from Conclusion 1 that current bulk collection must continue. What it does mean is that curtailing bulk collection would deprive analysts of some information. Reduction in bulk collection may be partially mitigated by improvements in targeting, a direction for future research outlined above. If the IC continues to collect SIGINT in bulk, the technology described in this report can reduce risk and improve oversight and transparency and, thus, perhaps mitigate public concerns about it.
17 PCAST, Big Data and Privacy: A Technological Perspective, 2014. Recommendation 1, Sections 4 and 4.5.2.