J
The Total/Terrorist Information Awareness Program

J.1 A BRIEF HISTORY[1]

In 2002, in the wake of the September 11, 2001, attacks, the Defense Advanced Research Projects Agency (DARPA) of the U.S. Department of Defense (DOD) launched a research and development effort known as the Total Information Awareness (TIA) program, later renamed the Terrorism Information Awareness program. TIA was intended to counter terrorism through prevention by developing and integrating information analysis, collaboration, and decision-support tools with language-translation, data-searching, pattern-recognition, and privacy-protection technologies.[2] The program included the development of a prototype system/network to provide an environment for integrating technologies developed in the program and to serve as a testbed for conducting experiments. Five research threads were to be pursued: secure collaborative problem-solving among disparate agencies and institutions, structured information-searching and pattern recognition based
on information from a wide array of data sources, social-network analysis tools to understand linkages and organizational structures, data-sharing in support of decision-making, and language-translation and information-visualization tools. A technical description of the system stressed the importance of using real data and real operational settings that were complex and huge.[3]

The TIA program sought to pursue important research questions, such as how data mining techniques might be used in national-security investigations and how technological approaches might be able to ameliorate the privacy impact of such analysis. For example, in a speech given in August 2002, John Poindexter said that[4]

IAO [Information Awareness Office] programs are focused on making Total Information Awareness—TIA—real. This is a high level, visionary, functional view of the world-wide system—somewhat over simplified. One of the significant new data sources that needs to be mined to discover and track terrorists is the transaction space. If terrorist organizations are going to plan and execute attacks against the United States, their people must engage in transactions and they will leave signatures in this information space. This is a list of transaction categories, and it is meant to be inclusive. Currently, terrorists are able to move freely throughout the world, to hide when necessary, to find sponsorship and support, and to operate in small, independent cells, and to strike infrequently, exploiting weapons of mass effects and media response to influence governments. We are painfully aware of some of the tactics that they employ. This low-intensity/low-density form of warfare has an information signature. We must be able to pick this signal out of the noise. Certain agencies and apologists talk about connecting the dots, but one of the problems is to know which dots to connect. The relevant information extracted from this data must be made available in large-scale repositories with enhanced semantic content for easy analysis to accomplish this task. The transactional data will supplement our more conventional intelligence collection.

Nevertheless, authoritative information about the threats of interest to the TIA program is scarce. In some accounts, TIA was focused on a generalized terrorist threat. In other informed accounts, TIA was premised on the notion of protecting a small number of high-value targets in the United States, and a program of selective hardening of those targets
would force terrorists to carry out attacks along particular lines, thus limiting the threats of interest and concern to TIA technology.

The TIA program was cast broadly as one that would “integrate advanced collaborative and decision support tools; language translation; and data search, pattern recognition, and privacy protection technologies into an experimental prototype network focused on combating terrorism through better analysis and decision making.”[5] Regarding data-searching and pattern recognition, research was premised on the idea that

. . . terrorist planning activities or a likely terrorist attack could be uncovered by searching for indications of terrorist activities in vast quantities of transaction data. Terrorists must engage in certain transactions to coordinate and conduct attacks against Americans, and these transactions form patterns that may be detectable. Initial thoughts are to connect these transactions (e.g., applications for passports, visas, work permits, and drivers’ licenses; automotive rentals; and purchases of airline tickets and chemicals) with events, such as arrests or suspicious activities.[6]

As described in the DOD TIA report, “These transactions would form a pattern that may be discernable in certain databases to which the U.S. Government would have lawful access. Specific patterns would be identified that are related to potential terrorist planning.”[7] Furthermore, the program would focus on analyzing nontargeted transaction and event data en masse rather than on collecting information on specific individuals and trying to understand what they were doing. The intent of the program was to develop technology that could discern event and transaction patterns of interest and then identify individuals of interest on the basis of the events and transactions in which they participated. Once such individuals were identified, they could be investigated or surveilled in accordance with normal and ordinary law-enforcement and counterterrorism procedures.

The driving example that motivated TIA was the set of activities of the 9/11 terrorists who attacked the World Trade Center. In retrospect, it was discovered that they had taken actions that together could be seen
as predictors of the attack even if no single action was unlawful. Among those actions were flight training (with an interest in level flight but not in takeoff and landing), the late purchase of one-way air tickets with cash, foreign deposits into banking accounts, and telephone records that could be seen to have connected the terrorists. If the actions could have been correlated before the fact, presumably in some automated fashion, suspicions might have been aroused in time to foil the plot before the attack happened.

Because the TIA program was focused on transaction and event data that were already being collected and resident in various databases, privacy implications generally associated with the collection of data per se did not arise. But the databases were generally privately held, and many privacy questions arose because the government would need access to the data that they contained. The databases also might have contained the digital signatures of most Americans as they conducted their everyday lives, and this gave rise to many concerns about their vast scope.

After a short period of intense public controversy, Congress took action on the TIA program in 2003. Section 8131 of H.R. 2658, the Department of Defense Appropriations Act of 2004, specified that

(a) Notwithstanding any other provision of law, none of the funds appropriated or otherwise made available in this or any other Act may be obligated for the Terrorism Information Awareness Program: Provided, That this limitation shall not apply to the program hereby authorized for processing, analysis, and collaboration tools for counterterrorism foreign intelligence, as described in the Classified Annex accompanying the Department of Defense Appropriations Act, 2004, for which funds are expressly provided in the National Foreign Intelligence Program for counterterrorism foreign intelligence purposes.

(b) None of the funds provided for processing, analysis, and collaboration tools for counterterrorism foreign intelligence shall be available for deployment or implementation except for: (1) lawful military operations of the United States conducted outside the United States; or (2) lawful foreign intelligence activities conducted wholly overseas, or wholly against non-United States citizens.

(c) In this section, the term “Terrorism Information Awareness Program” means the program known either as Terrorism Information Awareness or Total Information Awareness, or any successor program, funded by the Defense Advanced Research Projects Agency, or any other Department or element of the Federal Government, including the individual components of such Program developed by the Defense Advanced Research Projects Agency.

It is safe to say that the issues raised by the TIA program have not been resolved in any fundamental sense. Though the program itself was terminated, much of the research under it was moved from DARPA to another group, which builds technologies primarily for the National Security Agency, according to documents obtained by the National Journal and to intelligence sources familiar with the move. The names of key projects were changed, apparently to conceal their identities, but their funding remained intact, often under the same contracts.[8]

The immediate result of congressional intervention, therefore, was to drive the development and deployment of data mining at DOD from public view, relieve it of the statutory restrictions that had previously applied to it, block funding for research into privacy-enhancing technologies, and attenuate the policy debate over the appropriate roles and limits of data mining. Law and technology scholar K.A. Taipale wrote:[9]

At first hailed as a “victory” for civil liberties, it has become increasingly apparent that the defunding [of TIA] is likely to be a Pyrrhic victory. . . . Not proceeding with a focused government research and development project (in which Congressional oversight and a public debate could determine appropriate rules and procedures for use of these technologies and, importantly, ensure the development of privacy protecting technical features to support such policies) is likely to result in little security and, ultimately, brittle privacy protection. . . . Indeed, following the demise of IAO and TIA, it has become clear that similar data aggregation and automated analysis projects exist throughout various agencies and departments not subject to easy review.

Thus, many other data mining activities supported today by the U.S. government continue to raise the same issues as did the TIA program: the potential utility of large-scale databases containing personal information for counterterrorism and law-enforcement purposes and the potential privacy impact of the use of such databases by law-enforcement and national-security authorities.

J.2 A TECHNICAL PERSPECTIVE ON TIA’S APPROACH TO PROTECTING PRIVACY

As noted above, managers of the TIA program understood that their approach to identifying terrorists before they acted had major privacy implications. To address privacy issues in TIA and similar programs, such
as MATRIX, Tygar[10] and others have advocated the use of what has come to be called selective revelation, involving something like the risk-utility tradeoff in statistical disclosure limitation. Sweeney[11] used the term to describe an approach to disclosure limitation that allows data to be shared for surveillance purposes “with a sliding scale of identifiability, where the level of anonymity matches scientific and evidentiary need.” That corresponds to a monotonically increasing threshold for maximum tolerable risk in the risk-utility confidentiality-map framework described in Duncan et al.[12] Some related ideas emanate from the computer-science literature, but most authors attempt to demand a stringent, carefully defined level of privacy and to restrict access by adding noise and by limiting the number of queries allowed (e.g., see Chawla et al.[13]).
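
The risk-utility tension can be made concrete with a small simulation. The sketch below is a toy construction of our own, not something taken from Duncan et al. or from TIA: it releases a single true count with Laplace noise, scores “risk” as the chance that a released value lands within 0.5 of the truth (a crude disclosure proxy), and scores “utility” as the negative mean absolute error. Sweeping the noise scale then traces out points on an R-U curve of the kind the confidentiality-map framework describes, on which an agency could set a maximum-tolerable-risk threshold.

```python
import random

def laplace_noise(scale):
    # Laplace(0, scale) noise, sampled as the difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def ru_point(true_count, scale, trials=10_000):
    """Estimate a (risk, utility) pair for one noise scale.

    Risk: how often the released value falls within 0.5 of the true count.
    Utility: negative mean absolute error of the released values.
    """
    released = [true_count + laplace_noise(scale) for _ in range(trials)]
    risk = sum(abs(x - true_count) < 0.5 for x in released) / trials
    utility = -sum(abs(x - true_count) for x in released) / trials
    return risk, utility

# Sweeping the scale traces out an R-U curve: more noise buys lower
# disclosure risk at the price of lower utility.
for scale in (0.5, 1.0, 2.0, 4.0):
    risk, utility = ru_point(true_count=100, scale=scale)
    print(f"scale={scale:3.1f}  risk={risk:.2f}  utility={utility:.2f}")
```
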
The TIA privacy report suggests that[14]

selective revelation [involves] putting a security barrier between the private data and the analyst, and controlling what information can flow across that barrier to the analyst. The analyst injects a query that uses the private data to determine a result, which is a high-level sanitized description of the query result. That result must not leak any private information to the analyst. Selective revelation must accommodate multiple data sources, all of which lie behind the (conceptual) security barrier. Private information is not made available directly to the analyst, but only through the security barrier.
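
A minimal sketch of such a barrier appears below. It is illustrative only, under assumptions of our own: the class and method names are invented, and the Laplace-noised count queries and per-analyst query budget are generic mechanisms from the disclosure-limitation literature, not an interface TIA specified.

```python
import random

class SanitizingBarrier:
    """Toy security barrier: analysts never see records, only noisy
    aggregates, and each analyst's number of queries is capped."""

    def __init__(self, records, noise_scale=2.0, max_queries=20):
        self._records = records          # private data stays behind the barrier
        self._noise_scale = noise_scale  # more noise -> stronger privacy, less utility
        self._max_queries = max_queries
        self._used = {}                  # queries consumed, per analyst

    def noisy_count(self, analyst_id, predicate):
        """Answer 'how many records satisfy predicate?' with Laplace noise."""
        if self._used.get(analyst_id, 0) >= self._max_queries:
            raise PermissionError("query budget exhausted")
        self._used[analyst_id] = self._used.get(analyst_id, 0) + 1
        true_count = sum(1 for r in self._records if predicate(r))
        lam = 1.0 / self._noise_scale
        noise = random.expovariate(lam) - random.expovariate(lam)  # Laplace sample
        return true_count + noise

# The analyst learns only a sanitized aggregate, never which records matched.
records = [{"payment": "cash", "one_way": True},
           {"payment": "card", "one_way": False}] * 50
barrier = SanitizingBarrier(records)
estimate = barrier.noisy_count("analyst-7",
                               lambda r: r["payment"] == "cash" and r["one_way"])
print(round(estimate))  # close to the true count of 50
```
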
One effort to implement this scheme was dubbed privacy appliances by Golle et al. and was intended to be a stand-alone device that would sit between the analyst and the private data source so that private data stayed in authorized hands.[15] The privacy controls would also be independently operated to keep them isolated from the government. According to Golle et al., the device would provide:

• Inference control to prevent unauthorized individuals from completing queries that would allow identification of ordinary citizens.
• Access control to return sensitive identifying data only to authorized users.
• Immutable audit trails for accountability.
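
Golle et al. describe the appliance at the level of requirements rather than code. A hypothetical gateway enforcing the three controls just listed might be sketched as follows; the names and the refusal threshold are our own assumptions, and the “immutable” audit trail is approximated by an append-only list (a real appliance would need tamper-evident storage).

```python
import time

class PrivacyAppliance:
    """Hypothetical stand-alone gateway between analyst and data source."""

    def __init__(self, data_source, authorized_users, k_threshold=5):
        self._source = data_source           # records stay behind the appliance
        self._authorized = authorized_users  # users cleared for identifying data
        self._k = k_threshold                # refuse answers narrower than k records
        self._audit_log = []                 # append-only trail for accountability

    def query(self, user, description, predicate):
        # Immutable audit trail: every query is logged before it is answered.
        self._audit_log.append((time.time(), user, description))
        matches = [r for r in self._source if predicate(r)]
        # Inference control: an answer this narrow could identify individuals.
        if len(matches) < self._k:
            return {"status": "refused", "reason": f"fewer than {self._k} matches"}
        # Access control: only authorized users get identifying records back;
        # everyone else sees an aggregate.
        if user in self._authorized:
            return {"status": "ok", "records": matches}
        return {"status": "ok", "count": len(matches)}
```
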

Implicit in the TIA report and in the Golle et al. approach was the notion that linkages between databases behind the security barrier would use identifiable records and thus some form of multiparty computation method involving encryption techniques.

The real questions of interest in “inference control” are, What disclosure-limitation methods should be used? To which databases should they be applied? How can the “inference control” approaches be combined with the multiparty computation methods? Here is what is known in the way of answers:

• Both Sweeney and Golle et al. refer to microaggregation, known as k-anonymity, but with few details on how it could be used in this context. The method combines observations in groups of size k and reports either the sum or the average of the group for each unit (a minimal sketch appears after this list). The groups may be identified by clustering or some other statistical approach. Left unsaid is what kinds of analyses users might perform with such aggregated data. Furthermore, neither k-anonymity nor any other confidentiality tool does anything to cope with the implications of the release of exactly linked files requested by “authorized users.”

• Much of the statistical and operations-research literature on confidentiality fails to address the risk-utility trade-off, largely because it focuses primarily on privacy or on technical implementations without understanding how users wish to analyze a database.[16]

• A clear lesson from the statistical disclosure-limitation literature is that privacy protection in the form of “safe releases” from separate databases does not guarantee privacy protection for a merged database. A figure in Lunt et al.[17] demonstrates recognition of that by showing privacy appliances applied to the individual databases and then independently to the combined data.

• There have been a small number of crosswalks between the statistical disclosure-limitation literature on multiparty computation and risk-utility trade-off choices for disclosure limitation. Yang et al. provide a starting point for discussions on k-anonymity.[18] There are clearly a number of alternatives to k-anonymity, including some that yield “anonymized” databases of far greater statistical utility.

• The “hype” associated with the TIA approach to protection has abated, largely because TIA no longer exists as an official program. But similar programs continue to appear in different places in the federal government, and no one associated with any of them has publicly addressed the privacy concerns raised here regarding the TIA approach.
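
As promised in the first bullet above, here is a minimal sketch of microaggregation. It assumes univariate numeric data and uses simple sorting as the grouping rule; production methods typically form groups by multivariate clustering, and nothing here should be read as a claim about the privacy guarantees of the technique.

```python
def microaggregate(values, k=3):
    """Sort values, partition into groups of at least k, and release each
    group's mean in place of its members' true values."""
    ordered = sorted(values)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # fold a short tail into the prior group
    released = []
    for group in groups:
        mean = sum(group) / len(group)
        released.extend([mean] * len(group))
    return released

# Seven salaries collapse to two group means; no individual value
# appears in the released output.
print(microaggregate([31000, 52000, 47000, 90000, 33000, 48000, 51000], k=3))
```
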

When Congress stopped the funding for DARPA’s TIA program in 2003, work on the privacy-appliance research and development effort at PARC was an attendant casualty. Thus, prototypes of the privacy appliance have not been made publicly available since then, nor are they likely to appear in the near future. The claims of privacy protection and selective revelation continued with MATRIX and other data-warehouse systems but without an attendant research program, and the federal government continues to plan for the use of data mining techniques in other initiatives, such as the Computer Assisted Passenger Prescreening System II (CAPPS II). Similar issues arise in the use of government, medical, and private transaction data in bioterrorism surveillance.[19]

J.3 ASSESSMENT

Section J.1 provided a brief history of the TIA program. Whatever one’s views regarding the desirability or technical feasibility of the TIA program, it is clear that from a political standpoint the program was a debacle. Indeed, after heated debate, the Senate and House appropriations committees decided to terminate funding of the program.[20] On passage of the initial funding limitation, a leading critic of the TIA program, Senator Ron Wyden, declared:

The Senate has now said that this program will not be allowed to grow without tough Congressional oversight and accountability, and that there will be checks on the government’s ability to snoop on law-abiding Americans.[21]

The irony of the TIA debate is that although the funding for the TIA program was indeed terminated, both research on and deployment of data mining systems continue at various agencies (see Appendix I, “Illustrative Government Data Mining Programs and Activity”), but research on privacy-management technology did not continue, and congressional oversight of data mining technology development has waned to some degree.

The various outcomes of the TIA debate raise the question of whether the nature of the debate over the program (if not the outcome) could have been any different if policy makers had addressed in advance some of the difficult questions that the program raised. In particular, it is interesting to consider questions in the three categories articulated in the framework of Chapter 2: effectiveness, consistency with U.S. laws and values, and possible development of new laws and practices. The TIA example further illustrates how careful consideration of the privacy impact of new technologies is needed before a program’s research stage begins in earnest.

The threshold consideration of any privacy-sensitive technology is whether it is effective in meeting a clearly defined law-enforcement or
national-security purpose. The question of effectiveness must be assessed through rigorous testing guided by scientific standards. The TIA research program proposed an evaluation framework, but none of the results of evaluation have been made public. Some testing and evaluation may have occurred in a classified setting, but neither this committee nor the public has any knowledge of the results. Research on how large-scale data-analysis techniques, including data mining, could help the intelligence community to identify potential terrorists is certainly a reasonable endeavor. Assuming that initial research justifies additional effort on the basis of scientific standards of success, the work should continue, but it must be accompanied by a clear method for assessing the reliability of the results.

Even if a proposed technology is effective, it must also be consistent with existing U.S. law and democratic values. First, one must assess whether the new technique and objective comply with law. In the case of TIA, DARPA presented to Congress a long list of laws that it would comply with and affirmed that “any deployment of TIA’s search tools may occur only to the extent that such a deployment is consistent with current law.” Second, inasmuch as TIA research sought to enable the deployment of very large-scale data mining over a larger universe of data than the U.S. government had previously analyzed, even compliance with then-current law would not establish consistency with democratic values.

The surveillance power that TIA proposed to put in the hands of U.S. investigators raised considerable concern among policy makers and the general public. That the program, if implemented, could be said to comply with law did not address those concerns. In fact, the program raised the concerns to a higher level and ultimately led to an effort by Congress to stop the research altogether.

TIA-style data mining was, and still is, possible because there are few restrictions on government access to third-party business records. Any individual business record (such as a travel reservation or a credit-card transaction) may have relatively low privacy sensitivity when looked at in isolation, but when a large number of such transaction records are analyzed over time, a complete and intrusive picture of a person’s life can emerge. Developing the technology to derive such individual profiles was precisely the objective of the TIA program. It proposed to use such profiles in only the limited circumstances in which they indicated terrorist activity. That may be a legitimate goal and could ultimately be recognized explicitly as such by law. However, that the program was at once legal and at the same time appeared to cross boundaries not previously crossed by law-enforcement or national-security investigations gives rise to questions that must be answered.

John Poindexter, director of the DARPA office responsible for TIA, was aware of the policy questions and took notable steps to include in the technical research agenda various initiatives to build technical mechanisms that might minimize the privacy impact of the data mining capabilities being developed. In hindsight, however, a more comprehensive analysis of both the technical and the larger public-policy considerations associated with the program was necessary to address Congress’s concerns about privacy impact.

NOTES

1. This description of the TIA program is based on unclassified, public sources that are presumed to be authoritative because of their origin (for example, Department of Defense documents and speeches by senior program officials). Recognizing that some aspects of the program were protected by classification, the committee believes that this description is accurate but possibly incomplete.

2. Defense Advanced Research Projects Agency (DARPA), “Report to Congress Regarding the Terrorism Information Awareness Program: In Response to Consolidated Appropriations Resolution, 2003, Pub. L. No. 108-7, Division M, § 111(b),” DARPA, Arlington, Va., May 20, 2003.

3. Defense Advanced Research Projects Agency (DARPA), Total Information Awareness Program System Description Document, version 1.1, DARPA, Arlington, Va., July 19, 2002.

4. J. Poindexter, Overview of the Information Awareness Office, remarks prepared for the DARPATech 2002 Conference, Anaheim, Calif., August 2, 2002, available at http://www.fas.org/irp/agency/dod/poindexter.html.

5. See note 2.

6. Defense Advanced Research Projects Agency (DARPA), Defense Advanced Research Projects Agency’s Information Awareness Office and Terrorism Information Awareness Project, available at http://www.taipale.org/references/iaotia.pdf.

7. See note 2, p. 14.

8. S. Harris, “TIA lives on,” National Journal, February 23, 2006, available at http://nationaljournal.com/about/njweekly/stories/2006/0223nj1.htm#.

9. K.A. Taipale, “Data mining and domestic security: Connecting the dots to make sense of data,” Columbia Science and Technology Law Review 5(2):1-83, 2003.

10. J.D. Tygar, “Privacy Architectures,” presentation at Microsoft Research, June 18, 2003, available at http://research.microsoft.com/projects/SWSecInstitute/slides/Tygar.pdf; J.D. Tygar, “Privacy in sensor webs and distributed information systems,” pp. 84-95 in Software Security Theories and Systems, M. Okada, B. Pierce, A. Scedrov, H. Tokuda, and A. Yonezawa, eds., Springer, New York, 2003.

11. L. Sweeney, “Privacy-preserving surveillance using selective revelation,” LIDAP Working Paper 15, Carnegie Mellon University, 2005; an updated journal version is J. Yen, R. Popp, G. Cybenko, K.A. Taipale, L. Sweeney, and P. Rosenzweig, “Homeland security,” IEEE Intelligent Systems 20(5):76-86, 2005.

12. G.T. Duncan, S.E. Fienberg, R. Krishnan, R. Padman, and S.F. Roehrig, “Disclosure limitation methods and information loss for tabular data,” pp. 135-166 in Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J. Lane, J. Theeuwes, and L. Zayatz, eds., North-Holland, Amsterdam, 2001. See also G.T. Duncan, S.A. Keller-McNulty, and S.L. Stokes, Database Security and Confidentiality: Examining Disclosure Risk vs. Data Utility Through the R-U Confidentiality Map, Technical Report 142, National Institute of Statistical Sciences, Research Triangle Park, N.C., 2004; and G.T. Duncan and S.L. Stokes, “Disclosure risk vs. data utility: The R-U confidentiality map as applied to topcoding,” Chance 17(3):16-20, 2004.

13. S. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee, “Toward privacy in public databases,” in Theory of Cryptography Conference Proceedings, J. Kilian, ed., Lecture Notes in Computer Science, Volume 3378, Springer-Verlag, Berlin, Germany, 2005.

14. Information Systems Advanced Technology (ISAT) panel, Security with Privacy, DARPA, Arlington, Va., 2002, p. 10, available at http://www.cs.berkeley.edu/~tygar/papers/ISAT-final-briefing.pdf.

15. P. Golle et al., “Protecting Privacy in Terrorist Tracking Applications,” presentation to Computers, Freedom, and Privacy 2004, available at http://www.cfp2004.org/program/materials/w-golle.ppt.

16. R. Gopal, R. Garfinkel, and P. Goes, “Confidentiality via camouflage: The CVC approach to disclosure limitation when answering queries to databases,” Operations Research 50:501-516, 2002.

17. T. Lunt, J. Staddon, D. Balfanz, G. Durfee, T. Uribe, D. Smetters, J. Thornton, P. Aoki, B. Waters, and D. Woodruff, “Protecting Privacy in Terrorist Tracking Applications,” presentation at the University of Washington/Microsoft Research/Carnegie Mellon University Software Security Summer Institute, Software Security: How Should We Make Software Secure?, June 15-19, 2003, available at http://research.microsoft.com/projects/SWSecInstitute/five-minute/Balfanz5.ppt.

18. Z. Yang, S. Zhong, and R.N. Wright, “Anonymity-preserving data collection,” pp. 334-343 in Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’05), Association for Computing Machinery, New York, N.Y., 2005.

19. See S.E. Fienberg and G. Shmueli, “Statistical issues and challenges associated with rapid detection of bio-terrorist attacks,” Statistics in Medicine 24:513-529, 2005; and L. Sweeney, “Privacy-Preserving Bio-Terrorism Surveillance,” presentation at the AAAI Spring Symposium on AI Technologies for Homeland Security, Stanford University, Stanford, Calif., 2005.

20. U.S. House of Representatives, Conference Report on H.R. 2658, Department of Defense Appropriations Act (House Report 108-283), U.S. Government Printing Office, Washington, D.C., 2004.

21. D. McCullagh, “Senate limits Pentagon ‘snooping’ plan,” CNET News.com, January 24, 2003, available at http://sonyvaio-cnet.com.com/2100-1023_3-981945.html.