Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment C Information and Information Technology C.1 THE INFORMATION LIFE CYCLE As Chapter 1 points out, digital information in use typically goes through a seven-step life cycle. These steps include collection, correction and cleaning, storage, use or analysis, publication or sharing, monitoring and evaluation, and retention or deletion. C.1.1 Information Collection The information collected for a program must be appropriate to its purpose. Data minimization requires that only information critical to that purpose be collected, though minimization often conflicts with the temptation to gather more information “just in case” it might be useful later in easing the relevant analytical tasks or even for other possibly relevant purposes. Legislation, regulation, or other governance rules may require that internal or external authorization to collect the information be obtained, including from relevant third parties. The information source(s) and the information itself must be verified as reliable, objective, and compliant with relevant laws. The government collects information for counterterrorism from many other sources, primarily as extracts from information systems. The government mandates or requests information from many industries: Customs and Border Protection obtains manifests for trucks entering the United States from trucking firms; the Department of Homeland Security (DHS), including the Transportation Security Administration, and
the National Aeronautics and Space Administration obtain passenger names and records from airlines; the Justice Department obtains Web search terms, URLs, and other records from the information technology (IT) and telecommunications industries; the National Security Agency obtains phone call records from communications providers; and the Treasury Department obtains suspicious activity reports from the financial community.

In addition, employers, retailers, banks, and travel and telecommunications companies collect data directly from customers as well as from many other government and private sources. For example, it is conventional practice for companies to collect extensive information on prospective employees from financial and educational institutions, law enforcement, former employers, and so forth. Click-streams collected from Web interactions are among the largest databases in the world, second only to retail and scientific databases. Information collection is a significant and growing sector of the information economy.

Finally, the government obtains a great deal of data from private data brokers, who aggregate data on individuals from all legally available sources. Because the data are collected by private parties, much of the data are not subject to existing restrictions on government collection efforts.

C.1.2 Information Correction and Cleaning

A significant practical and research challenge is to ensure that the information is correct, accurate, and reliable. This is aided by ensuring reliable information provenance and the use of automated and human data validation techniques. For example, automated techniques could easily be used to recognize as anomalous an indicator of pregnancy in the medical records of a male.
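The automated validation mentioned above can be illustrated with a minimal sketch. The record layout, the field names, and the single consistency rule are assumptions made for illustration only; real validation would combine many such rules with human review.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical record layout, not taken from any real system.
    record_id: str
    sex: str        # "M" or "F" in this illustrative coding
    pregnant: bool


def find_anomalies(records):
    """Return the ids of records whose fields are mutually inconsistent."""
    return [r.record_id for r in records if r.sex == "M" and r.pregnant]


records = [
    PatientRecord("r1", "F", True),
    PatientRecord("r2", "M", True),   # inconsistent: flagged for review
    PatientRecord("r3", "M", False),
]
flagged = find_anomalies(records)
print(flagged)
```

A rule of this kind only flags a record as anomalous; deciding which field is actually wrong still requires provenance information or human judgment.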
Moreover, in certain instances, laws govern the rights of an individual to correct information errors in commercial applications, for example in one’s credit report. If the individual finds what he or she believes to be an error, documentation of that error can be provided and the error corrected. If the party providing the data does not agree that it made an error, the individual has the right to insert into the record a statement of limited length providing his or her side of the story. To the best of the committee’s knowledge, individuals negatively affected by counterterrorism programs as the result of data errors have no comparable ability. Indeed, for national security reasons, individuals are not permitted to review the data on which adverse decisions are based, even though they may experience the negative consequences (e.g., by being denied boarding a plane).
efforts will be more effective when the relevant agencies can easily and effectively cooperate and share information.1 The National Counterterrorism Center (NCTC) was established to serve as a multiagency center analyzing and integrating all intelligence pertaining to terrorism, including threats to U.S. interests at home and abroad. NCTC also is responsible for developing, implementing, and assessing the effectiveness of strategic operational planning efforts to achieve counterterrorism objectives.

Compared to the relevant policy and practices, the technology for sharing information is relatively well developed. Today, modern information systems live in an ecosystem of other information systems and services, accessible enterprise-wide over an intranet or worldwide over the Internet, and it is increasingly common for both raw information and analytical results to be published electronically. A modern information system obtains information and services from many other information systems, in some cases thousands of information systems, and reciprocally provides information and services. Such ecosystems developed originally to increase automation by eliminating paper or electronic reports that were exchanged with humans or other systems by largely human means. Currently such ecosystems permit organizations to modify and enhance their businesses with great speed and agility. Customers have the convenience of reserving a trip with a travel agent and having all of the relevant hotels, car rental agencies, airlines, credit card companies, and banks handled transparently. While information systems’ interoperation and information sharing are a convenience for a customer, they are a business-critical requirement in almost every business. Clear civil liberties concerns arise when information is shared and repurposed without restriction.
Hence, the committee’s framework lists the criteria and best practices that are required to protect civil liberties, including appropriateness, agency and external authorization, defined purpose, and assessment, as discussed below.

1 See for example, National Security Council, National Strategy for Combating Terrorism, National Security Council, Washington, D.C., September 2006, available at http://www.whitehouse.gov/nsc/nsct/2006/; National Commission on Terrorist Attacks upon the United States, 9/11 Commission Report, U.S. Government Printing Office, Washington, D.C., July 2004; and three reports of the Markle Foundation Task Force on National Security in the Information Age, Markle Foundation, New York, N.Y., available at http://www.markletaskforce.org/: Protecting America’s Freedom in the Information Age (2002), Creating a Trusted Network for Homeland Security (2003), and Mobilizing Information to Prevent Terrorism: Accelerating Development of a Trusted Information Sharing Environment (2006).

C.1.6 Information Monitoring

An information program must be continuously monitored and assessed to ensure that it is effective in achieving its purpose and that it complies with all relevant laws, regulations, and governance. The committee’s framework lists several relevant criteria for which there are best practices, including audit trails, auditing for compliance with existing laws, ensuring reporting and redress of false positives and related impacts on individuals, and having in place a privacy officer, training, agency authorization, and external authorization.

One of the most challenging aspects of information-intensive systems is evaluating their efficacy or their effectiveness relative to their purpose. The growth in data, transactions, and analytical volumes is a direct measure of the value and the efficacy of data and information processing. The continued growing investment in these programs is a direct measure of their effectiveness in promoting economic competitiveness in the marketplace.2 More specifically, each industry and application domain, such as telecommunications billing, has well-defined measures of efficacy or business effectiveness. For example, two of the many telecommunications billing metrics are time and cost to produce. An extreme example involves Wall Street arbitrageurs who search the entire history of stock market trades and simultaneous trades as they occur on all U.S. trading floors and find, on a regular basis, investment opportunities in hundredths of a second. Typically there are best practices and defined standards for assessing effectiveness, as called for in the committee’s framework. Following information system best practices, counterterrorism programs should have efficacy metrics defined for them against which they can be assessed.
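As one illustration of what defined efficacy metrics might look like, the sketch below computes precision, recall, and false positive rate from counts of correct and incorrect flags. The choice of these three metrics and the counts themselves are assumptions made for illustration; they are not prescribed by the committee's framework and do not describe any actual program.

```python
def efficacy_metrics(true_pos, false_pos, true_neg, false_neg):
    """Compute simple screening metrics from outcome counts."""
    flagged = true_pos + false_pos
    actual_pos = true_pos + false_neg
    actual_neg = false_pos + true_neg
    return {
        # Share of flagged individuals who were correctly flagged.
        "precision": true_pos / flagged if flagged else 0.0,
        # Share of true cases that the program actually flagged.
        "recall": true_pos / actual_pos if actual_pos else 0.0,
        # Share of uninvolved individuals who were wrongly flagged.
        "false_positive_rate": false_pos / actual_neg if actual_neg else 0.0,
    }


# Invented counts, used only to exercise the function.
metrics = efficacy_metrics(true_pos=8, false_pos=92, true_neg=9900, false_neg=0)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With counts like these, a low false positive rate can still coexist with low precision, which is why the reporting and redress of false positives is listed alongside effectiveness.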
C.1.7 Information Retention

The final step of the information life cycle involves the retention or deletion of information based on a defined retention period, data quality, data minimization, or other criteria.3 Data retention refers to the period of time during which an organization can or must retain data in its automated and manual records. A data retention requirement may be that data can be kept no longer than the defined period or that it must be kept at least until the defined period is over. When a data item is to be deleted, all copies of the item must be found and deleted from all automated and manual records.

2 In 2005, the information technology products sector accounted for $640 billion or 2.8 percent of the U.S. Gross Domestic Output, while the communications sector accounted for $514 billion or 2.25 percent. The IT sector has experienced a 2.7 percent compound annual growth rate (CAGR) since 1998, and the communications sector a 6.5 percent CAGR (U.S. Department of Commerce, Bureau of Economic Analysis, “Gross Domestic Product: Fourth Quarter 2006 (Advance),” available at http://www.bea.gov/newsreleases/national/gdp/2007/gdp406a.htm; Andrew Bartels, U.S. IT Spending Summary: Q3 2006, Forrester Research, Inc., Cambridge, Mass., November 29, 2006).

3 Data Privacy and Integrity Advisory Committee, Framework for Privacy Analysis of Programs, Technologies, and Applications, Report No. 2006-01, U.S. Department of Homeland Security, Washington, D.C., adopted March 7, 2006.

In the context of this report, data retention is a privacy and civil liberties issue when applied to personally identifiable information (PII) such as name plus Social Security number. The increased digitization of individuals’ personal and professional lives has led to dramatic increases in the amount of PII that is stored in automated and manual records. While this information provides significant value and convenience, it also exposes people to risks such as identity theft, one of the most frequent crimes in the United States, and to other digital crimes and loss of privacy. One report indicates that over 168 million data records have been compromised due to security breaches in the United States from January 2005 to October 2007.4 To protect the public from such crimes, state and federal governments have passed many laws and regulations5 and are continuing to draft new laws and regulations in response to the increased risks related to the growth of retained PII and the power of current technologies. These laws and regulations define data retention periods for specific types of data.

Information retention poses complex and unresolved business, legal, and technical issues. In the normal course of business, data must be retained relative to the relevant business cycle, e.g., monthly, quarterly, or annual billing cycles, and to the much longer statute of limitations periods (e.g., 10 years) during which legal disputes could arise and be prosecuted.
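The two kinds of retention requirement described above (keep no longer than a period, or keep at least until a period has passed) can be expressed as a small rule check. This is a sketch under stated assumptions: the `RetentionRule` type, the action labels, and the 30-day and 90-day limits are hypothetical and do not come from any specific law or regulation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RetentionRule:
    min_days: Optional[int] = None  # data must be kept at least this long
    max_days: Optional[int] = None  # data must not be kept longer than this


def retention_action(created: date, today: date, rule: RetentionRule) -> str:
    """Decide what a retention rule requires for an item of a given age."""
    age_days = (today - created).days
    if rule.max_days is not None and age_days > rule.max_days:
        return "delete"            # every copy must be located and removed
    if rule.min_days is not None and age_days < rule.min_days:
        return "must_keep"
    return "may_keep_or_delete"


rule = RetentionRule(min_days=30, max_days=90)  # illustrative limits only
created = date(2024, 1, 1)
for today in (date(2024, 1, 15), date(2024, 3, 1), date(2024, 6, 1)):
    print(today, retention_action(created, today, rule))
```

The check itself is simple; the hard part noted in the text is that a "delete" decision applies to all copies of the item, wherever sharing and backup have placed them.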
At the same time organizations may want to delete data to reduce their exposure to compliance irregularities or potential legal discovery by data forensic techniques, data such as e-mail trails in the Enron case and voice mails in a case involving Hewlett Packard. Businesses must meet the requirements of relevant regulations; Sarbanes-Oxley is one of hundreds that are applicable to specific data types in specific business contexts.

Legal issues include evolving and conflicting laws, regulations, and government requests. Within the United States, there are more than 45 different state data security and privacy laws and several evolving federal laws. Government agencies make conflicting requests. The Department of Justice (DOJ) and DHS requested lengthy retention periods to fight child pornography, e.g., 20 years, and terrorism, e.g., forever, respectively. At the same time, the Federal Communications Commission (FCC) and the Federal Trade Commission (FTC) requested shortened retention periods, e.g., 90 days, to protect privacy and other civil liberties.

Technical issues involve keeping up with evolving data retention requirements, mediating between conflicting requirements, and simply implementing data retention policies covering unimaginable volumes of data. Information sharing causes information to be copied and distributed to other systems within an organization or via the Internet across the world. One form of information distribution is to publish it on paper or digital media, as reports, or for technical purposes such as backup and disaster recovery. Implementing a data retention policy requires that all copies be traced or identified so that they can be deleted in compliance with the relevant policy. As the requirements change, so must technical solutions for managing the data retention policy as it applies to all copies. Entirely new content and record management technologies are being developed to automate data retention policies. Positive impacts of data retention laws and regulations include data minimization (eliminating all data that are not essential to the relevant business purpose) and raising the previously low priority of data protection and security in all organizations.

4 Privacy Rights Clearing House, “A Chronology of Data Breaches,” posted April 20, 2005, available at http://www.privacyrights.org/ar/ChronDataBreaches.htm#CP.

5 See, for example, U.S. Congressional Research Service, Data Security: Protecting the Privacy of Phone Records, RL33287, Congressional Research Service, Library of Congress, Washington, D.C., updated May 17, 2006.

C.1.8 Issues Related to Data Linkage

Additional issues arise when information is assembled or collected from a variety of sources for presentation to an application. Assembling such a collection generally entails linking records based on data fields such as unique identifiers (if present and available) or less perfect identifiers (such as combinations of name, address, and date of birth).
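The linking step described above can be sketched as follows. The field names (`uid`, `name`, `address`, `dob`), the normalization, and the sample records are all hypothetical; this is a naive exact-match approach chosen for clarity, not a description of any deployed linkage method.

```python
def link_key(record):
    """Build a linkage key: the unique identifier if present,
    otherwise a weaker key from name, address, and date of birth."""
    if record.get("uid"):
        return ("uid", record["uid"])
    return (
        "quasi",
        record["name"].strip().lower(),
        record["address"].strip().lower(),
        record["dob"],
    )


def link_records(records):
    """Group records that share the same linkage key."""
    groups = {}
    for record in records:
        groups.setdefault(link_key(record), []).append(record)
    return groups


a = {"uid": None, "name": "John Smith", "address": "1 Main St", "dob": "1970-01-01"}
b = {"uid": None, "name": "john smith ", "address": "1 main st", "dob": "1970-01-01"}
c = {"uid": None, "name": "Jon Smith", "address": "1 Main St", "dob": "1970-01-01"}

groups = link_records([a, b, c])
print(len(groups))
```

Records `a` and `b` link because they differ only in case and spacing, while `c` is left unlinked because of a one-letter spelling difference, even though it may describe the same person. Conversely, two different people who share a name, address, and birth date would be merged into one group.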
In practice, it is often the case that data may be linked with little or no control for accuracy or ability to correct errors in these fields, with the likely outcome that many records will be linked improperly and/or that many other records that should be linked are not linked. Without checks on the accuracy of such linkages, there is no way of understanding how errors resulting from linkage may affect the quality of the subsequent analysis. (For more on issues related to data linkage, see Appendix H.) C.1.9 Connecting the Information Life Cycle to the Framework The framework defined in Chapter 2 of this report provides guidance on information practices to achieve efficacy of counterterrorism programs while ensuring adequate civil liberties protections. All information practices related to information-based programs can be considered in the context of the typical information life cycle. Each step of the life cycle is
along with the rational and experimental bases, must cover all steps of the information life cycle and be fully documented.

C.2 THE UNDERLYING COMMUNICATIONS AND INFORMATION TECHNOLOGY

C.2.1 Communications Technology

Twenty-first century communications technology is in a continuing phase of rapid growth, evolution, and transformation. Today, there are more than 5,600 telecommunications providers in the United States. Whereas in the past providers were distinguished by the technology of the communications medium involved, more recently deregulation and advances in technology have led to a convergence of technologies and companies, and today any company can become a telecommunications provider, thus expanding both the number of service providers and the types of communications services. For example, the Shell Oil Company is treated for certain purposes as a communications service provider because it provides its customers Internet-based services with which to check or modify heating or other electrical appliances in their home.

The scale of communications network usage is almost beyond imagination and growing rapidly. In the United States, the average annual growth rate in wireless calls, VoIP calls, and e-mail has been around 50 percent. In addition to these conventional forms of communication there is a wide range of new services such as instant messaging, short message service (SMS), video messaging, and a plethora of new business services communicated over the Internet. These communications are also enormous in data volume. A 2003 rough estimate7 of annual data volumes claimed over 9 exabytes of wireline calls and over 2 exabytes of wireless calls, with over 1.5 petabytes of Internet traffic. A rough approximation of an exabyte is 100,000 times the data volume that corresponds to the more than 19 million books in the Library of Congress.
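The exabyte comparison above can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 terabyte = 10^12 bytes, 1 exabyte = 10^18 bytes); the Library of Congress volume is not stated in the text and is derived here from its "100,000 times" approximation.

```python
TB = 10**12  # bytes in a terabyte (decimal units, an assumption)
EB = 10**18  # bytes in an exabyte (decimal units, an assumption)

# Volume implied for the Library of Congress book collection
# by the approximation "1 exabyte = 100,000 x Library of Congress".
loc_bytes = EB // 100_000

# The estimate cites over 9 EB of wireline and over 2 EB of wireless calls,
# so 11 EB is a lower bound on the annual call volume.
call_bytes = (9 + 2) * EB
calls_in_loc = call_bytes // loc_bytes

print(loc_bytes // TB, "TB implied for the Library of Congress")
print(calls_in_loc, "Library of Congress equivalents per year")
```

Under these assumptions the implied collection size is 10 terabytes, so the cited annual call volume corresponds to more than a million times that amount.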
7 P. Lyman and H.R. Varian, How Much Information, 2003, retrieved from http://www.sims.berkeley.edu/how-much-info-2003 on May 13, 2008.

The data associated with telecommunications fall into three categories:

The actual communication or content of the communication. Depending on the nature of the service, communications providers are generally precluded from examining content except for technical reasons such as improving quality of service.

The information required to manage and process the call, e.g., the source number, the destination number, the start time, and the end time, called call data records (CDRs). (Such information is generally known as customer proprietary network information (CPNI).) Communications providers retain the management data for billing and other technical and business purposes, such as detection and prevention of telecommunications fraud, and thus maintain vast data repositories of CDRs (in the petabyte range). For example, in 2001 AT&T reported generating more than 300 million CDRs per day for 100 million long-distance accounts.

Subscriber information, such as address, credit and billing information, and descriptions of services provided. As services become more sophisticated, the need for additional subscriber information grows to further define services and increase ease of use. For example, customer profiles kept by service providers on the Internet often include detailed preferences so that the automated service can meet customer needs without having to request that information on each use.

Telecommunications companies collect data in all three categories. Access to CPNI is strictly governed by federal and other legislation and by telecommunications regulations with severe penalties for each violation. Due to the significant growth in the types of communications services and a continuing large growth in communications volumes, as well as significant advances in technology, the nature, management, and governance of CPNI must be constantly updated, and laws, regulations, and practices must be revised to reflect new and emerging opportunities and threats, including those related to counterterrorism and civil liberties.
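To make the call data record concrete, the sketch below models the four fields named above and derives a call duration, then converts the reported 2001 daily volume into per-second and per-account rates. The class name, field names, and sample phone numbers are illustrative assumptions; actual CDR formats vary by provider and carry many more fields.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDataRecord:
    # Only the four fields named in the text; the layout is hypothetical.
    source_number: str
    destination_number: str
    start_time: datetime
    end_time: datetime

    @property
    def duration_seconds(self) -> float:
        """Length of the call, derived from the start and end times."""
        return (self.end_time - self.start_time).total_seconds()


cdr = CallDataRecord(
    source_number="+1-555-0100",       # fictional numbers
    destination_number="+1-555-0199",
    start_time=datetime(2001, 1, 1, 9, 0, 0),
    end_time=datetime(2001, 1, 1, 9, 2, 30),
)

# Rates implied by the reported 300 million CDRs per day for 100 million accounts.
cdrs_per_second = 300_000_000 / (24 * 60 * 60)
cdrs_per_account_per_day = 300_000_000 / 100_000_000

print(cdr.duration_seconds, round(cdrs_per_second), cdrs_per_account_per_day)
```

Note that the record contains no call content, yet the combination of who called whom, when, and for how long is exactly what makes retained CDRs useful for billing and fraud detection and sensitive from a privacy standpoint.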
One illustration of the need for rebalancing is an ongoing tension between the FCC, the FTC, and civil liberties interests (who have argued for reducing the time that service providers retain CPNI) and DHS and DOJ (which have argued to increase retention time in case it is required for terrorist, legal, or other security purposes). Access to data in the other categories provides a more highly revealing portrait of personal behavior and is covered by law (although not telecommunications law). C.2.2 Information Technology For most citizens in daily life, the world is increasingly digital. Citizens apply electronically for government services, such as passports and licenses. In an increasingly cashless society, consumers engage in numerous financial transactions that are precisely recorded, often including the location and time. Whether for entertainment, personal, or professional purposes, clicks on the Internet are recorded for future use. Every trip is
recorded, from the airline, hotel, and car rental reservations to the actual events of the trip. Increasingly people and organizations publish detailed aspects of themselves, including electronic calendars, photographs, videos, music, and aspects of their personal lives. Increasingly activities in public places, stores, and enterprises are recorded and stored by surveillance systems. Educational institutions, e.g., flight schools, record their members’ activities. Employers record and retain extensive information on employees. With the increasing use of technologies such as RFID (radio frequency identification) tags, objects that people own and use provide personal information that can be read at a distance; for example, automobile and appliance parts, articles of clothing, retail products, and electronic devices such as telephones, personal digital assistants, and computers can communicate information such as location, status, and temperatures.

Moreover, the very types of personal information that can be collected are proliferating. For most of the 20th century, digital information referred to structured information such as name, address, telephone number, purchase order number, and the like. In the 21st century, digital information has expanded to include anything that can be represented digitally such as graphics, music, and video. There is a dramatic growth in unstructured information, captured, for example, by the 4.2 million closed-circuit television (CCTV) cameras in Britain (about one for every 14 people) and by other surveillance cameras in the United States, much of it stored for future processing.

The scale of information processing undertaken in the United States is unimaginably large. Fortune 500 companies and large federal agencies are likely to have more than 5,000 information systems each with one or more databases.
It would be rare to find any business of any size in the United States that did not have a significant investment in information systems and databases. The largest databases in the world, according to the 2005 biennial Winter Corporation survey,8 exceeded 23 terabytes (TB) for transactional databases and more than 100 TB with 3 trillion entries for data warehouses, which is equivalent in data volume to 10 times the contents of the Library of Congress. Growth rates over 2 years for these databases ranged from a factor of 2 for transactional databases to a factor of 3 for the largest data warehouse. Over the past 4 years the average database size rose 243 percent, while the maximum size rose 578 percent.

The use of these databases, or workloads, is equally staggering. The largest transactional workload was 1 billion SQL statements (e.g., database queries) per hour, against an average of 35 million; the largest data warehouse (query only) workload was 30 million per hour, against an average of 2 million per hour. (SQL is a computer language for accessing and querying databases.) Winter estimated in 2005 that by 2008 transactional workloads would have grown 174 percent while data warehouse workloads would have quadrupled. While individual databases and their use are growing dramatically, so is the total number of databases.

8 K. Auerbach, 2005 TopTen Program Summary: Select Findings from the TopTen Program, Winter Corporation, Waltham, Mass., May 2006.

C.2.3 Managing Information Technology Systems and Programs

There are many formally defined private-sector9 and government10 IT assessment frameworks, i.e., guidelines and best practices, for improving IT governance, transparency, and performance management, as well as improving specific areas, such as security,11 privacy,12 and information fairness.13 These frameworks are intended to quantify difficult-to-evaluate information systems objectives such as information systems effectiveness, quality, availability, agility, reliability, accuracy, completeness, efficiency, compliance with applicable regulations, and confidentiality. Although these criteria are difficult to define and evaluate, they are common requirements that the IT industry must evaluate for all critical systems on a regular basis. While there is never a simple or discrete answer, the IT industry must make its best approximation.

Three of the 30 most widely followed frameworks are Control Objectives for Information and Related Technologies (COBIT), IT Infrastructure Library (ITIL), and International Organization for Standardization (ISO) 17799.14 In comparison with COBIT, which has 34 high-level objectives that cover 215 control objectives, the committee’s framework has two high-level objectives (i.e., effectiveness, and consistency with U.S. laws and values) that cover 30 control objectives. Although no one framework has the same high-level and control objectives as the committee’s framework, they nevertheless provide guidance for achieving all of the committee’s information and communications technologies criteria. Analysts advise that organizations judiciously select specific frameworks or criteria based on their relevance to well-defined objectives and the readiness of the organization to apply them.15 This method applies also to implementing the committee’s framework.

Most IT organizations surveyed worldwide16 and in the United States17 have adopted a framework. While many have developed their own, there is increasing adoption of formal frameworks based on reports of their efficacy, such as a 30 percent increase in productivity over 2 years through a consistent application of formal frameworks.18 Failures with framework implementation are often related to inappropriate selection of criteria, as well as to formulaic implementations that emphasize process and checklists by those who do not understand the objectives or how to evaluate whether they have been achieved.

9 D. Aron and A. Rowsell-Jones, Success with Standards, Gartner EXP, Stamford, Conn., May 2006; The IT Governance Institute (ITGI), IT Governance Global Status Report—2006, ITGI, Rolling Meadows, Ill., 2006.

10 U.S. General Accounting Office (GAO), Information Technology Investment Management: A Framework for Assessing and Improving Process Maturity, GAO-04-394G, Version 1.1, GAO, Washington, D.C., March 2004.

11 U.S. Office of Management and Budget, “Security of Federal Automated Information Resources,” OMB Circular A-130, Appendix III, available at http://www.whitehouse.gov/omb/circulars/a130/a130appendix_iii.html, revises procedures formerly contained in Appendix III to OMB Circular No. A-130 (50 FR 52730; December 24, 1985) and incorporates requirements of the Computer Security Act of 1987 (P.L. 100-235) and responsibilities assigned in applicable national security directives; W.H. Ware, ed., Security Controls for Computer Systems: Report of Defense Science Board Task Force on Computer Security, AD # A076617/0, Rand Corporation, Santa Monica, Calif., February 1970, reissued October 1979; Federal Information Security Management Act of 2002 (FISMA, 44 U.S.C. § 3541, et seq.).

12 Data Privacy and Integrity Advisory Committee, Framework for Privacy Analysis of Programs, Technologies, and Applications, Report No. 2006-01, U.S. Department of Homeland Security, Washington, D.C., adopted March 7, 2006.

13 U.S. Department of Health, Education, and Welfare, Secretary’s Advisory Committee on Automated Personal Data Systems, Records, Computers, and the Rights of Citizens, Code of Fair Information Practices, July 1973, available at http://aspe.hhs.gov/datacncl/1973privacy/tocprefacemembers.htm.

14 The IT Governance Institute (ITGI), IT Governance Global Status Report—2006, ITGI, Rolling Meadows, Ill., 2006.

15 D. Aron and A. Rowsell-Jones, Success with Standards, Gartner EXP, Stamford, Conn., May 2006.

16 The IT Governance Institute (ITGI), IT Governance Global Status Report—2006, ITGI, Rolling Meadows, Ill., 2006.

17 C. Symons, IT Governance Survey Results: More Work to Be Done, Forrester Research, Cambridge, Mass., April 14, 2005.

18 D. Aron and A. Rowsell-Jones, Success with Standards, Gartner EXP, Stamford, Conn., May 2006.