Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics

1
Principles and Problems

Privacy is the most comprehensive of all rights and the right most cherished by citizens of a free nation.

Justice Louis Brandeis, Olmstead v. United States, 1928

A people who mean to be their own Governors, must arm themselves with the power which knowledge gives.

James Madison, 1822

THE TENSION BETWEEN PRIVATE LIVES AND PUBLIC POLICIES

Private lives are requisite for a free society. To an extent unparalleled in the nation's history, however, private lives are being encroached on by organizations seeking and disseminating information. In their stewardship of data collection and data dissemination, federal statistical agencies have had a long-standing concern for the privacy rights of their data providers, but they now face mounting demands for privacy in the wake of such external developments as telemarketing through random digit dialing and computerized capture of data on everyday activities, like supermarket purchases by credit card.

In a free society, public policies come about through the actions of the people. Those public policies influence individual lives at every stage: financing of prenatal care, state aid to school districts, job training and placement, law enforcement, and determination of retirement benefits. Data provided by federal statistical agencies, such as the Bureau of Justice Statistics, the Bureau of Labor Statistics, the National Center for Education Statistics, and the National Center for Health Statistics, are the factual base needed for informed public discussion about the direction and implementation of those policies. Further, public policies encompass not only government programs but all those activities that influence the general welfare, whether initiated by government, business, labor, or not-for-profit organizations. Thus, the effective functioning of a free society requires broad dissemination of statistical information.

Juxtaposing the Brandeis and Madison quotations above reveals a basic tension in the stewardship of federal statistical agencies. Such agencies seek, on the one hand, to ensure private lives for the citizenry. On the other hand, they seek to provide the data on which public policies are based. Yet, because of concerns about data confidentiality, there is a large unmet need for greater access to data. Data users, whether in federal agencies (including policy research units), state or local governments, academe, trade associations, businesses, market research organizations, political interest groups, or the media, have persistently asked for increased access to data.

How can federal statistical agencies serve data users better by providing more access to useful data and at the same time serve data providers by better ensuring privacy and confidentiality? What principles must guide their actions? What are the key problems?

Because facts are the lifeblood of a free society, answers to these questions concern many people, not just government statisticians. In particular, they are of concern to data users and data providers.

Answers to fundamental questions of confidentiality and data access affect many data users.
Included in this category are researchers (e.g., university researchers investigating the affordability of housing), policy analysts (e.g., staff of an educational policy group assessing the influence of federal student loans on degree completion rates), and legislators and congressional staff (e.g., members of the Joint Committee on Taxation who are modeling the effects of proposed changes in the tax code).

Answers to questions of confidentiality and data access also affect many data providers. Included in this category are individuals (e.g., those in households throughout the country who respond to decennial censuses), institutions (e.g., nursing homes that provide data to the National Center for Health Statistics), and enterprises (e.g., car manufacturers that respond to the Energy Information Administration's Manufacturing Energy Consumption Survey).

In addition, many other individuals and groups are concerned with data use and data supply. Included in this category are legislators (e.g., members of the House Committee on Government Operations); lawyers (e.g., attorneys trying to build a class action case for Medicaid clients turned out of nursing homes or representing a client whose research files on a project funded by the National Institutes of Health have been subpoenaed); journalists (e.g., a reporter writing about freedom of information or privacy issues involving AIDS); and business executives, who simultaneously are asked for data on their operations and are seeking information on their business environment. Finally, advocacy groups represent various positions on these issues. Groups such as Computer Professionals for Social Responsibility are concerned with computer-related privacy issues. And professional organizations such as the Association for Public Data Users are concerned with issues like access to health data on minority populations.

STUDY GOALS AND SCOPE

The federal statistical system is complex and far reaching. It encompasses more than 70 federal agencies having a role in collecting data from individuals, households, farms, businesses, and governmental bodies and disseminating data for statistical purposes. Those statistical purposes include description, evaluation, and research. Although it may collect data from individual respondents, administrative records, or organizations, a statistical agency does not do so in order to take administrative, regulatory, or enforcement action toward a particular individual or organization.1 Indeed, it is not concerned with identifying the respondent with particular information, but rather with describing and inferring patterns, trends, and relationships for groups of respondents (National Research Council, 1992b).

The Panel on Confidentiality and Data Access was charged by the Committee on National Statistics and the Social Science Research Council with developing recommendations that could aid federal statistical agencies in their stewardship of data for policy decisions and research.
Three areas were of paramount concern in our deliberations: protecting the interests of data subjects through procedures that ensure privacy and confidentiality, enhancing public confidence in the integrity of statistical and research data, and facilitating the responsible dissemination of data to users.

Deciding on the exact scope of our investigation was not easy, for the boundaries of federal statistical activities are not clearly defined. Federal statistical activities include the development and dissemination of large, general-purpose data sets based on censuses, surveys, and administrative records. They also include the collection and analysis of personal data in experimental research with human subjects. A few federal statistical agencies conduct general or multipurpose programs, but many others conduct specialized statistical programs or activities. In addition, some basically programmatic agencies, like the Federal Aviation Administration, the Health Care Financing Administration, and the Internal Revenue Service, conduct some statistical activities. Moreover, many federally supported statistical activities are carried out by contractors and grantees, and some statistical activities depend heavily on federal-state cooperative arrangements of long standing. Finally, the data subjects and units of analysis for statistical programs include persons and organizations, but when the concepts of privacy and confidentiality are applied to organizations, they have quite different meanings than they do when applied to persons.

To make our task manageable, we decided to concentrate our attention on major federal statistical programs and to look beyond them only to the extent that seemed necessary to provide adequate coverage of confidentiality and data access questions related to those programs. In addition, given the basic nature of the issues and our backgrounds and experience (see Appendix B), we were more comfortable dealing with our charge as it applied to personal, rather than organizational, data. Nevertheless, input from statistical agencies and events that occurred during our deliberations made it clear that there are major issues relating to data for organizations that cannot be ignored. Accordingly, we lay out a framework for the treatment of data on organizations in Chapter 7. We also refer to organizations in other chapters, especially Chapter 5, which deals with legislation.
We believe that confidentiality and data access questions for organizations warrant more attention than they have received in the past and more than we have been able to give. We hope that the federal statistical agencies and others will pursue them through systematic studies.

Given the complexity of the federal statistical system, designing an ideal configuration to address confidentiality and data access issues throughout the system, or even in any one federal statistical agency, is too daunting for this panel. Instead, we seek to contribute to a long tradition in the statistical community of periodically reconsidering current institutional structure and practice. Fundamentally, we seek to spur this ongoing process by articulating and applying three tenets of an ethic of information in a free society: democratic accountability, constitutional empowerment, and individual autonomy. These tenets are consistent with the ethos of American society and are the basis for trustworthy operation of federal statistical agencies. We believe that this attention to underlying principle can have a more beneficial and lasting impact than would an attempt to provide detailed recommendations for micromanagement of agency confidentiality and data access procedures.

Although we recognize the inherent tension between data protection and data access, we do not advocate a specific trade-off between the two. The dynamics of such a trade-off would be complicated and heavily influenced by the missions and operational environments of individual agencies, so that a single solution would not work for every agency. Nevertheless, we see some opportunities to enhance data access without decreasing data protection, and some opportunities to increase data protection without diminishing data access. Accordingly, society faces win/no loss situations, arguably because current institutional arrangements are inadequate. To develop this theme and inductively build a better understanding of the broad principles, we use as illustrations particular circumstances that individual agencies currently confront.

Many topics that fall under the rubric of societal concerns for effective, efficient, and ethical collection and use of information are largely outside the scope of this report. We are mindful of the burgeoning information industry in the private sector, including marketing firms and consumer credit bureaus. While we have concerns about this industry's impact on individual privacy, those concerns are outside the scope of this report. Nor do we address the paperwork and privacy burden imposed by government in administering its programs.

Directly within the scope of this report is the functioning of the federal statistical system as it grapples with confidentiality and data access issues.
Specifically, we have a concern for how a federal statistical agency relates to its three diverse and overlapping constituencies: data providers, government itself, and data users. We direct this report to all concerned with confidentiality and data access issues. We hope that the ideas and recommendations we put forth will stimulate consideration of these critical issues by all concerned and make each interested party more aware of the legitimate concerns of the other parties. Further, we hope that this report will stimulate discussion of confidentiality and data access issues across levels of government and across geographical boundaries.

HOW DOES THE FEDERAL STATISTICAL SYSTEM FUNCTION?

The federal statistical system is a largely decentralized agglomeration of agencies. Each agency functions under the guidance of officials in a government department, as the Bureau of the Census does in the Department of Commerce and the National Center for Education Statistics does in the Department of Education. The agencies operate with various legal authorizations to compile, analyze, and disseminate statistical data. According to the Office of Federal Statistical Policy and Standards (1978:413),

The ultimate purpose of all Federal statistical collection activities is to develop statistical material for use by Government agencies, policymakers, and the public. The ultimate test of the Federal Statistical System is the availability of relevant information; therefore, the question of data access must continually receive a high level of attention.

Identifying seven of the larger federal statistical agencies illustrates the reach of the system's activities:

Bureau of the Census
Bureau of Justice Statistics
Bureau of Labor Statistics
Energy Information Administration
National Agricultural Statistics Service
National Center for Education Statistics
National Center for Health Statistics

Some of these agencies have purposes that are not purely statistical. The Energy Information Administration (EIA), for instance, is in some cases required to provide identifiable data in support of regulatory and program needs of the Department of Energy.

Each of the existing federal statistical agencies was founded in response to needs for data bearing on critical areas of public policy.
A recent addition is the Bureau of Transportation Statistics, which was established by Congress in 1991 to compile and analyze data related to such issues as the environmental impact of mass transit, the factors affecting choice of transportation mode, and the nature of vehicular accidents. Congressional support has been voiced for the establishment of specialized statistical agencies in other crucial policy areas, as well.

In collecting data, as noted, the agencies of the federal statistical system obtain data from individual respondents (e.g., the U.S. Fish and Wildlife Service's Survey of Fishing, Hunting, and Wildlife Related Recreation elicits data from outdoor enthusiasts on their recreational activities), households (e.g., the Census Bureau conducts the Consumer Expenditures Survey for the Bureau of Labor Statistics), organizations (e.g., the National Agricultural Statistics Service collects information from farm operators on some 120 crops and 45 varieties of livestock), and governmental sub-units (e.g., the Census Bureau collects data from state and local governments).

In disseminating data, federal statistical agencies provide information to a range of clients. Within the government, clients include researchers within the same department (e.g., the National Agricultural Statistics Service provides data to the Economic Research Service in the Department of Agriculture), other federal agencies (e.g., the Internal Revenue Service's Statistics of Income Division furnishes data to the Bureau of Economic Analysis in the Department of Commerce), state and local governments (e.g., the National Center for Education Statistics supplies data to the Commonwealth of Pennsylvania's Department of Education), business firms (e.g., the National Center for Health Statistics makes data available to the Kaiser Permanente Health Maintenance Organization), Congress (e.g., the Bureau of Economic Analysis calculates the gross domestic product, which informs the Congressional Budget Office), and the Executive Office of the President (e.g., the Census Bureau gives information on job seeking by the unemployed to the Council of Economic Advisers). Outside the government, clients include the media (e.g., the New York Times, September 23, 1990:Section 3, p. 11, reported on investment in farms using Department of Agriculture figures), individual members of the public (e.g., the 1990 World Almanac, which sold over 54 million copies, is replete with government statistics), and academic researchers (e.g., Babcock and Engberg, 1990, use Bureau of Labor Statistics data as a reference point for their survey of local labor market conditions).

With limited authority and resources, the Office of Management and Budget, through its Statistical Policy Office, provides long-range planning for statistical programs and coordinates statistical policy within the federal government. The Statistical Policy Office does not have direct administrative responsibility for federal statistical agencies, however.

KEY DEFINITIONS

A lack of general agreement on terminology has caused some confusion in discussions of the issues involved in data protection and data access. The panel found the following definitions for certain key terms to be helpful. They are used consistently in this report.

DATA SUBJECTS AND DATA PROVIDERS

Data subjects are persons, households, or organizations for which data are obtained and presented in statistical form. The data subjects are not always the data providers, however. One person in a household may respond to a survey questionnaire that asks for information about all members of that household. Data providers are often called respondents. When respondents provide information for data subjects other than themselves, they are called proxy respondents.

INFORMATIONAL PRIVACY

Privacy has multiple definitions depending on what aspect of this broad concept is being stressed. We take the following as our working definition:

Informational privacy encompasses an individual's freedom from excessive intrusion in the quest for information and an individual's ability to choose the extent and circumstances under which his or her beliefs, behaviors, opinions, and attitudes will be shared with or withheld from others.

CONFIDENTIALITY AND DATA PROTECTION

Confidentiality refers broadly to a quality or condition accorded to information as an obligation not to transmit that information to an unauthorized party (National Research Council, 1991:289). This has implications that range over religious confessionals, national security, private business "whistleblowing," and disclosures of crimes. Our concern is, more narrowly, with the promises, explicit or implicit, made to a data provider by a data gatherer regarding the extent to which the data provided will allow others to gain specific information about the data provider or data subject.
Confidentiality has meaning only when the promises made to a data provider can be delivered; that is, the data gatherer must have the will, technical ability, and moral and legal authority to protect the data.

Our definition of confidential data is consistent with the position of the President's Commission on Federal Statistics (1971:222):

[Confidential should mean that dissemination] of data in a manner that would allow public identification of the respondent or would in any way be harmful to him is prohibited and that the data are immune from legal process.

Data protection refers to the set of privacy-motivated policies and procedures that ensure minimal intrusion by data collection and maintenance of data confidentiality. The term is generally used in the context of protecting personal information (Flaherty, 1989). Unlike privacy, however, which is an individual right, confidentiality is not restricted to data on individuals and is often extended to data on organizations.

INFORMED CONSENT AND NOTIFICATION

Informed consent and notification are related, but distinct, ethical and legal concepts. From our perspective, informed consent refers to a person's agreement to allow personal data to be provided for research and statistical purposes. Agreement is based on full exposure of the facts the person needs to make the decision intelligently, including any risks involved and alternatives to providing the data (derived from Black et al., 1990:779). Informed consent describes a condition appropriate only when data providers have a clear choice. They must not be, nor perceive themselves to be, subject to penalties for failure to provide the data sought.

Notification also involves a condition of data provision under full exposure of pertinent facts. Unlike with informed consent, however, the elements of choice and agreement are absent. Notification is the more appropriate concept when data provision for stated purposes is mandatory, as it is in the decennial census of population.

DISCLOSURE

Disclosure relates to inappropriate attribution of information to a data subject, whether an individual or an organization. Disclosure occurs when a data subject is identified from a released file (identity disclosure), sensitive information about a data subject is revealed through the released file (attribute disclosure), or the released data make it possible to determine the value of some characteristic of an individual more accurately than otherwise would have been possible (inferential disclosure). Inferential disclosure is the most general of the types of disclosure. The definition above was suggested by Dalenius (1977) and supported by Statistical Policy Working Paper 2 (Federal Committee on Statistical Methodology, 1978:41). Underlying that support was the belief that it offers the best basis for (1) identifying all potential disclosures in connection with proposed releases, (2) deciding which of the potential disclosures are unacceptable, and (3) using appropriate techniques to prevent unacceptable disclosures.

ADMINISTRATIVE AND STATISTICAL DATA

One purpose of data collection concerns a course of action that affects a particular person or business. The purpose can be regulatory, administrative, legislative, or judicial. Examples include tax audits of a person, couple, or corporation; a criminal investigation into a report of arson; license renewal for a liquor store; and determination of welfare benefits. We refer to these purposes generically as administrative.

Another purpose of collecting data is to generate an aggregate description of a group of persons or businesses. No direct action is taken for or against a specific individual or business, although policy changes based on the information could result in benefits or costs to persons or businesses.
Examples include developing a formula for determining which tax returns should be audited, investigating geographical patterns of arson in a large city, and researching the relationship between the incidence of liquor law violations by stores and store characteristics or how the duration of welfare benefits varies with the educational level of the recipient. We refer to these purposes generically as statistical.

Consistent with the distinction between administrative and statistical data, the Privacy Act of 1974 (P.L. 93–579) defines a statistical record to be a record in a system of records maintained for statistical research or reporting purposes only and not used in whole or in part in making any determination about an identifiable individual, except as provided by Section 8 [which authorizes certain kinds of data access, including for research activities by the Bureau of the Census] of Title 13.

WHAT PRINCIPLES SHOULD GUIDE STATISTICAL AGENCIES?

As creations of government, statistical agencies mirror, with varying shadings, the ethos of their society. For the United States, and a growing list of other countries, this ethos embraces a freedom that recognizes pluralism, public decision making based on representative democracy, and a market-oriented economy. Statistical agencies reflect those themes through a remarkably intricate configuration of institutional structure and practice.

As noted above, the principles of democratic accountability, constitutional empowerment, and individual autonomy maintain the ethos of American society and provide valuable ethical guidance for the structure and practice of federal statistical agencies. Recognizing that the guidance they provide is often reinforcing but is not always harmonious, we examine each principle in turn. When we apply the principles in subsequent chapters, we either explore ways that they can be reconciled or note that difficult trade-offs must be made.

DEMOCRATIC ACCOUNTABILITY

Functionally, accountability recognizes the responsibilities of those who serve on behalf of others. With position and involvement in manifold areas of life, government in a democracy should serve the public (collectively, individually, and in assorted assemblages) and be accountable to it. As John Locke (1690/1988:426–427) said in his Two Treatises of Government,

Who shall be Judge whether the Prince or Legislative act contrary to their Trust … The People shall be Judge; for who shall be Judge whether his Trustee or Deputy acts well, and according to the Trust reposed in him, but he who deputes him.

In implementation, accountability requires that the public obtain comprehensive information on the effectiveness of government policies. Prewitt (1985) addresses this relationship between government and citizens as "democratic accountability."
Federal statistical agencies play a pivotal role in ensuring democratic accountability: they obtain, protect, and disseminate the data that allow accurate assessment of the influence of public policies on individual well-being. Further, they themselves are accountable to the public for two key functions in this process: (1) protecting the interests of data subjects through procedures that ensure appropriate

[...]

documents retained in respondent's files from compulsory legal process" (Office of Federal Statistical Policy and Standards, 1978:256).

As another example, the statistical arm of the Department of Energy, the Energy Information Administration, collects proprietary information on pricing and production from petroleum companies. In 1990, the Department of Justice's Antitrust Division requested individually identifiable data from the agency to investigate alleged price gouging by oil companies in the aftermath of the Iraqi invasion of Kuwait. The agency refused, citing its policy and pledge to keep the data, which had been collected for statistical purposes, confidential. In the ensuing disputations between the agency and the Department of Justice, it became evident that the agency lacked unambiguous legal authority to sustain its confidentiality pledge. We explore this case in detail in Chapter 7.

In discussing data protection, we emphasize the fundamental distinction between administrative data and statistical data. Important differences exist among the types of data collected by the government, especially data collected for statistical and research purposes versus data collected for the administration of government programs. Administrative data often have inherent research value, and statistical uses are appropriate, provided confidentiality safeguards can be maintained. Data from Medicare and Medicaid records, for example, are properly used in studying the pattern of medical procedures, such as coronary angioplasty, as they vary by region of the country, race, and gender. Although in certain situations access to statistical data may seem administratively convenient, most administrative uses would violate pledges of confidentiality.
The simple statement "Your answers are confidential" on the cover of a census form ought to mean, for example, that information provided on household composition will not be used to check eligibility for Aid to Families with Dependent Children.

As we have noted, federal statistical agencies have experienced pressure to provide data for administrative purposes. Withstanding such pressure can be especially difficult for federal statistical agencies (or programs with statistical functions) that are housed in units with important regulatory functions. (This point was emphasized by the Office of Federal Statistical Policy and Standards, 1978:258.)

The central concept of providing legislative and procedural protection to ensure that personal data collected about a data subject for a research or statistical purpose are not used for an administrative or other decision about that data subject is generally called functional separation. It is not a new concept, having been recommended, for example, in a 1973 study by the Secretary's Advisory Committee on Automated Personal Data Systems (U.S. Department of Health, Education and Welfare). The concept was enunciated by the Privacy Protection Study Commission (1977a:574). Stating the concept as a recommendation for legislation by the U.S. Congress, the commission proposed

that the Congress provide by statute that no record or information contained therein collected or maintained for a research or statistical purpose under Federal authority or with Federal funds may be used in individually identifiable form to make any decision or take any action directly affecting the individual to whom the record pertains, except within the context of the research plan or protocol, or with the specific authorization of such individual.

We examine this concept throughout the report.

CAN COMMUNICATION WITH THE PUBLIC BE IMPROVED?

For federal statistical agencies to achieve full democratic accountability, they must be continuously cognizant of public perceptions regarding the central issues of data protection and data access. This may require systematic studies of public opinion, as has been done with various surveys of the public's general perception of the Census Bureau.6 Such studies could examine, for example, the extent to which the public continues to have a general distrust of centralized government records, a distrust of the kind that led to the Privacy Act of 1974. They could also assess the public's concern about the data collection and dissemination powers of the private sector information industry. And they could examine the extent to which various groups of the public distinguish between statistical agencies and administrative agencies regarding issues of data protection and data access.
Interpreting general survey results, however, is not easy; any general political alienation will drive down confidence and be reflected in responses about a specific area of government activity. More easily interpretable are systematic studies of the data provider and data user communities that interact with statistical agencies. A diversified portfolio of study techniques, including targeted surveys, data user and data provider conferences, reinterviews, focus groups, and small-scale experiments may be most effective. In the interests of accountability and openness, the results of such studies should be made publicly available.

How can federal statistical agencies best communicate to the general public, the data provider communities, and the data user communities the importance of the statistical information they can provide? How can they best instill confidence in these diverse groups regarding the three key aspects of their data protection policies: (1) their intention to minimize intrusions on privacy, (2) their intention to minimize the time and efforts of data providers, and (3) their ability to maintain confidentiality?

Given the principle of democratic accountability, how are the interests of the public best served? Institutionally, Congress addresses legislation that, for the most part, sets fairly general guidelines for agency policy and practice. At the more specific level of the development and implementation of agency policy regarding confidentiality and data access, mechanisms are beginning to emerge for ensuring input from representatives of the key affected groups. A variety of such mechanisms are possible, including data user conferences, meetings with privacy advocates regarding the conduct of key surveys, agency review boards with outside representation, and a government-wide Data and Access Protection Board.

The topic of public perceptions and interests is addressed further in Chapters 2, 3, and 4. Alternative mechanisms for ensuring input from the affected groups are discussed in Chapter 8.

ARE DATA PROVIDERS PROPERLY NOTIFIED OR INFORMED?

Most often, for data collected for administrative purposes, the data provider either is legally required to provide the data, as with tax return filings, or must provide the data in order to receive some benefit, as with driver's license applications. In broad terms, the ethics of what the agency should tell the data provider about the use of such administrative data are not controversial.
The data provider should be notified about the need for such data and how providing the data might affect him or her, and that one potential use is for statistical purposes. A more complicated ethical question is what options data providers should have in denying various uses of the administrative data they provide.

On the other hand, with the notable exception of the decennial census, response to most statistical surveys of persons and households is voluntary. Law and ethics require that consent for participation in voluntary statistical surveys be informed. Potential respondents should be told how their data will be used and what the consequences will be to them of participating or not participating in the survey. They should be given the opportunity to make a conscious decision whether to provide the data requested.

There are many open questions about the nature of informed consent procedures for voluntary surveys. Do the current practices of the statistical agencies conform fully with legal and ethical requirements? Is appropriate information being given to respondents in language that they can readily understand? Should the same amount of detail be given to all potential respondents? What are suitable procedures for use in telephone and mail surveys? Are the statistical agencies always able to honor the promise of confidentiality protection they give to respondents? To what extent should respondents be allowed to waive standard protections in order to permit data sharing for statistical purposes? We develop these issues in Chapter 3.

ARE CURRENT CONFIDENTIALITY AND DATA ACCESS LAWS ADEQUATE AND APPROPRIATE?

We discussed above the issue of whether statistical agencies have adequate authority to protect the data they collect. Here we introduce some other issues of legislation.

The laws that govern confidentiality of and access to federal statistical data include some with general reach, like the Freedom of Information Act and the Privacy Act, but others vary widely from agency to agency. At one end of the spectrum, Title 13 of the U.S. Code provides extremely tight confidentiality provisions for the Census Bureau. At the other end, many agencies operate without any specific confidentiality legislation. Critics charge that certain provisions of current legislation not only fail to provide sufficient protection of confidential statistical data but create excessive barriers to data access.
For example, the Office of Federal Statistical Policy and Standards (1978:262) noted obstacles to interagency data sharing and, in particular, the inability of agencies to gain access to the Census Bureau's Standard Statistical Establishment List for statistical sampling purposes. There are also obstacles to access by nongovernment data users.

Comprehensive revision of the relevant laws risks a final product worse than the status quo. Further, legislation requires support if it is to be effective. Congress and the President must be committed to confidentiality protection and the public must understand its importance. On the other hand, the current approach of ad hoc legislative initiatives is not working very well either. Thus, in Chapter 5, we pose and address questions of variation among statistical agencies in the protection offered identifiable statistical records, possible amendments of the 1974 Privacy Act, legislative language about "zero disclosure risk," and greater opportunities for data sharing among federal statistical agencies for statistical purposes.

HOW CAN NONGOVERNMENT USERS BE GIVEN ACCESS TO DATA WHILE PRESERVING CONFIDENTIALITY?

Despite efforts by several agencies to improve access to data by researchers and policy analysts outside the government, nongovernment users' need for detailed data is far from being fulfilled. Often, such data contain information on individual respondents over time, as they would for any study of how firms respond to environmental regulations, for example. Specifically, consider assessing the likely impact of a tax, levied in response to the threat of global warming, on the amount of carbon emitted from a plant. Combined information from EIA's Manufacturing Energy Consumption Survey and the Census Bureau's Longitudinal Research Database (see McGuckin and Pascoe, 1988) would show how plants would react to the implied change in the relative price of energy. There are many aspects to such a policy question, and to answer it well researchers would have to address it from various perspectives. Weighed against this clear benefit, access poses serious data protection problems, and so the researchers most likely to answer the question may not get access to the data they need.

Statistical agencies have developed some procedures for providing greater access to data to selected researchers under conditions that subject the researchers to enforceable penalties for violations of confidentiality standards. For example, the designation of researchers as special sworn employees has been standard practice at the Census Bureau.
This practice has clearly protected confidentiality in that special sworn employees have the same responsibilities as regular employees, but it has also limited the type and amount of research that can be done with the data. The access provided is of a temporary nature and must be carried out on site, under supervision, and only for purposes of the Census Bureau that are designated under Title 13 legislation. Further, it puts at a disadvantage researchers who are not currently in the Washington, D.C., area or cannot easily relocate.

Recently, the National Center for Education Statistics and the National Science Foundation have developed, and are evaluating, approaches for licensing researchers to have access to data for statistical purposes, with penalties for improper use. Further, the Census Bureau has been examining for some time the possibility of establishing facilities for data access outside the Washington, D.C., area, perhaps at its regional offices or at a university under a data protection agreement. Most concretely, the Bureau of Justice Statistics has provided data to form the National Justice Data Archive at the Inter-University Consortium for Political and Social Research, which is located at the University of Michigan. This archive functions under the legal authority and confidentiality protection of the Code of Federal Regulations (§ 28, Pt. 22). The Bureau of Justice Statistics provides a project monitor to the archive.

These topics of data access are further developed in Chapters 6 and 7.

CAN INDIVIDUAL-LEVEL DATA BE PROVIDED FOR PUBLIC USE?

Fritz Scheuren (1989:20), as director of the Statistics of Income Division of the Internal Revenue Service, described this predicament of the data collector:

Statistical Disclosure-Avoidance is an enormous problem. On the one hand, we want to make all the microdata [sets of individual records with identifiers removed] we produce publicly available so researchers can benefit fully; on the other hand, we have to protect respondents (or taxpayers, in my case) from having identifiable information inadvertently disclosed.

While some progress has been made since Scheuren's writing, the complexity of the task and the fundamental nature of the problem leave substantial work remaining. Statistical agencies make difficult decisions on how best to disseminate their products so that maximum value is obtained from them while protecting confidentiality. What kinds of data should be released with no restrictions? What kinds of statistical disclosure limitation techniques should be applied to data before releasing them for unrestricted public use?
Will researchers be led to incorrect inferences because of such techniques? What is a reasonably small risk of disclosure of individually identifiable data? How should such decisions be made, and by whom? We develop this topic in Chapter 6.

CAN LEGITIMATE NEEDS FOR DATA SHARING WITHIN GOVERNMENT BE MET?

Are young Americans entering the labor market prepared for the competitive challenge? Government statistics can help answer only part of this critical question. The National Center for Education Statistics, in its National Education Longitudinal Study of 1988, started with 25,000 eighth graders and gathered periodic measurements on their academic performance, school and social environments, and family background. The Bureau of Labor Statistics surveys youth employment. The Bureau of Justice Statistics has some information on those in prison. As Hauser (1991:2) succinctly observes, however, "the overall effect of fragmented responsibility and piecemeal coverage is that, once youths leave high school, our statistical system treats them almost as if they had dropped off the face of the earth." Actually, some surveys by the National Center for Education Statistics do track students after high school. For example, the National Longitudinal Study of the Class of 1972 tracked students as they made the transition to their twenties. In general, however, there is a need for suitable interagency coordination in meeting data needs and, possibly, for some interagency data sharing.

Sharing of identifiable data for statistical purposes can have many potential benefits, including the enrichment of cross-sectional and longitudinal data sets, evaluation and improvement of the quality of census and survey data, improvement of the timeliness and consistency of statistical reporting, development of more complete sampling frames, and improvement of comparability between data developed by different statistical agencies.

A significant amount of data sharing has occurred without incident, both between statistical agencies and from administrative agencies to statistical agencies.
For many years, for example, the Census Bureau has used tax records on individuals and businesses to enhance its demographic censuses and surveys and to evaluate the quality of census and survey data. (Additional detail on this point is provided in Chapter 6.) Similarly, identifiable patient records are routinely used under controlled conditions in medical and epidemiologic research.

Decisions on whether to link records require careful examination of several factors. Will the linking be done under conditions that conform with all statutory confidentiality standards of the agencies involved and with pledges to data providers concerning the use of their information? Is record linkage the only feasible way to develop the desired statistical products? Are the intended uses of these products sufficiently important to justify introducing additional risks of disclosure? Should agencies take into account potential reductions of (1) the time and effort of data providers and (2) the costs of obtaining the desired data by other means? If they should, how would they do it? Who will have access to the linked data sets? Who should decide?

The statutes, regulations, and policies that affect decisions to share data and participate in record linking projects vary widely among agencies. Statistical agencies, like the Census Bureau, and custodians of administrative records, like the Internal Revenue Service, operate under strict statutory controls on disclosure of identifiable data. Other agencies, like the National Agricultural Statistics Service, have the authority to deny access for nonstatistical purposes but also have more flexibility to share data for statistical purposes. Still other agencies operate primarily under general information statutes, like the Privacy Act of 1974, and have little difficulty in finding ways to participate in record linkages for statistical and research purposes, if they choose to do so.

The value of a survey data base can sometimes be substantially enhanced by adding data from administrative records for the persons in the data base (Juster, 1991). In carrying out a health and retirement survey, for example, researchers might like to use earnings records from Social Security files. Other valuable administrative records include case files from public assistance programs, health care claims, and tax returns. Such linkages are valuable for a number of reasons:
Existing administrative records may be more accurate than survey data obtained from respondents, especially for detailed information that must be recalled for earlier years, like income data.
Surveys are made less burdensome and intrusive because certain questions need not be asked.
Administrative records can provide a check on the quality of survey results, and vice versa.

While record linkage may facilitate the basic tasks of a statistical agency, it also raises serious confidentiality concerns. Ivan P. Fellegi, chief statistician of Canada, noted in a May 2, 1991, communication to the panel,

The issue of "moral outrage" as a possibility is real, but I believe [it] applies particularly to record linkage (matching). For this reason we are particularly careful about it.

The topic of record linkage is developed further in Chapters 4–7.

STRUCTURE OF THE REPORT

In Chapter 2 we briefly review the evolution of the federal statistical system, the findings of earlier studies of confidentiality and data access, and the changes that warrant a reexamination of the issues. To complete the contextual framework of our study, we also provide an overview of our assessment of the responsibilities federal statistical agencies must be able to fulfill in their dealings with the public, data providers and data subjects, data users, other statistical agencies, and custodians of administrative records.

In Chapter 3 we address fair treatment of data providers, in particular the use of informed consent as an instrument for ethical communication by data collectors. Drawing on survey experiments, cognitive studies, and public opinion surveys, we also examine certain research findings related to confidentiality and data access.

We examine in Chapter 4 the legitimate expectations of data users, within and outside government, for access to federal statistical data. We also explore the ethical responsibilities of data users and advocate establishing their legal responsibilities in agency or systemwide statutes.

In Chapter 5 we review legislation governing confidentiality and data access, especially the Privacy Act of 1974, Title 13 of the U.S. Code, the Hawkins-Stafford amendments of 1988 for the National Center for Education Statistics, and the Public Health Service Act as it affects the National Center for Health Statistics. While recognizing a basis for diversity according to agency mission, we emphasize the value of all statistical agencies having a certain minimal standard of statutory authority to protect their data.
The experience of the Energy Information Administration and the Bureau of Labor Statistics suggests that some agencies would benefit from having more comprehensive statutory protection of their statistical records.

Extensive dissemination of detailed information is necessary to ensure that ample value can be obtained from federal censuses and surveys. At the same time, statistical agencies must fulfill pledges of confidentiality to data providers. Thus, we examine in Chapter 6 technical and administrative procedures for providing information while ensuring that the risk of disclosure is at most minimal.

In Chapter 7 we address confidentiality issues associated with statistical data on organizations. Using four case studies, we develop a conceptual basis for similarities and differences in the treatment of data on organizations compared with data on individuals and households.

We address the management of confidentiality and data access functions in Chapter 8, with particular attention to interagency coordination and the cross-national experience. We also explore issues of agency staffing and data protection legislation.

Our findings and recommendations are presented in Chapters 3 through 8. A complete list of recommendations is given in the last chapter, and our study procedure is described in Appendix A.7 Biographical sketches of the panel members are provided in Appendix B.

Throughout our deliberations we have been mindful that regardless of the efforts put forth, the tension between data protection and data access will not go away. At best one can hope for a temporary consensus each time the community of interested parties revisits this issue. Ideally, as with isometric exercise, achieving correct dynamic tension in one round builds greater strength for the next round.

NOTES

1. The Environmental Protection Agency, for example, in collecting data on compliance with air pollution regulations for the purpose of flagging offenders, is not functioning as a statistical agency.

2. See Marx (1988:219-229, 1990) for general arguments about the value of privacy and anonymity.

3. This concern is as valid for data on organizations as it is for data on individuals or households.

4. To illustrate the lack of private financial incentives, studies on the health risks of smoking draw some private support from the insurance industry, but that industry's financial incentives to develop data may pale compared with those of the tobacco industry.
With a primary mandate to serve the public interest, the National Center for Health Statistics, along with the National Institutes of Health, can generate the vital data that help inform the debate. As Lave (1990:33-34) notes,

Strongly held opinions are rarely sufficient to improve public health. For every public health professional who favoured antismoking and AIDS communication campaigns, there were several people opposed to these campaigns. The strength of belief of the surgeon-general, and the mustering of support of public health professionals were necessary, but not sufficient, conditions for being able to conduct a campaign. The campaigns could not have been mounted without mustering the data to show that the problems were important, and that the proposed actions were likely to improve public health.

5. For example, Allison and Cooper (1991) note a case in which the Institutional Brokers Estimate System (IBES) filed suit against a researcher who criticized its data and imposed conditions on academic researchers that (1) require them to clear all potential publications with IBES so the latter can have the opportunity "to identify factual errors or misunderstandings" and (2) require researchers using IBES data to refrain from providing access to others (including research assistants) without prior clearance.

6. See, for example, Bureau of the Census (1982), Louis Harris and Associates, Inc. (1981, 1983), National Research Council (1979), and the Roper Organization, Inc. (1980).

7. To facilitate its work, the panel commissioned several background papers bearing on confidentiality and data issues. The papers appear in a special issue of the Journal of Official Statistics, 1993(2). See Appendix A for a list of the papers.