Principles and Problems
Privacy is the most comprehensive of all rights and the right most cherished by citizens of a free nation.
Justice Louis Brandeis, Olmstead v. United States, 1928
A people who mean to be their own Governors, must arm themselves with the power which knowledge gives.
James Madison, 1822
THE TENSION BETWEEN PRIVATE LIVES AND PUBLIC POLICIES
Private lives are requisite for a free society. To an extent unparalleled in the nation's history, however, private lives are being encroached on by organizations seeking and disseminating information. In their stewardship of data collection and data dissemination, federal statistical agencies have had a long-standing concern for the privacy rights of their data providers, but they now face mounting demands for privacy in the wake of such external developments as telemarketing through random digit dialing and computerized capture of data on everyday activities, like supermarket purchases by credit card.
In a free society, public policies come about through the actions of the people. Those public policies influence individual lives at every stage: financing of prenatal care, state aid to school districts, job training and placement, law enforcement, and determination of retirement benefits. Data provided by federal statistical agencies, such as the Bureau of Justice Statistics, the Bureau of Labor Statistics, the National Center for Education Statistics, and the National Center for Health Statistics, are the factual base needed for informed public discussion about the direction and implementation of those policies. Further, public policies encompass not only government programs but all those activities that influence the general welfare, whether initiated by government, business, labor, or not-for-profit organizations. Thus, the effective functioning of a free society requires broad dissemination of statistical information.
Juxtaposing the Brandeis and Madison quotations above reveals a basic tension in the stewardship of federal statistical agencies. Such agencies seek, on the one hand, to ensure private lives for the citizenry. On the other hand, they seek to provide the data on which public policies are based. Yet, because of concerns about data confidentiality, there is a large unmet need for greater access to data. Data users, whether in federal agencies (including policy research units), state or local governments, academe, trade associations, businesses, market research organizations, political interest groups, or the media, have persistently asked for increased access to data.
How can federal statistical agencies serve data users better by providing more access to useful data and at the same time serve data providers by better ensuring privacy and confidentiality? What principles must guide their actions? What are the key problems? Because facts are the lifeblood of a free society, answers to these questions concern many people—not just government statisticians. In particular, they are of concern to data users and data providers.
Answers to fundamental questions of confidentiality and data access affect many data users. Included in this category are researchers (e.g., university researchers investigating the affordability of housing), policy analysts (e.g., staff of an educational policy group assessing the influence of federal student loans on degree completion rates), and legislators and congressional staff (e.g., members of the Joint Committee on Taxation who are modeling the effects of proposed changes in the tax code).
Answers to questions of confidentiality and data access also affect many data providers. Included in this category are individuals (e.g., those in households throughout the country who respond to decennial censuses), institutions (e.g., nursing homes that provide data to the National Center for Health Statistics), and enterprises (e.g., car manufacturers that respond to the Energy Information Administration's Manufacturing Energy Consumption Survey).
In addition, many other individuals and groups are concerned with data use and data supply. Included in this category are legislators (e.g., members of the House Committee on Government Operations); lawyers (e.g., attorneys trying to build a class action case for Medicaid clients turned out of nursing homes or representing a client whose research files on a project funded by the National Institutes of Health have been subpoenaed); journalists (e.g., a reporter writing about freedom of information or privacy issues involving AIDS); and business executives, who simultaneously are asked for data on their operations and are seeking information on their business environment.
Finally, advocacy groups represent various positions on these issues. Groups such as Computer Professionals for Social Responsibility are concerned with computer-related privacy issues. And professional organizations such as the Association for Public Data Users are concerned with issues like access to health data on minority populations.
STUDY GOALS AND SCOPE
The federal statistical system is complex and far reaching. It encompasses more than 70 federal agencies having a role in collecting data from individuals, households, farms, businesses, and governmental bodies and disseminating data for statistical purposes. Those statistical purposes include description, evaluation, and research. Although it may collect data from individual respondents, administrative records, or organizations, a statistical agency does not do so in order to take administrative, regulatory, or enforcement action toward a particular individual or organization.1 Indeed, it is not concerned with identifying the respondent with particular information, but rather with describing and inferring patterns, trends, and relationships for groups of respondents (National Research Council, 1992b).
The Panel on Confidentiality and Data Access was charged by the Committee on National Statistics and the Social Science Research Council with developing recommendations that could aid federal statistical agencies in their stewardship of data for policy decisions and research. Three areas were of paramount concern in our deliberations: protecting the interests of data subjects through procedures that ensure privacy and confidentiality, enhancing public confidence in the integrity of statistical and research data, and facilitating the responsible dissemination of data to users.
Deciding on the exact scope of our investigation was not easy, for the boundaries of federal statistical activities are not clearly defined. Federal statistical activities include the development and dissemination of large, general-purpose data sets based on censuses, surveys, and administrative records. They also include the collection and analysis of personal data in experimental research with human subjects. A few federal statistical agencies conduct general or multipurpose programs, but many others conduct specialized statistical programs or activities. In addition, some basically programmatic agencies, like the Federal Aviation Administration, the Health Care Financing Administration, and the Internal Revenue Service, conduct some statistical activities. Moreover, many federally supported statistical activities are carried out by contractors and grantees, and some statistical activities depend heavily on federal-state cooperative arrangements of long standing. Finally, the data subjects and units of analysis for statistical programs include persons and organizations, but when the concepts of privacy and confidentiality are applied to organizations, they have quite different meanings than they do when applied to persons.
To make our task manageable, we decided to concentrate our attention on major federal statistical programs and to look beyond them only to the extent that seemed necessary to provide adequate coverage of confidentiality and data access questions related to those programs. In addition, given the basic nature of the issues and our backgrounds and experience (see Appendix B), we were more comfortable dealing with our charge as it applied to personal, rather than organizational, data. Nevertheless, input from statistical agencies and events that occurred during our deliberations made it clear that there are major issues relating to data for organizations that cannot be ignored. Accordingly, we lay out a framework for the treatment of data on organizations in Chapter 7. We also refer to organizations in other chapters, especially Chapter 5, which deals with legislation. We believe that confidentiality and data access questions for organizations warrant more attention than they have received in the past and more than we have been able to give. We hope that the federal statistical agencies and others will pursue them through systematic studies.
Given the complexity of the federal statistical system, designing an ideal configuration to address confidentiality and data access issues throughout the system, or even in any one federal statistical agency, is too daunting for this panel. Instead, we seek to contribute to a long tradition in the statistical community of periodically reconsidering current institutional structure and practice. Fundamentally, we seek to spur this ongoing process by articulating and applying three tenets of an ethic of information in a free society: democratic accountability, constitutional empowerment, and individual autonomy. These tenets are consistent with the ethos of American society and are the basis for trustworthy operation of federal statistical agencies. We believe that this attention to underlying principle can have a more beneficial and lasting impact than would an attempt to provide detailed recommendations for micromanagement of agency confidentiality and data access procedures.
Although we recognize the inherent tension between data protection and data access, we do not advocate a specific trade-off between the two. The dynamics of such a trade-off would be complicated and heavily influenced by the missions and operational environments of individual agencies, so that a single solution would not work for every agency. Nevertheless, we see some opportunities to enhance data access without decreasing data protection, and some opportunities to increase data protection without diminishing data access. Accordingly, society faces win/no loss situations, arguably because current institutional arrangements are inadequate. To develop this theme and inductively build a better understanding of the broad principles, we use as illustrations particular circumstances that individual agencies currently confront.
Many topics that fall under the rubric of societal concerns for effective, efficient, and ethical collection and use of information are largely outside the scope of this report. We are mindful of the burgeoning information industry in the private sector, including marketing firms and consumer credit bureaus. While we have concerns about this industry's impact on individual privacy, those concerns are outside the scope of this report. Nor do we address the paperwork and privacy burden imposed by government in administering its programs.
Directly within the scope of this report is the functioning of the federal statistical system as it grapples with confidentiality and data access issues. Specifically, we have a concern for how a federal statistical agency relates to its three diverse and overlapping constituencies: data providers, government itself, and data users.
We direct this report to all concerned with confidentiality and data access issues. We hope that the ideas and recommendations we put forth will stimulate consideration of these critical issues by all concerned and make each interested party more aware of the legitimate concerns of the other parties. Further, we hope that this report will stimulate discussion of confidentiality and data access issues across levels of government and across geographical boundaries.
HOW DOES THE FEDERAL STATISTICAL SYSTEM FUNCTION?
The federal statistical system is a largely decentralized agglomeration of agencies. Each agency functions under the guidance of officials in a government department, as the Bureau of the Census does in the Department of Commerce and the National Center for Education Statistics does in the Department of Education. The agencies operate with various legal authorizations to compile, analyze, and disseminate statistical data. According to the Office of Federal Statistical Policy and Standards (1978:413),
The ultimate purpose of all Federal statistical collection activities is to develop statistical material for use by Government agencies, policymakers, and the public. The ultimate test of the Federal Statistical System is the availability of relevant information; therefore, the question of data access must continually receive a high level of attention.
Identifying seven of the larger federal statistical agencies illustrates the reach of the system's activities:
Bureau of the Census
Bureau of Justice Statistics
Bureau of Labor Statistics
Energy Information Administration
National Agricultural Statistics Service
National Center for Education Statistics
National Center for Health Statistics
Some of these agencies have purposes that are not purely statistical. The Energy Information Administration (EIA), for instance, is in some cases required to provide identifiable data in support of regulatory and program needs of the Department of Energy.
Each of the existing federal statistical agencies was founded in response to needs for data bearing on critical areas of public policy. A recent addition is the Bureau of Transportation Statistics, which was established by Congress in 1991 to compile and analyze data related to such issues as the environmental impact of mass transit, the factors affecting choice of transportation mode, and the nature of vehicular accidents. Congressional support has been voiced for the establishment of specialized statistical agencies in other crucial policy areas, as well.
In collecting data, as noted, the agencies of the federal statistical system obtain data from individual respondents (e.g., the U.S. Fish and Wildlife Service's Survey of Fishing, Hunting, and Wildlife Related Recreation elicits data from outdoor enthusiasts on their recreational activities), households (e.g., the Census Bureau conducts the Consumer Expenditures Survey for the Bureau of Labor Statistics), organizations (e.g., the National Agricultural Statistics Service collects information from farm operators on some 120 crops and 45 varieties of livestock), and governmental sub-units (e.g., the Census Bureau collects data from state and local governments).
In disseminating data, federal statistical agencies provide information to a range of clients. Within the government, clients include researchers within the same department (e.g., the National Agricultural Statistics Service provides data to the Economic Research Service in the Department of Agriculture), other federal agencies (e.g., the Internal Revenue Service's Statistics of Income Division furnishes data to the Bureau of Economic Analysis in the Department of Commerce), state and local governments (e.g., the National Center for Education Statistics supplies data to the Commonwealth of Pennsylvania's Department of Education), business firms (e.g., the National Center for Health Statistics makes data available to the Kaiser Permanente Health Maintenance Organization), Congress (e.g., the Bureau of Economic Analysis calculates the gross domestic product, which informs the Congressional Budget Office), and the Executive Office of the President (e.g., the Census Bureau gives information on job seeking by the unemployed to the Council of Economic Advisers). Outside the government, clients include the media (e.g., the New York Times, September 23, 1990:Section 3, p. 11, reported on investment in farms using Department of Agriculture figures), individual members of the public (e.g., the 1990 World Almanac, which sold over 54 million copies, is replete with government statistics), and academic researchers (e.g., Babcock and Engberg, 1990, use Bureau of Labor Statistics data as a reference point for their survey of local labor market conditions).
With limited authority and resources, the Office of Management and Budget, through its Statistical Policy Office, provides long-range planning for statistical programs and coordinates statistical policy within the federal government. The Statistical Policy Office does not have direct administrative responsibility for federal statistical agencies, however.
DEFINITIONS
A lack of general agreement on terminology has caused some confusion in discussions of the issues involved in data protection and data access. The panel found the following definitions for certain key terms to be helpful. They are used consistently in this report.
DATA SUBJECTS AND DATA PROVIDERS
Data subjects are persons, households, or organizations for which data are obtained and presented in statistical form. The data subjects are not always the data providers, however. One person in a household may respond to a survey questionnaire that asks for information about all members of that household. Data providers are often called respondents. When respondents provide information for data subjects other than themselves, they are called proxy respondents.
PRIVACY
Privacy has multiple definitions depending on what aspect of this broad concept is being stressed. We take the following as our working definition:
Informational privacy encompasses an individual's freedom from excessive intrusion in the quest for information and an individual's ability to choose the extent and circumstances under which his or her beliefs, behaviors, opinions, and attitudes will be shared with or withheld from others.
CONFIDENTIALITY AND DATA PROTECTION
Confidentiality refers broadly to a quality or condition accorded to information as an obligation not to transmit that information to an unauthorized party (National Research Council, 1991:289). This has implications that range over religious confessionals, national security, private business "whistleblowing," and disclosures of crimes. Our concern is, more narrowly, with the promises, explicit or implicit, made to a data provider by a data gatherer regarding the extent to which the data provided will allow others to gain specific information about the data provider or data subject. Confidentiality has meaning only when the promises made to a data provider can be delivered, that is, the data gatherer must have the will, technical ability, and moral and legal authority to protect the data.
Our definition of confidential data is consistent with the position of the President's Commission on Federal Statistics (1971:222):
[Confidential should mean that dissemination] of data in a manner that would allow public identification of the respondent or would in any way be harmful to him is prohibited and that the data are immune from legal process.
Data protection refers to the set of privacy-motivated policies and procedures that ensure minimal intrusion by data collection and maintenance of data confidentiality. The term is generally used in the context of protecting personal information (Flaherty, 1989). Unlike privacy, however, which is an individual right, confidentiality is not restricted to data on individuals and is often extended to data on organizations.
INFORMED CONSENT AND NOTIFICATION
Informed consent and notification are related, but distinct, ethical and legal concepts. From our perspective, informed consent refers to a person's agreement to allow personal data to be provided for research and statistical purposes. Agreement is based on full exposure of the facts the person needs to make the decision intelligently, including any risks involved and alternatives to providing the data (derived from Black et al., 1990:779). Informed consent describes a condition appropriate only when data providers have a clear choice. They must not be, nor perceive themselves to be, subject to penalties for failure to provide the data sought.
Notification also involves a condition of data provision under full exposure of pertinent facts. Unlike with informed consent, however, the elements of choice and agreement are absent. Notification is the more appropriate concept when data provision for stated purposes is mandatory, as it is in the decennial census of population.
DISCLOSURE
Disclosure relates to inappropriate attribution of information to a data subject, whether an individual or an organization. Disclosure occurs when a data subject is identified from a released file (identity disclosure), sensitive information about a data subject is revealed through the released file (attribute disclosure), or the released data make it possible to determine the value of some characteristic of an individual more accurately than otherwise would have been possible (inferential disclosure).
Inferential disclosure is the most general of the types of disclosure. The definition above was suggested by Dalenius (1977) and supported by Statistical Policy Working Paper 2 (Federal Committee on Statistical Methodology, 1978:41). Underlying that support was the belief that it offers the best basis for (1) identifying all potential disclosures in connection with proposed releases, (2) deciding which of the potential disclosures are unacceptable, and (3) using appropriate techniques to prevent unacceptable disclosures.
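As an illustrative aside that is not drawn from the panel's text, the first two types of disclosure can be sketched as simple checks on a file proposed for release. The Python sketch below rests on several assumptions: the function name disclosure_flags, the field names (county, age_group, occupation, diagnosis), and the records themselves are all hypothetical. It treats a record that is unique on a chosen set of identifying characteristics as a candidate identity disclosure, and a group of records that all share one sensitive value as a candidate attribute disclosure. Inferential disclosure is defined relative to what could otherwise have been known about an individual, so a check this simple does not capture it.

```python
from collections import defaultdict


def disclosure_flags(records, quasi_identifiers, sensitive):
    """Group records by identifying characteristics and flag risky groups.

    Hypothetical helper for illustration only; not an agency procedure.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].append(rec[sensitive])

    identity_risk = []   # groups containing exactly one record
    attribute_risk = []  # groups of two or more records sharing one sensitive value
    for key, values in groups.items():
        if len(values) == 1:
            identity_risk.append(key)
        elif len(set(values)) == 1:
            attribute_risk.append(key)
    return sorted(identity_risk), sorted(attribute_risk)


# Entirely fictional microdata, assumed for this example.
sample = [
    {"county": "A", "age_group": "30-39", "occupation": "nurse", "diagnosis": "asthma"},
    {"county": "A", "age_group": "30-39", "occupation": "nurse", "diagnosis": "asthma"},
    {"county": "A", "age_group": "40-49", "occupation": "teacher", "diagnosis": "diabetes"},
    {"county": "B", "age_group": "30-39", "occupation": "farmer", "diagnosis": "asthma"},
    {"county": "B", "age_group": "30-39", "occupation": "farmer", "diagnosis": "none"},
]

identity, attribute = disclosure_flags(
    sample, ["county", "age_group", "occupation"], "diagnosis"
)
print("Candidate identity disclosures:", identity)
print("Candidate attribute disclosures:", attribute)
```

In this fictional file the single teacher record is unique on the chosen characteristics, and the two nurse records reveal the same diagnosis even though neither is unique. Which characteristics count as identifying, and which flagged cases are unacceptable, are judgments an agency would have to make for each proposed release.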
ADMINISTRATIVE AND STATISTICAL DATA
One purpose of data collection concerns a course of action that affects a particular person or business. The purpose can be regulatory, administrative, legislative, or judicial. Examples include tax audits of a person, couple, or corporation; a criminal investigation into a report of arson; license renewal for a liquor store; and determination of welfare benefits. We refer to these purposes generically as administrative.
Another purpose of collecting data is to generate an aggregate description of a group of persons or businesses. No direct action is taken for or against a specific individual or business, although policy changes based on the information could result in benefits or costs to persons or businesses. Examples include development of a formula for determining which tax returns should be audited, investigating geographical patterns of arson in a large city, and researching the relationship between the incidence of liquor law violations by stores and store characteristics or how the duration of welfare benefits varies with the educational level of the recipient. We refer to these purposes generically as statistical. Consistent with the distinction between administrative and statistical data, the Privacy Act of 1974 (P.L. 93–579) defines a statistical record to be
a record in a system of records maintained for statistical research or reporting purposes only and not used in whole or in part in making any determination about an identifiable individual, except as provided by Section 8 [which authorizes certain kinds of data access, including for research activities by the Bureau of the Census] of Title 13.
WHAT PRINCIPLES SHOULD GUIDE STATISTICAL AGENCIES?
As creations of government, statistical agencies mirror, with varying shadings, the ethos of their society. For the United States, and a growing list of other countries, this ethos embraces a freedom that recognizes pluralism, public decision making based on representative democracy, and a market-oriented economy. Statistical agencies reflect those themes through a remarkably intricate configuration of institutional structure and practice.
As noted above, the principles of democratic accountability, constitutional empowerment, and individual autonomy maintain the ethos of American society and provide valuable ethical guidance for the structure and practice of federal statistical agencies. Recognizing that the guidance they provide is often reinforcing but is not always harmonious, we examine each principle in turn. When we apply the principles in subsequent chapters, we either explore ways that they can be reconciled or note that difficult trade-offs must be made.
DEMOCRATIC ACCOUNTABILITY
Functionally, accountability recognizes the responsibilities of those who serve on behalf of others. With position and involvement in manifold areas of life, government in a democracy should serve the public—collectively, individually, and in assorted assemblages—and be accountable to it. As John Locke (1690/1988:426–427) said in his Two Treatises of Government,
Who shall be Judge whether the Prince or Legislative act contrary to their Trust … The People shall be Judge; for who shall be Judge whether his Trustee or Deputy acts well, and according to the Trust reposed in him, but he who deputes him.
In implementation, accountability requires that the public obtain comprehensive information on the effectiveness of government policies. Prewitt (1985) addresses this relationship between government and citizens as "democratic accountability." Federal statistical agencies play a pivotal role in ensuring democratic accountability: they obtain, protect, and disseminate the data that allow accurate assessment of the influence of public policies on individual well-being. Further, they themselves are accountable to the public for two key functions in this process: (1) protecting the interests of data subjects through procedures that ensure appropriate standards of privacy and confidentiality and (2) facilitating the responsible dissemination of data to users, including those to whom democratic government is accountable, the country's citizens.
CONSTITUTIONAL EMPOWERMENT
Constitutional empowerment, as a cornerstone of an information ethic, is the capability of citizens to make informed decisions about political, economic, and social questions. In the United States, constitutional theory emphasizes that ultimate power resides in the people. For reasons of advancing the common welfare, certain specific powers are delegated to a representative government. However, "the powers not delegated to the United States by the Constitution, nor prohibited by it to the states, are reserved to the states respectively, or to the people" (Amendment X, U.S. Constitution). Constitutional practice emphasizes restraints on executive excess and broad access to the political process through the direct election of representatives, as well as through separation and balance of power.
Knowledge is a prerequisite to the ethical exercise of power. As Supreme Court Justice Felix Frankfurter (1930:127) observed,
We now realize that democracy is not remotely an automatic device for good government…. We now know that it is dependent on knowledge and wisdom beyond all other forms of government…. [Democracy] seeks to prevail when the complexities of life make a demand upon knowledge and understanding never made before.
The principle of constitutional empowerment is increasingly important today as federal statistical agencies struggle to obtain and protect data needed for a factual understanding of a fast-changing, complex world. "Electoral power per se is the mechanical guarantee of the system, but the substantive guarantee is given by the conditions under which the citizen gets the information" (Sartori, 1962/1972:74).
As the Office of Technology Assessment (1989:1) notes in Statistical Needs for a Changing U.S. Economy,
Good public policy demands good information. There may be disagreement about the wisdom of different Federal programs but there is little dispute over the need for adequate data to inform the debate. The information generated by the $2 billion spent this year by Federal agencies on statistical programs is a key resource for government policymakers as well as for private investors, public interest groups, academic researchers, and labor organizations…. Government statistics play a key role in evaluating and implementing legislation and are often used as indexes in private contracts.
Individually, empowerment of the citizenry through access to data can heighten fairness. Moreover, it can lessen inefficiencies ascribable to imbalances in information. A sugar beet farmer, for example, without knowledge of current market prices is vulnerable when negotiating a contract with a knowledgeable sugar beet processor. Collectively, empowerment through access to data can create benefits to parties acting cooperatively. In salary negotiations between a school board and teachers union, for example, both sides might turn to a common, legitimated base of statistical information about population and property tax assessment trends to help resolve negotiations. In order to function properly, both a political democracy and a market economy require that voters and consumers make informed choices.
As part of their basic mandate, federal statistical agencies are instructed to provide data that can be used to evaluate the efficiency and effectiveness of government programs. In 1966, Congress affirmed the principles of constitutional empowerment and democratic accountability by passing the Freedom of Information Act to serve democratic values by (1) creating a more fully informed public debate on important issues and (2) counteracting political corruption.
INDIVIDUAL AUTONOMY
Individual autonomy refers to the capacity of the individual to function in society as an individual, uncoerced and cloaked by privacy. Protection of individual autonomy is a fundamental attribute of a democracy. Individual autonomy is compromised by the excessive surveillance sometimes used to build data bases (Flaherty, 1989), unwitting dispersion of data, and a willingness by those who collect the data for administrative purposes to make them readily available in personally identifiable form. Illustrating this latter issue is the following, reported by Desky (1991:1):
For five cents a name, the Maryland Motor Vehicles Administration (MVA) will sell the names of 3.3 million licensed drivers to vendors who request them, with a minimum nonrefundable deposit of $500. All information on a driver's license including height, weight, gender, address and driving record from the previous three years is being sold by the state.
While some may argue that governmental bodies appropriately use administrative records as a source of revenue, most federal statistical agencies are acutely aware of the importance of maintaining respect for an individual's autonomy. The National Center for Health Statistics, for example, has legislative authority to protect its records under the Public Health Service Act (42 U.S.C. 242m), which provides that
no information obtained in the course of activities undertaken or supported under Section 304, 305, 306, or 307 (the sections authorizing the programs of NCHS and of the National Center for Health Services Research) may be used for any purpose other than the purpose for which it was supplied unless authorized under regulations of the Secretary; and (1) in the case of information obtained in the course of health statistical activities under Section 304 or 306 (which authorize the program of NCHS), such information may not be published or released in other form if the particular establishment or person supplying the information or described in it is identifiable unless such establishment or person has consented (as determined under regulations of the Secretary) to its publication or release in other form.
Federal statistical agencies have ethical and pragmatic reasons to be concerned about individual autonomy.2 From an ethical standpoint,
agencies are obligated by the imperatives of our society to respect individual dignity,
agencies should protect the personal information that has been entrusted to them, and
intrusive data collection by agencies can disturb an individual's chosen solitude.
From a pragmatic standpoint, most persons who are surveyed are not required to provide data (the decennial census is the main exception). Thus, if statistical agencies are perceived as failing to respect individual autonomy, the supply of data providers may dry up. Even in the case of legally compelled participation, without respect for individual autonomy, evasion and inaccurate reporting are likely to be rampant.3 Martin (1974:265) stated this well:
Even when responses to requests for information are required by law, the success of a statistical program depends in large measure on the willing cooperation of respondents. Respondents who understand the purposes of the inquiry, who sympathize with the intended uses of the information, and who believe that providing the government with the requested information will not harm them are much more likely to answer truthfully and with a minimum of effort on the part of the data collection agency.
One element in enlisting such cooperation is the assurance of harmlessness to the respondent, and one of the most common methods of making such assurance in statistical data collection is the provision for keeping the replies confidential.
Agencies must have the trust of data providers.
THE SPECIAL ROLE OF FEDERAL STATISTICAL AGENCIES
Federal statistical agencies have vitally served the government's information function and rightly should serve it in the future. As far back as 1888, for example, the U.S. Bureau of Labor was providing statistical information on the conditions of working women. The bureau reported that 17,500 women averaged $5.24 in weekly earnings, and it drew the policy implication that "the figures tell a sad story, and one is forced to ask how women can live on such earnings" (U.S. Bureau of Labor, 1889:70). Today, the Bureau of Labor Statistics provides such policy-relevant information on earnings and employment without regard to its political consequences.
In this section we examine, first, four reasons why government should be involved with gathering statistical information and then four reasons why federal statistical agencies are more appropriate than federal administrative agencies or private agencies to fulfill this function.
The government collects data because, first, it has an obligation to inform the public on those matters that affect the welfare of the people individually and collectively. This includes facilitating democratic accountability by providing sufficient information on government activities to ensure public knowledge of the government's performance. It further includes providing the empowering information that enables a citizen to have an impact on public policies.
Second, in some cases private financial incentives are insufficient to motivate the collection of data that are essential to a democratic society. Most large, multipurpose national data bases, such as the decennial census and the national income accounts, cost far more to collect than any private firm or group of private organizations could ever recover from the market should they decide to fund such an enterprise. Thus, if the efforts are not publicly funded, such data bases would not exist. The social value of the data far exceeds their private value.4
Third, economies of scale are often so large that even though it might be possible to organize multiple sets of private data collectors to do the job, a single public collection would cost significantly less, especially in the case of regularly repeated collections. Further, economies of scale suggest that a large-scale program of data collection has synergies that cannot be obtained by a piecemeal approach. For example, cognitive studies of how respondents interpret various wordings of survey questionnaires can be supported. Such studies may be informative for a range of current and future surveys.
Fourth, private information providers have a natural interest in protecting their investment, which may limit the spread of information.5 On the other hand, government can disseminate data at cost and ensure that the information is accessible for the public good.
For a variety of reasons, the alternatives to federal statistical agencies, whether private information organizations or federal administrative agencies (which do have an essential role in collecting data for regulatory enforcement), cannot be expected to provide the appropriate data. First, government administrative agencies cannot be expected to fulfill completely the function of providing the information needed for democratic functioning. Administrative agencies have primarily operational tasks, whether militarily defending the country, putting criminals in jail, collecting taxes, or running an air traffic control system. The administrative data such agencies collect can be of general value to society, however.
Second, among most data users, federal statistical agencies have established a reputation for integrity and independence. While certain private sector survey firms and information organizations have fine reputations among knowledgeable statisticians, their reputations do not have the same breadth among the relevant public. Often, private data collection is carried out by interested parties, who are unlikely to be objective; even when they are objective, the perception of possible bias reduces the value of the data.
Third, some data are collected because they have direct value in implementing government policy. This is true, for example, of the decennial census in allocating seats in the House of Representatives among the states, of the Consumer Price Index in determining Social Security benefits, and of the National Center for Health Statistics infant mortality data in shaping programmatic effort toward prevention. When government policy is directly affected by certain data, the government must assume responsibility for their quality. This suggests that the government must have substantial control over the design and implementation of the data collection and primary analysis.
Finally, federal statistical agencies have historically done the job, and done it well. Because of the way data are produced, much useful and empowering information is not fully provided by the workings of the private market. Data on educational achievements of elementary school students, for example, although often collected through privately developed test instruments, have historically been collected and disseminated through governmental mandate. Federal statistical agencies, such as in this case the National Center for Education Statistics, play a key role in coordinating data gathering, maintaining quality standards, and disseminating information.
DATA ACCESS IN A DEMOCRATIC SOCIETY
Recognized information needs in society are so great and budgets so constraining that government analysts can do only a small fraction of the research that can beneficially be done with federally collected data. Most of the research must be carried out by analysts for various concerned organizations and academic researchers. Moreover, to enhance the integrity of research findings, independent analysts should have access to data, regardless of the organization that collected them. As a critical element of the democratic process, this access can allow reanalysis by groups with different agendas; stimulate new inquiries on important social, economic, and scientific questions; lead to improvements in the quality of data through suggestions for better measurement and data collection methods; and provide information to improve government forecasts and resource allocations (see National Research Council, 1985). As noted, such open access is a necessary condition for a society to function freely and effectively. The ideal result is a robust, resilient society in which individual and collective interests are served through a competition for the truth. In contrast, monopolized access by "a central state planning board" suppresses freedom and hampers efficiency.
The panel maintains that government dissemination of statistical data under appropriate confidentiality constraints is a public good. Failure to provide data may result in substantial lost opportunities.
Accountability in a democracy is threatened by restricting the collection of government statistics to only those sought by government policymakers and by restricting access to government statistics to only government policymakers. Wallman (1988:11) makes this point and notes a remark attributed to Christopher DeMuth, the Office of Management and Budget (OMB) administrator for information and regulatory affairs, by Ann Crittenden (New York Times, July 11, 1982, Business section:4):
In the past, agencies collected much greater detail than was needed for national policymaking purposes. It is understood now that agencies justify their data collecting programs to OMB in terms of the needs of the federal agencies alone, not of states, local governments, or private firms.
Wallman further notes,
In 1985 OMB distributed for comments a draft circular (A–130) that provided that "executive branch agencies are to collect only that information necessary for the proper performance of agency functions and that has practical utility."
After some unfavorable comment, the proposed circular was withdrawn; it was reissued only in July 1993, with the publication of a revised OMB Circular A-130 in the Federal Register (58(126):36068–36086). This revised circular is based on quite different principles from those of the 1985 draft circular. Leading the 1993 circular's Section 7, "Basic Considerations and Assumptions," are the following points:
The Federal Government is the largest single producer, collector, consumer, and disseminator of information in the United States. Because of the extent of the government's information activities, and the dependence of those activities upon public cooperation, the management of Federal information resources is an issue of continuing importance to all Federal agencies, State and local governments, and the public.
Government information is a valuable national resource. It provides the public with knowledge of the government, society, and economy—past, present, and future. It is a means to ensure the accountability of government, to manage the government's operations, to maintain the healthy performance of the economy, and is itself a commodity in the marketplace.
The free flow of information between the government and the public is essential to a democratic society. It is also essential that the government minimize the Federal paperwork burden on the public, minimize the cost of its information activities, and maximize the usefulness of government information (p. 36071)….
The panel agrees with the thrust of these points in the 1993 OMB circular.
Unquestionably, there can be no lively democratic policymaking—and so there can be no constitutional empowerment—unless many individuals and interest groups have access to information. According to Dahl (1982:11), one of the most important characteristics distinguishing modern democracies is that "citizens have a right to seek out alternative sources of information. Moreover, alternative sources of information exist and are protected by law." Smith (1991:7) also notes that "limited public access to data not only gives intramural researchers a monopoly on the data, it also provides federal agencies an effective mechanism to control areas of sensitivity."
The panel, however, should not be misunderstood to be advocating unrestricted access to personal data. To the contrary, we affirm the ethical imperative of individual autonomy, which requires appropriate guarantees on privacy and confidentiality. Also, we recognize that useful data are more likely to be provided by individuals and establishments under suitable guarantees of confidentiality.
PROBLEMS IN ENSURING CONFIDENTIALITY AND DATA ACCESS
Federal statistical agencies confront a challenging environment—apprehensive respondents, exasperated researchers, skeptical funders, and pressures for administrative uses of confidential statistical records. Not surprisingly in a decentralized statistical system, some of these problems are specific to particular agencies and thus are outside the scope of this report. Nonetheless, a number of problems cut across agencies. Below, we identify a number of general problems and point to chapters of the report where they are considered in depth.
DO STATISTICAL AGENCIES HAVE ADEQUATE AUTHORITY TO PROTECT DATA?
At times, statistical agencies and respondents to statistical surveys have been under pressure to disclose data in identifiable form. For example, a 1961 court order required the St. Regis Paper Company to deliver to the Federal Trade Commission its file copy of a completed Census Bureau form. "In a swift reaction, Congress amended the Census law to protect copies of Census documents retained in respondent's files from compulsory legal process" (Office of Federal Statistical Policy and Standards, 1978:256). As another example, the statistical arm of the Department of Energy, the Energy Information Administration, collects proprietary information on pricing and production from petroleum companies. In 1990, the Department of Justice's Antitrust Division requested individually identifiable data from the agency to investigate alleged price gouging by oil companies in the aftermath of the Iraqi invasion of Kuwait. The agency refused, citing its policy and pledge to keep the data, which had been collected for statistical purposes, confidential. In the ensuing disputations between the agency and the Department of Justice, it became evident that the agency lacked unambiguous legal authority to sustain its confidentiality pledge. We explore this case in detail in Chapter 7.
In discussing data protection, we emphasize the fundamental distinction between administrative data and statistical data. Important differences exist among the types of data collected by the government, especially data collected for statistical and research purposes versus data collected for the administration of government programs. Administrative data often have inherent research value, and statistical uses are appropriate, provided confidentiality safeguards can be maintained. Data from Medicare and Medicaid records, for example, are properly used in studying the pattern of medical procedures, such as coronary angioplasty, as they vary by region of the country, race, and gender. Although in certain situations access to statistical data may seem administratively convenient, most administrative uses would violate pledges of confidentiality. The simple statement "Your answers are confidential" on the cover of a census form ought to mean, for example, that information provided on household composition will not be used to check eligibility for Aid to Families with Dependent Children. As we have noted, federal statistical agencies have experienced pressure to provide data for administrative purposes. Withstanding such pressure can be especially difficult for federal statistical agencies (or programs with statistical functions) that are housed in units with important regulatory functions. (This point was emphasized by the Office of Federal Statistical Policy and Standards, 1978:258.)
The central concept of providing legislative and procedural protection to ensure that personal data collected about a data subject for a research or statistical purpose are not used for an administrative or other decision about that data subject is generally called functional separation. It is not a new concept, having been recommended, for example, in a 1973 study by the Secretary's Advisory Committee on Automated Personal Data Systems (U.S. Department of Health, Education and Welfare). The concept was enunciated by the Privacy Protection Study Commission (1977a:574). Stating the concept as a recommendation for legislation by the U.S. Congress, the commission proposed
that the Congress provide by statute that no record or information contained therein collected or maintained for a research or statistical purpose under Federal authority or with Federal funds may be used in individually identifiable form to make any decision or take any action directly affecting the individual to whom the record pertains, except within the context of the research plan or protocol, or with the specific authorization of such individual.
We examine this concept throughout the report.
CAN COMMUNICATION WITH THE PUBLIC BE IMPROVED?
For federal statistical agencies to achieve full democratic accountability, they must be continuously cognizant of public perceptions regarding the central issues of data protection and data access. This may require systematic studies of public opinion, as has been done with various surveys of the public's general perception of the Census Bureau.6 Such studies could examine, for example, the extent to which the public continues to have a general distrust of centralized government records, a distrust of the kind that led to the Privacy Act of 1974. They could also assess the public's concern about the data collection and dissemination powers of the private sector information industry. And they could examine the extent to which various groups of the public distinguish between statistical agencies and administrative agencies regarding issues of data protection and data access.
Interpreting general survey results, however, is not easy; any general political alienation will drive down confidence and be reflected in responses about a specific area of government activity. More easily interpretable are systematic studies of the data provider and data user communities that interact with statistical agencies. A diversified portfolio of study techniques, including targeted surveys, data user and data provider conferences, reinterviews, focus groups, and small-scale experiments may be most effective. In the interests of accountability and openness, the results of such studies should be made publicly available.
How can federal statistical agencies best communicate to the general public, the data provider communities, and the data user communities the importance of the statistical information they can provide? How can they best instill confidence in these diverse groups regarding the three key aspects of their data protection policies: (1) their intention to minimize intrusions on privacy, (2) their intention to minimize the time and effort required of data providers, and (3) their ability to maintain confidentiality?
Given the principle of democratic accountability, how are the interests of the public best served? Institutionally, Congress addresses legislation that, for the most part, sets fairly general guidelines for agency policy and practice. At the more specific level of the development and implementation of agency policy regarding confidentiality and data access, mechanisms are beginning to emerge for ensuring input from representatives of the key affected groups. A variety of such mechanisms are possible, including data user conferences, meetings with privacy advocates regarding the conduct of key surveys, agency review boards with outside representation, and a government-wide Data and Access Protection Board.
ARE DATA PROVIDERS PROPERLY NOTIFIED OR INFORMED?
Most often, for data collected for administrative purposes, the data provider either is legally required to provide the data—as with tax return filings—or must provide the data in order to receive some benefit—as with driver's license applications. In broad terms, the ethics of what the agency should tell the data provider about the use of such administrative data are not controversial. The data provider should be notified about the need for such data and how providing the data might affect him or her—and that one potential use is for statistical purposes. A more complicated ethical question is what options data providers should have in denying various uses of the administrative data they provide.
On the other hand, with the notable exception of the decennial census, response to most statistical surveys of persons and households is voluntary. Law and ethics require that consent for participation in voluntary statistical surveys be informed. Potential respondents should be told how their data will be used and what the consequences will be to them of participating or not participating in the survey. They should be given the opportunity to make a conscious decision whether to provide the data requested.
There are many open questions about the nature of informed consent procedures for voluntary surveys. Do the current practices of the statistical agencies conform fully with legal and ethical requirements? Is appropriate information being given to respondents in language that they can readily understand? Should the same amount of detail be given to all potential respondents? What are suitable procedures for use in telephone and mail surveys? Are the statistical agencies always able to honor the promise of confidentiality protection they give to respondents? To what extent should respondents be allowed to waive standard protections in order to permit data sharing for statistical purposes? We develop these issues in Chapter 3.
ARE CURRENT CONFIDENTIALITY AND DATA ACCESS LAWS ADEQUATE AND APPROPRIATE?
We discussed above the issue of whether statistical agencies have adequate authority to protect the data they collect. Here we introduce some other issues of legislation. The laws that govern confidentiality of and access to federal statistical data include some with general reach, like the Freedom of Information Act and the Privacy Act, but others vary widely from agency to agency. At one end of the spectrum, Title 13 of the U.S. Code provides extremely tight confidentiality provisions for the Census Bureau. At the other end, many agencies operate without any specific confidentiality legislation. Critics charge that certain provisions of current legislation not only fail to provide sufficient protection of confidential statistical data but create excessive barriers to data access. For example, the Office of Federal Statistical Policy and Standards (1978:262) noted obstacles to interagency data sharing and, in particular, the inability of agencies to gain access to the Census Bureau's Standard Statistical Establishment List for statistical sampling purposes. There are also obstacles to access by nongovernment data users. Comprehensive revision of the relevant laws risks a final product worse than the status quo. Further, legislation requires support if it is to be effective. Congress and the President must be committed to confidentiality protection and the public must understand its importance. On the other hand, the current approach of ad hoc legislative initiatives is not working very well either. Thus, in Chapter 5, we pose and address questions of variation among statistical agencies in the protection offered identifiable statistical records, possible amendments of the 1974 Privacy Act, legislative language about "zero disclosure risk," and greater opportunities for data sharing among federal statistical agencies for statistical purposes.
HOW CAN NONGOVERNMENT USERS BE GIVEN ACCESS TO DATA WHILE PRESERVING CONFIDENTIALITY?
Despite efforts by several agencies to improve access to data by researchers and policy analysts outside the government, nongovernment users' need for detailed data is far from being fulfilled. Often, such data contain information on individual respondents over time, as they would for any study of how firms respond to environmental regulations, for example. Specifically, consider assessing the likely impact of a tax, levied in response to the threat of global warming, on the amount of carbon emitted from a plant. Combined information from the Energy Information Administration's Manufacturing Energy Consumption Survey and the Census Bureau's Longitudinal Research Database (see McGuckin and Pascoe, 1988) would show how plants would react to the implied change in the relative price of energy. There are many aspects to such a policy question, and to answer it well researchers would have to address it from various perspectives. Weighed against this clear benefit, access poses serious data protection problems, and so the researchers most likely to answer the question may not get access to the data they need.
Statistical agencies have developed some procedures for providing greater access to data to selected researchers under conditions that subject the researchers to enforceable penalties for violations of confidentiality standards. For example, the designation of researchers as special sworn employees has been standard practice at the Census Bureau. This practice has clearly protected confidentiality in that special sworn employees have the same responsibilities as regular employees, but it has also limited the type and amount of research that can be done with the data. The access provided is of a temporary nature and must be carried out on site, under supervision, and only for purposes of the Census Bureau that are designated under Title 13 legislation. Further, it puts at a disadvantage researchers who are not currently in the Washington, D.C., area or cannot easily relocate.
Recently, the National Center for Education Statistics and the National Science Foundation have developed, and are evaluating, approaches for licensing researchers to have access to data for statistical purposes, with penalties for improper use. Further, the Census Bureau has been examining for some time the possibility of establishing facilities for data access outside the Washington, D.C., area, perhaps at its regional offices or at a university under a data protection agreement. Most concretely, the Bureau of Justice Statistics has provided data to form the National Justice Data Archive at the Inter-University Consortium for Political and Social Research, which is located at the University of Michigan. This archive functions under the legal authority and confidentiality protection of the Code of Federal Regulations (§ 28, Pt. 22). The Bureau of Justice Statistics provides a project monitor to the archive.
CAN INDIVIDUAL-LEVEL DATA BE PROVIDED FOR PUBLIC USE?
Fritz Scheuren (1989:20), as director of the Statistics of Income Division of the Internal Revenue Service, described this predicament of the data collector:
Statistical Disclosure-Avoidance is an enormous problem. On the one hand, we want to make all the microdata [sets of individual records with identifiers removed] we produce publicly available so researchers can benefit fully; on the other hand, we have to protect respondents (or taxpayers, in my case) from having identifiable information inadvertently disclosed.
While some progress has been made since Scheuren's writing, the complexity of the task and the fundamental nature of the problem leave substantial work remaining. Statistical agencies make difficult decisions on how best to disseminate their products so that maximum value is obtained from them while protecting confidentiality. What kinds of data should be released with no restrictions? What kinds of statistical disclosure limitation techniques should be applied to data before releasing them for unrestricted public use? Will researchers be led to incorrect inferences because of such techniques? What is a reasonably small risk of disclosure of individually identifiable data? How should such decisions be made, and by whom? We develop this topic in Chapter 6.
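To make the questions above concrete, one rough indicator of disclosure risk in a microdata file is the share of records that are unique on characteristics an outsider might plausibly know. The Python sketch below is illustrative only: the field names, the toy records, and the income cap are hypothetical, and neither the uniqueness measure nor the top-coding step is presented as the procedure of any agency discussed in this report.

```python
from collections import Counter

def unique_record_share(records, key_fields):
    """Fraction of records whose combination of key_fields occurs only once."""
    keys = [tuple(r[f] for f in key_fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records) if records else 0.0

def top_code(records, field, cap):
    """Return copies of the records with values of `field` above `cap` set to `cap`."""
    return [{**r, field: min(r[field], cap)} for r in records]

# Hypothetical toy microdata (identifiers already removed).
records = [
    {"region": "Northeast", "age": 34, "income": 41000},
    {"region": "Northeast", "age": 34, "income": 41000},
    {"region": "South", "age": 52, "income": 250000},
    {"region": "South", "age": 52, "income": 900000},
]
fields = ["region", "age", "income"]

before = unique_record_share(records, fields)
after = unique_record_share(top_code(records, "income", 100000), fields)
```

In this toy file the two high-income records are unique before top-coding and indistinguishable afterward, but the released file no longer shows how high those incomes were. That loss of detail is the kind of effect behind the question of whether researchers will be led to incorrect inferences.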
CAN LEGITIMATE NEEDS FOR DATA SHARING WITHIN GOVERNMENT BE MET?
Are young Americans entering the labor market prepared for the competitive challenge? Government statistics can help answer only part of this critical question. The National Center for Education Statistics, in its National Educational Longitudinal Study of 1988, started with 25,000 eighth graders and gathered periodic measurements on their academic performance, school and social environments, and family background. The Bureau of Labor Statistics surveys youth employment. The Bureau of Justice Statistics has some information on those in prison. As Hauser (1991:2) succinctly observes, however, "the overall effect of fragmented responsibility and piecemeal coverage is that, once youths leave high school, our statistical system treats them almost as if they had dropped off the face of the earth." Actually, some surveys by the National Center for Education Statistics do track students after high school. For example, the National Longitudinal Study of the Class of 1972 tracked students as they made the transition to their twenties. In general, however, there is a need for suitable interagency coordination in meeting data needs and, possibly, for some interagency data sharing.
Sharing of identifiable data for statistical purposes can have many potential benefits, including the enrichment of cross-sectional and longitudinal data sets, evaluation and improvement of the quality of census and survey data, improvement of the timeliness and consistency of statistical reporting, development of more complete sampling frames, and improvement of comparability between data developed by different statistical agencies. A significant amount of data sharing has occurred without incident, both between statistical agencies and from administrative agencies to statistical agencies. For many years, for example, the Census Bureau has used tax records on individuals and businesses to enhance its demographic censuses and surveys and to evaluate the quality of census and survey data. (Additional detail on this point is provided in Chapter 6.) Similarly, identifiable patient records are routinely used under controlled conditions in medical and epidemiologic research.
Decisions on whether to link records require careful examination of several factors. Will the linking be done under conditions that conform with all statutory confidentiality standards of the agencies involved and with pledges to data providers concerning the use of their information? Is record linkage the only feasible way to develop the desired statistical products? Are the intended uses of these products sufficiently important to justify introducing additional risks of disclosure? Should agencies take into account potential reductions of (1) the time and effort of data providers and (2) the costs of obtaining the desired data by other means? If they should, how would they do it? Who will have access to the linked data sets? Who should decide?
The statutes, regulations, and policies that affect decisions to share data and participate in record linking projects vary widely among agencies. Statistical agencies, like the Census Bureau, and custodians of administrative records, like the Internal Revenue Service, operate under strict statutory controls on disclosure of identifiable data. Other agencies, like the National Agricultural Statistics Service, have the authority to deny access for nonstatistical purposes but also have more flexibility to share data for statistical purposes. Still other agencies operate primarily under general information statutes, like the Privacy Act of 1974, and have little difficulty in finding ways to participate in record linkages for statistical and research purposes, if they choose to do so.
The value of a survey data base can sometimes be substantially enhanced by adding data from administrative records for the persons in the data base (Juster, 1991). In carrying out a health and retirement survey, for example, researchers might like to use earnings records from Social Security files. Other valuable administrative records include case files from public assistance programs, health care claims, and tax returns. Such linkages are valuable for a number of reasons:
Existing administrative records may be more accurate than survey data obtained from respondents, especially for detailed information that must be recalled for earlier years, like income data.
Surveys are made less burdensome and intrusive because certain questions need not be asked.
Administrative records can provide a check on the quality of survey results, and vice versa.
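The reasons listed above can be illustrated with a minimal Python sketch of linking survey responses to administrative earnings records. Everything in it is an assumption made for illustration: the shared `person_id` key, the field names, and the toy values are hypothetical, and actual linkages would be carried out under the statutory and procedural confidentiality controls discussed in this report.

```python
def link_records(survey, admin, key="person_id"):
    """Attach administrative earnings to each survey record that shares the key."""
    admin_by_key = {a[key]: a for a in admin}
    linked = []
    for s in survey:
        match = admin_by_key.get(s[key])
        row = dict(s)  # copy so the original survey record is not modified
        row["admin_earnings"] = match["earnings"] if match else None
        # Quality check: reported value minus administrative value, when both exist.
        if match is not None and s.get("reported_earnings") is not None:
            row["discrepancy"] = s["reported_earnings"] - match["earnings"]
        else:
            row["discrepancy"] = None
        linked.append(row)
    return linked

# Hypothetical toy data.
survey = [
    {"person_id": 1, "reported_earnings": 30000},
    {"person_id": 2, "reported_earnings": None},   # earnings question not asked
    {"person_id": 3, "reported_earnings": 18000},  # no administrative match
]
admin = [
    {"person_id": 1, "earnings": 31500},
    {"person_id": 2, "earnings": 27000},
]

linked = link_records(survey, admin)
```

The second respondent shows how a survey question can be omitted when the administrative value is available, and the first shows how the two sources can be checked against each other.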
While record linkage may facilitate the basic tasks of a statistical agency, it also raises serious confidentiality concerns. Ivan P. Fellegi, chief statistician of Canada, noted in a May 2, 1991, communication to the panel,
The issue of "moral outrage" as a possibility is real, but I believe [it] applies particularly to record linkage (matching). For this reason we are particularly careful about it.
STRUCTURE OF THE REPORT
In Chapter 2 we briefly review the evolution of the federal statistical system, the findings of earlier studies of confidentiality and data access, and the changes that warrant a reexamination of the issues. To complete the contextual framework of our study, we also provide an overview of our assessment of the responsibilities federal statistical agencies must be able to fulfill in their dealings with the public, data providers and data subjects, data users, other statistical agencies, and custodians of administrative records.
In Chapter 3 we address fair treatment of data providers, in particular the use of informed consent as an instrument for ethical communication by data collectors. Drawing on survey experiments, cognitive studies, and public opinion surveys, we also examine certain research findings related to confidentiality and data access.
We examine in Chapter 4 the legitimate expectations of data users, within and outside government, for access to federal statistical data. We also explore the ethical responsibilities of data users and advocate establishing their legal responsibilities in agency or systemwide statutes.
In Chapter 5 we review legislation governing confidentiality and data access, especially the Privacy Act of 1974, Title 13 of the U.S. Code, the Hawkins-Stafford amendments of 1988 for the National Center for Education Statistics, and the Public Health Service Act as it affects the National Center for Health Statistics. While recognizing a basis for diversity according to agency mission, we emphasize the value of all statistical agencies having a certain minimal standard of statutory authority to protect their data. The experience of the Energy Information Administration and the Bureau of Labor Statistics suggests that some agencies would benefit from having more comprehensive statutory protection of their statistical records.
Extensive dissemination of detailed information is necessary to ensure that ample value can be obtained from federal censuses and surveys. At the same time, statistical agencies must fulfill pledges of confidentiality to data providers. Thus, we examine in Chapter 6 technical and administrative procedures for providing information while ensuring that the risk of disclosure is at most minimal.
In Chapter 7 we address confidentiality issues associated with statistical data on organizations. Using four case studies, we develop a conceptual basis for similarities and differences in the treatment of data on organizations compared with data on individuals and households.
We address the management of confidentiality and data access functions in Chapter 8, with particular attention to interagency coordination and the cross-national experience. We also explore issues of agency staffing and data protection legislation.
Our findings and recommendations are presented in Chapters 3 through 8.
Throughout our deliberations we have been mindful that regardless of the efforts put forth, the tension between data protection and data access will not go away. At best one can hope for a temporary consensus each time the community of interested parties revisits this issue. Ideally, as with isometric exercise, achieving correct dynamic tension in one round builds greater strength for the next round.
could not have been mounted without mustering the data to show that the problems were important, and that the proposed actions were likely to improve public health.
For example, Allison and Cooper (1991) note a case in which the Institutional Brokers Estimate System (IBES) filed suit against a researcher who criticized its data and imposed conditions on academic researchers that (1) require them to clear all potential publications with IBES so the latter can have the opportunity "to identify factual errors or misunderstandings" and (2) require researchers using IBES data to refrain from providing access to others (including research assistants) without prior clearance.
See, for example, Bureau of the Census (1982), Louis Harris and Associates, Inc. (1981, 1983), National Research Council (1979), and the Roper Organization, Inc. (1980).
To facilitate its work, the panel commissioned several background papers on issues bearing on confidentiality and data issues. The papers appear in a special issue of the Journal of Official Statistics, 1993(2). See Appendix A for a list of the papers.