The Panel on Confidentiality and Data Access was charged by the Committee on National Statistics and the Social Science Research Council with developing recommendations that could aid federal statistical agencies in their stewardship of data for policy decisions and research. Three areas were of paramount concern in the panel's deliberations: protecting the interests of data subjects through procedures that ensure privacy and confidentiality, enhancing public confidence in the integrity of statistical and research data, and facilitating the responsible dissemination of data to users.
STUDY GOALS AND SCOPE
Deciding on the exact scope of our investigation was not easy, for the federal statistical system is complex and far reaching, and its boundaries are not clearly defined. More than 70 federal agencies have a role in collecting data from individuals, households, farms, businesses, and governmental bodies and disseminating those data for a variety of statistical purposes. Federal statistical activities include the development and dissemination of large, general-purpose data sets based on censuses, surveys, and administrative records. They also include the collection and analysis of personal data in experimental research with human subjects. A few federal statistical agencies conduct general or multipurpose programs (e.g., the Bureau of the Census), but many others conduct specialized programs or activities (e.g., the Bureau of Labor Statistics and the National Center for Education Statistics). In addition, some basically programmatic agencies conduct some statistical activities (e.g., the Federal Aviation Administration and the Internal Revenue Service). Finally, the data subjects and units of analysis for statistical programs include persons and organizations, but when the concepts of privacy and confidentiality are applied to organizations, they have quite different meanings than they do when applied to persons.
Given the complexity of the federal statistical system, designing an ideal configuration to address confidentiality and data access issues throughout the system, or even in any one federal statistical agency, was too daunting for this panel. Instead, we sought to contribute to a long tradition in the statistical community of periodically reconsidering current institutional structures and practices. Fundamentally, we seek to spur this ongoing process by articulating and applying three tenets of an ethic of information in a free society: democratic accountability, constitutional empowerment, and individual autonomy. We believe this attention to underlying principles can have a more beneficial and lasting impact than would an attempt to provide detailed recommendations for micromanagement of agency procedures for confidentiality and data access.
Although we recognize the inherent tension between data protection and data access, we do not advocate a specific trade-off between the two. The dynamics of such a trade-off would be complicated and heavily influenced by the missions and operational environments of individual agencies, so that a single solution would not work for every agency. Nevertheless, we see some opportunities to enhance data access without decreasing data protection, and some opportunities to increase data protection without diminishing data access.
The panel's analysis and recommendations are based on an ethic of information formed of the three principles referred to above: democratic accountability, constitutional empowerment, and individual autonomy. The ethical guidance these principles provide for the structure and practice of statistical agencies is often reinforcing, but not always harmonious.
Functionally, democratic accountability recognizes the responsibilities of those who serve on behalf of others. It requires that the public have access to comprehensive information on the effectiveness of government policies. Government statistical agencies play a pivotal role in ensuring democratic accountability by obtaining, protecting, and disseminating the data that allow the accurate assessment of the influence of government policies on the public's well-being. Furthermore, they themselves are accountable to the public for two key functions in this process: (1) protecting the interests of data subjects through procedures that ensure appropriate standards of privacy and confidentiality and (2) facilitating the responsible dissemination of data to users.
Constitutional empowerment refers to the capability of citizens to make informed decisions about political, economic, and social questions. In the United States, constitutional theory emphasizes that ultimate power should reside in the people. In order to advance the common welfare, certain specific powers are delegated to a representative government (Amendment X, U.S. Constitution). Constitutional practice emphasizes restraints on executive excess and broad access to the political process through the direct election of representatives as well as through separation and balance of power.
Individual autonomy refers to the capacity of members of society to function as individuals, uncoerced and with privacy. Protection of individual autonomy is a fundamental attribute of a democracy. If excessive surveillance is used to build data bases, if data are unwittingly dispersed, or if those who capture data for administrative purposes make that information available in personally identifiable form, individual autonomy is compromised.
KEY FINDINGS AND RECOMMENDATIONS
Most of the panel's findings and recommendations can be grouped into five categories, each of which represents a key aspect of the trade-offs between confidentiality and data access. In this section we provide a brief summary of the major problems associated with each category and identify the broad principles underlying the solutions we suggest in the recommendations. We also refer to additional recommendations in the body of the report that augment the recommendations presented here. At the end of the section we present three cross-cutting recommendations bearing on the general management of confidentiality and data access functions in federal statistical agencies. (The numbering of the recommendations below follows that of the full report.)
STATUTORY PROTECTION AGAINST MANDATORY DISCLOSURE OF INDIVIDUALLY IDENTIFIABLE DATA
For federal statistical agencies, pertinent statutes determine who can have access to individually identifiable information collected for statistical and research purposes, the conditions of access, and the penalties for unlawful use and disclosure of the information. Statutes also determine who can have access to administrative records for statistical and research purposes.
Two kinds of legislation provide the framework for federal policies and practices with respect to confidentiality and data access: government-wide and agency-specific legislation. In the former category, the Privacy Act of 1974 (P.L. 93–579) is most important, and the Freedom of Information Act of 1966 (P.L. 89–487) and the Paperwork Reduction Act of 1980 (P.L. 96–511) are also relevant. In the latter category, several federal statistical agencies have laws that further specify the confidentiality and data access policies they must follow (e.g., the Bureau of the Census and the National Center for Health Statistics). The procedures of some agencies, however, are not backed by statutory provisions. Instead, they must rely on persuasion, common-law tradition, and other means to protect identifiable statistical records from mandatory disclosure for nonstatistical uses. But there is no guarantee that such means will always be successful.
The basic distinction between statistical and administrative data is important. To carry out their basic functions, government agencies collect enormous amounts of data, most of which are used directly for various administrative purposes. Those data collected exclusively for statistical and research purposes form a tiny fraction of the total. Data collected for administrative purposes are often useful and appropriate for statistical purposes, as when patterns of Food Stamp applications are used to trace the effects of program changes. In contrast, data collected for research and statistical purposes are inappropriate for administrative uses.
These ideas are summarized in the concept of functional separation: Data collected for research or statistical purposes should not be made available for administrative action about a particular data subject. This concept was enunciated in a recommendation of the Privacy Protection Study Commission (1977a:574):
That the Congress provide by statute that no record or information contained therein collected or maintained for a research or statistical purpose under Federal authority or with Federal funds may be used in individually identifiable form to make any decision or take any action directly affecting the individual to whom the record pertains, except within the context of the research plan or protocol, or with the specific authorization of such individual.
In part to ensure that statistical data are not used for administrative purposes, agencies give data providers pledges of confidentiality, both explicit and implicit. Unless those pledges are backed by legal authority, however, they provide an inadequate shield against administrative uses.
Recommendation 5.1 [parts a and b] Statistical records across all federal agencies should be governed by a consistent set of statutes and regulations meeting standards for the maintenance of such records, including the following features of fair statistical information practices:
(a) a definition of statistical data that incorporates the principle of functional separation as defined by the Privacy Protection Study Commission
(b) a guarantee of confidentiality for data.
Recommendation 7.2 Legislation that authorizes and requires protection of the confidentiality of data for persons and organizations should be sought for all federal statistical agencies that do not now have it and for any new federal statistical agencies that may be created.
The panel believes that the principle of functional separation should apply equally to data on persons and data on organizations (Recommendation 7.1). In addition, the panel recognizes that there may have to be some exceptions to the principle of functional separation, but it stresses that data providers participating in statistical surveys and censuses must be told of any planned or potential nonstatistical uses of the data they provide (see Recommendation 3.2 below).
BARRIERS TO DATA SHARING WITHIN GOVERNMENT
Some of the laws that govern the confidentiality of statistical data prohibit or severely limit interagency sharing of data for statistical purposes. Laws that control access to administrative records, such as reports of earnings covered by Social Security, restrict their use for statistical purposes. These barriers to data sharing for statistical purposes have led to costly duplication of effort and excessive burden on individuals and organizations who are asked to supply information. They have also made it difficult or impossible to develop data sets needed for policy analysis on topics of major interest to the public.
Not all barriers to data sharing are necessarily statutory, however. Within agencies, organizational inertia and excessive concern for bureaucratic turf can also impede data sharing.
Recommendation 4.1 Greater opportunities should be available for sharing of explicitly or potentially identifiable personal data among federal agencies for statistical and research purposes, provided the confidentiality of the records can be properly protected and the data cannot be used to make determinations about individual data subjects. Greater access should be permitted to key statistical and administrative data sets for the development of sampling frames and other statistical uses. Additional data sharing should only be undertaken in those instances in which the procedures for collecting the data comply with the panel's recommendations for informed consent or notification (see Recommendation 3.2 below).
In part (f) of Recommendation 5.1 (parts (a) and (b) appear above), the panel further recommends that "a provision that permits data sharing for statistical purposes under controlled conditions" be included in the "consistent set of statutes and regulations" governing the maintenance of federal statistical records. The panel also believes interagency sharing of data for statistical purposes should include the sharing of lists of businesses by federal and state agencies (Recommendation 7.4).
ACCESS TO DATA BY NONGOVERNMENT USERS
Some data users complain that federal statistical agencies are not always adequately responsive to their requests for access to data. Others appear to be dissuaded from attempting to use federal statistical data because of perceptions of difficulty of access.
Because of legitimate concerns about the possibility of disclosure of individual information, statistical agencies have limited the amount of detailed data provided to nongovernment users in tabulations and public-use microdata files. This lack of detail restricts the ability of users to do analyses that could contribute to the understanding of significant economic, social, and health problems. Some agencies have developed mechanisms for providing access to more detailed information on a restricted basis (e.g., on-site access at an agency office, licensing agreements, remote on-line access for registered users, data release in encrypted CD-ROM format), but current arrangements do not meet all legitimate needs.
Recommendation 6.4 Statistical agencies should continue widespread release, with minimal restrictions on use, of microdata sets with no less detail than currently provided.
Recommendation 4.2 Federal statistical agencies should seek to improve the access of external users to statistical data, through both legislation and the development and greater use, under carefully controlled conditions, of tested administrative procedures.
The panel believes that, through a combination of legislation and administrative procedures, this can be done without sacrificing confidentiality protections for data subjects and data providers.
Recommendation 5.3 There should be legal sanctions for all users, both external users and agency employees, who violate requirements to maintain the confidentiality of data.
Under appropriate safeguards, statistical agencies should experiment with innovative ways of providing restricted access to data (Recommendation 6.6). In addition, federal statistical agencies that collect data on organizations should make a special effort to make those data more accessible to users (Recommendation 7.7).
The panel notes that it is difficult to define the problems that inhibit data access without better access to information on the numbers and types of user requests that are being denied for confidentiality reasons. Thus, federal statistical agencies should establish systematic procedures for capturing and reviewing such information (Recommendation 4.3).
PRIVACY CONCERNS AND DECLINING COOPERATION IN SURVEYS
Many citizens believe, increasingly and with some justification, that their privacy is being eroded by organizations that develop and control the use of large data bases that contain detailed information about them. They see the linkage of data from different sources as a particular threat. For these and other reasons, statistical agencies are finding it more difficult to persuade persons and organizations to participate in statistical surveys, whether voluntary or mandatory.
Ethics and law demand that data providers be told about the conditions under which they are asked to supply information that will be used for statistical and research purposes. If participation is voluntary, data collectors must let data providers know this and give them enough information to make an informed decision about whether to provide the information requested.
Recommendation 3.2 Basic information given to all data providers requested to participate in statistical surveys and censuses should include
for data on persons, information needed to meet all Privacy Act requirements. Similar information is recommended for data on organizations, except that the requirement to inform providers about routine uses (as defined by the Privacy Act) is not applicable.
a clear statement of the expected burden on the data providers, including the expected time required to provide the data (a requirement of the Office of Management and Budget) and, if applicable, the nature of sensitive topics included in the survey and plans for possible follow-up interviews of some or all respondents.
no false or misleading statements. For example, a statement that implies zero risk of disclosure is seldom, if ever, appropriate.
information about any planned or potential nonstatistical uses of the information to be provided. There should be a clear statement of the level of confidentiality protection that can be legally ensured.
information about any planned or anticipated record linkages for statistical or research purposes. For persons, this notification will usually occur in conjunction with a request for the data subject's Social Security number.
a statement to cover the possibility of unanticipated future uses of the data for statistical or research purposes.
information about the length of time for which the information will be retained in identifiable form.
The goal should be to give each data provider as much information as is necessary to make his or her consent as informed as he or she wishes it to be. A multistaged procedure is recommended: Those who want more information should have the opportunity to obtain it directly from the interviewer or by other means (Recommendation 3.1).
In addition, statistical agencies need to "know their respondents." How do they interpret concepts like privacy, confidentiality, disclosure, data sharing, and statistical purposes? Do they understand the informed consent and notification statements they are given? What information about themselves do they consider to be most sensitive? How are their decisions on survey participation influenced by different formats and modes of presentation? How do their reactions vary by race-ethnicity, gender, and socio-economic status?
Recommendation 3.4 Statistical agencies should undertake and support continuing research, using the tools of cognitive and survey research, to monitor the views of data providers and the general public on informed consent, response burden, sensitivity of survey questions, data sharing for statistical purposes, and related issues.
The panel believes that the risks of major or deliberate violations of privacy or confidentiality are extremely low in the federal statistical system. The risks are somewhat higher for federal administrative records, and probably highest of all for private sector record systems. Because the public does not always distinguish among these different types of records, there is a danger that violations not involving statistical data bases can engender public indignation and have damaging spill-over effects on federal statistical programs. Thus, the panel further recommends that federal statistical agencies develop systematic public information activities, be prepared to deal quickly and candidly with actual or perceived violations of pledges of confidentiality, and as part of the communication process, work closely with appropriate advocacy groups (Recommendations 3.5–3.7).
STATISTICAL PROCEDURES TO PROTECT CONFIDENTIALITY
Technological advances in computers and communications offer opportunities and threats: opportunities to process, access, and analyze large data sets more efficiently and threats of unauthorized access to individually identifiable data. Statistical disclosure limitation procedures (e.g., cell suppression, random error, topcoding) are used to transform data to limit the risk of disclosure. Use of such a procedure is called masking the data, because it is intended to hide personal characteristics of data subjects. Some statistical disclosure limitation techniques are designed for data accessed as tables, and some are designed for data accessed as records of individual data subjects (microdata).
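To make two of the masking techniques named above concrete, the following is a minimal sketch of topcoding (for microdata) and cell suppression (for tables). The function names, the income cap, and the minimum cell size are hypothetical choices made for illustration; they are not drawn from this report or from any agency's actual rules.

```python
from typing import Dict, List, Optional


def topcode(values: List[float], cap: float) -> List[float]:
    """Topcoding: replace every value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]


def suppress_small_cells(
    table: Dict[str, int], threshold: int
) -> Dict[str, Optional[int]]:
    """Cell suppression: withhold (None) any count below `threshold`."""
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }


# Hypothetical microdata: reported incomes, with two extreme values
# that could single out the individuals who reported them.
incomes = [18_000, 42_000, 57_500, 310_000, 1_250_000]
topcoded_incomes = topcode(incomes, cap=150_000)

# Hypothetical table: counts of data subjects by county.
county_counts = {"County A": 412, "County B": 2, "County C": 57}
published_counts = suppress_small_cells(county_counts, threshold=5)

print(topcoded_incomes)
print(published_counts)
```

This sketch performs only primary suppression. When row or column totals are also published, a suppressed cell can often be recovered by subtraction, so additional (complementary) cells generally have to be withheld as well.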
Many federal statistical agencies have standards, guidelines, or formal review mechanisms that are designed to ensure that (1) adequate analyses of disclosure risk are performed and (2) appropriate statistical disclosure limitation techniques are applied prior to release of tables and public-use microdata files. Those standards and guidelines, however, vary widely in their specificity: Some contain only one or two simple rules; others are much more detailed. This variation across agencies in the comprehensiveness of disclosure review has little justification in terms of agency mission. Further, unfulfilled opportunities exist for agencies to work together and learn from one another, perhaps pooling resources to investigate the strengths and weaknesses of various statistical disclosure limitation techniques.
Every federal statistical agency should develop standards and procedures for the application of effective statistical disclosure limitation techniques to all forms of data dissemination, taking advantage of relevant features of standards and procedures that have worked well for other agencies. Particular care should be taken in the review of proposals for the release of new public-use microdata files. In choosing among different disclosure limitation techniques, agencies should take account of the level of protection provided and the effects on the ability of users to draw valid inferences.
Based on its findings, the panel endorses the Federal Committee on Statistical Methodology's recommendation that "All federal agencies releasing statistical information, whether in tabular or microdata form, should formulate and apply policies and procedures designed to avoid unacceptable disclosures" (Report on Statistical Disclosure and Disclosure-Avoidance Techniques, Statistical Policy Working Paper 2, 1978:41–42, Recommendation B1). Since this panel convened, the federal statistical agencies have initiated some activities consonant with the above recommendation. In particular, in 1991 the Office of Management and Budget's Statistical Policy Office took the lead in organizing an interagency committee to coordinate research on statistical disclosure analysis.
Recommendation 6.1 The Office of Management and Budget's Statistical Policy Office should continue to coordinate research work on statistical disclosure analysis and should disseminate the results of this work broadly among statistical agencies. Major statistical agencies should actively encourage and participate in scholarly statistical research in this area. Other agencies should keep abreast of current developments in the application of statistical disclosure limitation techniques.
Statistical disclosure limitation methods can hide or distort relations among study variables and result in analyses that are incomplete or misleading. Further, data masked by some disclosure limitation methods can only be analyzed accurately by researchers who are highly sophisticated methodologically.
Recommendation 6.2 Statistical agencies should determine the impact on statistical analyses of the techniques they use to mask data. They should be sure that the masked data can be accurately analyzed by a range of typical researchers. If the data cannot be accurately analyzed using standard statistical software, the agency should make appropriate consulting and software available.
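One way an agency might begin to gauge the impact of masking on analyses is to compare a statistic computed on the original data with the same statistic computed on the masked data. The sketch below uses simulated data and assumes masking by independent random error; the sample size, noise levels, and variable names are all hypothetical. It illustrates a well-known effect: adding random error to a variable weakens its observed correlation with other variables.

```python
import math
import random
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the comparison is repeatable

# Simulated "true" data: y depends strongly on x.
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [xi + rng.gauss(0.0, 0.5) for xi in x]

# Masked release: independent random error added to x only.
x_masked = [xi + rng.gauss(0.0, 1.0) for xi in x]

original_r = pearson(x, y)
masked_r = pearson(x_masked, y)

print(f"correlation on original data: {original_r:.3f}")
print(f"correlation on masked data:   {masked_r:.3f}")
```

A researcher who treats the masked variable as if it were the original would understate the strength of the relationship unless told how much error was added, which is the kind of analytical support the recommendation asks agencies to consider.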
The panel believes that no one procedure can be developed for all statistical agencies. Furthermore, confidentiality laws governing particular agencies differ, as do the types of data collected and the needs of data users. Thus, the panel also endorses the following recommendations contained in Statistical Policy Working Paper 2:
In formulating disclosure-avoidance policies, agencies should give particular attention to the sensitivity of different data items…. Agencies should avoid framing regulations and policies which define unacceptable statistical disclosure in unnecessarily broad or absolute terms. Agencies should apply a test of reasonableness, i.e., releases should be made in such a way that it is reasonably certain that no information about a specific individual will be disclosed in a manner that can harm that individual (p. 42, part of Recommendation B1).
Given the potential difficulties that certain statistical disclosure limitation techniques can cause for analysts, it is important that federal statistical agencies involve data users in selecting such procedures. In the past, agency staffs have often been the sole determiners of which statistical disclosure limitation techniques are to be used prior to releasing tables and microdata files.
Recommendation 6.3 Each statistical agency should actively involve data users from outside the agency as statistical disclosure limitation techniques are developed and applied to data.
Three recommendations of the panel refer to the general management of the confidentiality and data access functions of federal statistical agencies.
Recommendation 8.1 Each federal statistical agency should review its staffing and management of confidentiality and data access functions, with particular attention to the assignment within the agency of responsibilities for these functions and the background and experience needed for persons who exercise these responsibilities.
Recommendation 8.2 Statistical agencies should take steps to provide staff training in fair information practices, informed consent procedures, confidentiality laws and policies, statistical disclosure limitation procedures, and related topics.
Recommendation 8.5 The panel supports the general concept of an independent federal advisory body charged with fostering a climate of enhanced protection for all federal data about persons and responsible data dissemination for research and statistical purposes. Any such advisory body should promote the principle of functional separation and have professional staff with expertise in privacy protection, computer data bases, official statistics, and research uses of federal data.