Information maintained and used solely for statistical or research purposes is not treated uniformly under existing law.
Commission on Federal Paperwork, 1977
The Commission believes that existing law and practice do not adequately protect the interests of the individual data subject.
Privacy Protection Study Commission, 1977
… to change or make a law involves a long, cumbersome process, while technologies and social conditions sometimes change very fast.
Edmund Rapaport, 1988
Legislation in large part determines and constrains federal policies and practices affecting data subjects and data users. For federal statistical agencies, pertinent statutes determine when response to surveys is mandatory, who can have access to individually identifiable information collected for statistical and research purposes, the conditions of access, and the penalties for unlawful uses and disclosures of the information. Statutes also determine who can have access to administrative records for statistical and research purposes.
Two kinds of legislation provide the framework for federal policies and practices with respect to confidentiality and data access: government-wide and agency-specific legislation. In the former category, the Privacy Act of 1974 (P.L. 93–579) is the most important, and the Freedom of Information Act of 1966 (P.L. 89–487) and the Paperwork Reduction Act of 1980 (P.L. 96–511) are also relevant. In the latter category, several federal statistical agencies, for example, the Bureau of the Census, the National Agricultural Statistics Service (NASS), the National Center for Education Statistics (NCES), and the National Center for Health Statistics (NCHS), are subject to laws that go beyond the government-wide statutes in specifying the confidentiality and data access policies they must follow. Among legislation that affects the statistical and research uses of administrative records, the Tax Reform Act of 1976 (P.L. 94–455) is of special significance.
In its consideration of the legislation that regulates statistical and research uses of federal records, the panel benefited greatly from the endeavors of earlier commissions and other groups and individuals who have explored these issues, including Cecil (1993), the Commission on Federal Paperwork (1977a, b), Flaherty (1979, 1989), Newton and Pullin (1990, 1991), the Office of Federal Statistical Policy and Standards (1978), and the Privacy Protection Study Commission (1977a, b, c). The problems we identified were well documented by our predecessors, and they proposed reasonable solutions to them. Most of the problems persist, however, and their significance has only increased because of the changes since the late 1970s in the environment in which federal statistical agencies operate.
The regulatory structure provided by the government-wide and agency-specific statutes mentioned above is exceedingly complex. There is little uniformity in the treatment of confidentiality and data access questions, and the ability of federal statistical agencies to protect the confidentiality of individually identifiable records is not always backed by suitable statutory provisions. Conversely, in some instances excessive restrictions on access needlessly limit opportunities to share records across agencies for statistical purposes and to make detailed microdata sets available to potential users.
This chapter has three sections. We describe the main features of the relevant government-wide information statutes in the first section and agency-specific statutes for some of the major statistical agencies in the second. In the third section we present findings and recommendations.
The general plan of this report has been to deal primarily with issues relating to data for individuals in Chapters 1 through 6 and to address issues relating to data for organizations in Chapter 7. However, this chapter constitutes an exception to the general plan because many provisions of agency-specific legislation apply equally to both kinds of data.
GENERAL REGULATION OF FEDERAL STATISTICAL AND RESEARCH RECORDS
The Privacy Act of 1974 is the most prominent feature of a general system of regulation that also includes the conflicting demands of the Freedom of Information Act of 1966, the Paperwork Reduction Act of 1980, and the Computer Matching and Privacy Protection Act of 1988 (P.L. 100–503). The Freedom of Information Act specifies the conditions under which the disclosure of federal agency records, including statistical records, may be compelled. The Paperwork Reduction Act is intended to reduce unnecessary requirements for paperwork by federal agencies, and it permits review of data collection requests by the Office of Management and Budget (OMB). The Computer Matching and Privacy Protection Act regulates the use of computer matching of federal records subject to the Privacy Act. Matches performed for statistical purposes are specifically excluded from the coverage of the act, however.
Below we summarize two components of this general system of regulation, the Privacy Act and the Freedom of Information Act. Our focus is the impact of each act on federal data used for research and statistical purposes. The Privacy Act applies only to data on identifiable individuals; the Freedom of Information Act applies to data on individuals and organizations.
THE PRIVACY ACT
The administrative abuses of data held by the federal government that occurred in the early 1970s eroded public trust and confidence and led to the passage of the Privacy Act of 1974 and the Tax Reform Act of 1976. The Privacy Act was the first attempt by Congress to provide comprehensive protection of an individual's right to privacy by regulating the collection, management, and disclosure of personal information maintained by government agencies (Title 5 U.S.C. § 552a).1 It specifies a general system of regulation for individually identifiable federal records.
Before the Privacy Act was passed, federal policy on data management practices encouraged data sharing among agencies in order to reduce the burden and expense of reporting. This open-access policy was restricted only when statutes provided for the confidentiality of specific sensitive record systems. In 1974, the Privacy Act reversed this general policy by recognizing the right of individuals to control dissemination of information they provided about themselves to federal agencies. The Privacy Act sought to strike a balance: to preserve individuals' interests in controlling identifiable information while recognizing the legitimate uses of that information.2
Briefly, the general provisions of the Privacy Act require that federal agencies (1) grant individuals access to their identifiable records maintained by the agency, (2) ensure that existing information is accurate and timely and limit the collection of unnecessary information, and (3) limit the disclosure of identifiable information to third parties. This third provision of the Privacy Act, which forbids the disclosure of any identifiable record without the prior written consent of the individual (Title 5 U.S.C. § 552a(b)), is the crux of the right of privacy provided by the act.
The term statistical record is defined in the Privacy Act as "a record in a system of records maintained for statistical research or reporting purposes only and not used in whole or in part in making any determination about an identified individual, except as provided by § 8 of Title 13 (which governs the Census Bureau)" (Title 5 U.S.C. § 552a(a)(6)).
The Privacy Act defines a system of records as "a group of any records under the control of any agency from which information is retrieved by the name of the individual or by some identifying number, symbol, or other identifying particular assigned to the individual" (Title 5 U.S.C. § 552a(a)(5)).3 An enforceable informed consent requirement in the act could thwart the disclosure of identifiable information for purposes that the individual never considered and would not approve. As described below, however, numerous exceptions are possible.
The Privacy Act seeks to enforce its standards through civil and criminal penalties. Employees of agencies and their contractors who knowingly and willfully disclose personal information contrary to the requirements of the act may be fined up to $5,000 and the agency may be sued for "actual damages" (Title 5 U.S.C. § 552a(g)(1)(D),(i)(1)). Because of the need to show that the agency acted in a willful and intentional manner and the need to demonstrate actual damages, there have been few successful lawsuits (Flaherty, 1989:342–343; Lodge, 1984). Consequently, the effectiveness of these penalties has been questioned (see, e.g., Coles, 1991).
Twelve categories of exceptions to the consent requirement of the Privacy Act are intended to accommodate legitimate needs for individually identifiable information. For instance, an agency may, at its discretion, disclose identifiable records without prior written consent to officers and employees of the agency who have a need for the record in the performance of their duties. While such an exemption is certainly necessary, several federal organizations have interpreted the term agency quite broadly, thereby restricting the protections of the Privacy Act. For example, the Department of Health and Human Services has defined the entire department as a single "agency" under the terms of the Privacy Act, thereby permitting exchange of identifiable information throughout the department as long as there is a job-related need for such information (National Center for Health Statistics, 1984). Other exceptions include disclosures that are required by the Freedom of Information Act (see below); exceptions granted to the Bureau of the Census (for planning or carrying out a census, survey, or related activity under Title 13 of the U.S. Code), the General Accounting Office (to permit auditing of federal programs), and the National Archives; and disclosure in emergency circumstances involving the health and safety of an individual. The Privacy Act also permits disclosure of identifiable information without written consent to other federal agencies for authorized civil or criminal law enforcement activities and disclosure pursuant to a court order; these are disclosures to which individuals would most likely decline to consent.
Of special interest to data users is an exception that permits access to records that are not individually identifiable upon "written assurance that the record will be used solely as a statistical research or reporting record" (Title 5 U.S.C. § 552a(b)(5)). The Privacy Act poses no barrier to the dissemination of anonymous information; if the research objectives can be accomplished with nonidentifiable data, rendering the data anonymous satisfies the standards of the Privacy Act and the information can be disseminated. In fact, as Cecil (1993) notes in a paper prepared for the panel, this exemption offers very little, because a record that is not individually identifiable is not a "record" within the definition of the Privacy Act (see note 3) and therefore is not subject to the restrictions on disclosure imposed by the act.
Obtaining data from individually identifiable federal records for statistical purposes can be difficult for data users, even those in other federal statistical agencies. The restrictions of the Privacy Act can bar the disclosure of identifiable records unless the disclosure is brought within one of the exemptions. In an ideal situation, data collectors would be able to anticipate such disclosure needs and obtain the informed consent of data subjects at the time the information is gathered. When the need for statistical or research access to agency records was not anticipated or when the initial consent becomes invalid, a data collector may have to recontact the data subjects to obtain proper consent. Recontacting data subjects who participated in an earlier statistical or research study imposes special difficulties, however. For example, some target populations are highly mobile, and thus addresses and telephone numbers that were obtained at the time the individuals were originally contacted may be outdated. Recontacting such data subjects is also likely to be expensive and subject to self-selection biases.
One of the Privacy Act exemptions permits disclosure of an identifiable record for a "routine use," that is, "for a purpose that is compatible with the purpose for which it was collected" (Title 5 U.S.C. § 552a(a)(7)). An agency may choose to designate statistical analysis as a routine use of all or a selected portion of its record systems (see note 3). This enables the agency to give outside data users access to data from identifiable records for statistical purposes without first gaining the consent of the data subjects to whom the records pertain. Instead of obtaining data subjects' consent prior to disclosure for such a "routine use," the agency must only publish a notice of the anticipated routine uses of the record in the Federal Register and accept comments from the public for a period of 30 days (Title 5 U.S.C. § 552a(e)(4)(D),(e)(11)). Such routine uses must also be explained to data subjects when similar information is gathered in the future.
A great many agency notices in the Federal Register allow disclosure for statistical and research purposes as a routine use. For instance, the Department of Health and Human Services has been particularly thorough in identifying record systems that have research potential and publishing notices permitting research as a routine use (O'Neill and Fanning, 1976:171–188). One version of the department's routine-use notice requires an assessment of the risks and potential benefits of the research and requires the data user to sign an agreement to protect the records from subsequent disclosure. This is one instance in which the discretion delegated to agencies by the Privacy Act has been used to fashion a specific set of standards to permit data users from outside the agency to analyze data contained in individually identifiable records for statistical purposes while maintaining safeguards appropriate to the information. However, the need to rely on the routine-use exemption to overcome the failure of the statute to provide for statistical and research access to identifiable records is an awkward solution to the problem. Without a statutory policy concerning access to federal records for statistical and research purposes, individual agencies are free to develop regulations that may either be too restrictive or fail to offer adequate protection to the identifiable information provided by data subjects.
It is important to note that the Privacy Act, which applies indiscriminately to statistical and administrative records, offers little protection from improper disclosure of statistical records for nonresearch purposes. An agency could conceivably use the routine-use exemption to the act to diminish the protection for such records by designating audit and enforcement as routine uses after the data are gathered. Also, the Privacy Act places no obligation on outside data users to maintain the confidentiality of the records or limit subsequent disclosure; once the records are released to a data user who is not under the jurisdiction of the act there is no assurance that the data subject's rights will be protected. The restrictions of the Privacy Act extend to contractors only when "an agency provides by a contract for the operation by or on behalf of the agency of a system of records to accomplish an agency function" (Title 5 U.S.C. § 552a(m)).
In summary, the Privacy Act's failure to distinguish statistical and research purposes from administrative purposes in restricting access to individually identifiable records can pose a major obstacle for data users who want to analyze such information for statistical or research purposes. Regulation of administrative records is based on the awareness that the records may be used to make decisions regarding individuals, such as the award or termination of benefits. Such a system of regulation does not recognize dissimilar analytic uses of data for statistical or research purposes in which the information is not used to make decisions regarding individual data subjects. Data users have become adept at framing their statistical or research needs, and the associated data protections, within the standards developed for administrative records. In some instances, it may be possible to anticipate the statistical or research purposes and obtain consent for disclosure of identifiable data at the time they are collected. Otherwise, the data user must structure a request for statistical or research use of individually identifiable data to fit within one of the exceptions to the consent requirement of the Privacy Act, such as the routine-use exception.
FREEDOM OF INFORMATION ACT
The Freedom of Information Act of 1966 (Title 5 U.S.C. § 552) also regulates the disclosure of research and statistical records. The act permits public access to records maintained by federal agencies unless the request for access falls within one of nine specific exemptions. Statistical records maintained by federal agencies, even those developed by private parties, are subject to disclosure under the act if not otherwise exempt. For example, records concerning differences in breast-fed and formula-fed infants that were collected by a group of nonprofit, church-related organizations and deposited with a federal agency for statistical analysis were ordered disclosed when producers of infant formula requested copies of the data from the agency under the act (see St. Paul's Benevolent Educational and Missionary Institute v. U.S., 506 F. Supp. 822 (N.D. Ga. 1980)). If the research records are not maintained by the federal agency, the act does not compel disclosure, even if the agency funded the research and relies on the results in setting public policy. For example, the Supreme Court declined to order the disclosure of research records developed under an extended research grant awarded to a group of private physicians and scientists studying the effectiveness of treatments of diabetes. Even though the Food and Drug Administration relied on the controversial findings of the study to restrict the labeling and use of certain drug treatments and refused to make the data available for independent reanalysis, the Court determined that the act does not reach records maintained by grantees (see Forsham v. Harris, 445 U.S. 169 (1980)).
Two exemptions to the general disclosure mandate of the Freedom of Information Act are of concern to researchers. First, an exemption under the act may be appropriate for identifiable records if such records will yield sensitive information about individual research participants. Exemption 6 of the act restricts disclosure of "personnel and medical and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy" (Title 5 U.S.C. § 552(b)(6)). This exemption is intended to protect sensitive information identifiable to an individual, including research and statistical information, from unwarranted disclosure. If the court determines that disclosure of identifiable agency records "can reasonably be expected to invade [a] citizen's privacy," disclosure will not be ordered (see D.O.J. v. Reporters Committee for Freedom of the Press, 489 U.S. 749 (1989); New York Times Co. v. N.A.S.A., 920 F.2d 1002 (D.C. Cir. 1990) (en banc)). Although there have been a few cases in which identifiable research records have been the object of a request, evolving standards of protection and different standards of protection across federal circuits result in considerable uncertainty regarding the degree of protection under this exemption.4 Nonetheless, the proprivacy provisions appear to be a strong protection.
Second, research information with commercial or financial value may be withheld under exemption 4 of the Freedom of Information Act, which extends protection to "trade secrets and to information which is commercial or financial, obtained from a person, and privileged or confidential" (Title 5 U.S.C. § 552(b)(4)). Such records need not be individually identifiable and may include records of organizations. This exemption, however, has been narrowly construed, which has made it difficult for most research and statistical records to qualify. To qualify for the exemption, the research records must be of commercial or financial value that will be diminished by disclosure. Further, the research records must qualify as "privileged or confidential." A promise of confidentiality when the research data were collected will be given some weight in determining if this standard is met, but such a promise alone will not be sufficient to protect the data. Exemption 4 has been granted only when there has been a further demonstration that disclosure of information would result in competitive harm to a commercial enterprise (see Public Citizen Health Research Group v. Food and Drug Administration, 539 F. Supp. 1320 (D.D.C. 1982)) or that release of such information is likely to "impair the government's ability to obtain necessary information in the future" (see National Parks and Conservation Association v. Morton, 498 F.2d 765, 770 (D.C. Cir. 1974)). These standards were recently interpreted to recognize a greater interest in maintaining the confidentiality of information that is provided to the government on a voluntary basis if it is of a kind that the provider would not customarily release to the public (Critical Mass Energy Project v. N.R.C., 1992 U.S. App. Lexis 19336 (D.C. Cir., 1992), (en banc)). This categorical exemption for information supplied to the government on a voluntary basis is intended to ensure its continued availability.
As with the government-wide statutes, agency-specific legislation is not uniform in specifying the conditions of access to individually identifiable data for statistical and research purposes and the requirements for maintaining the confidentiality of such data. To implement agency-specific legislative provisions, federal statistical agencies have developed administrative policies, practices, and procedures, some of which we cover in this section. We provide a more thorough discussion of the administrative mechanisms used by the federal statistical agencies in Chapter 6.
In this section, we first summarize the agency-specific legislation that governs three major federal statistical agencies (Bureau of the Census, National Center for Education Statistics, and National Center for Health Statistics). We also discuss the impact on federal statistical activities of the Tax Reform Act of 1976, the legislation that governs access to the data collected by the Internal Revenue Service (IRS). Our presentation is not meant to be an exhaustive discussion of agency-specific legislation; other federal statistical agencies, such as the Bureau of Justice Statistics (BJS) and the National Agricultural Statistics Service, have statutes that protect the confidentiality of their data, and we refer to selected provisions of those statutes in presenting our findings and recommendations.
Next, we address issues raised by the fact that federal statistical agencies without adequate statutory protection for the confidentiality of their data must rely on administrative policies that have no statutory basis. We use the Bureau of Labor Statistics (BLS) and the Energy Information Administration (EIA) to illustrate the difficulties that agencies have encountered and the strategies they have employed when protection of data in their custody is not addressed by statute. We conclude this section with a discussion of the legislation for two new statistical agencies, one established by statute in the Department of Transportation in 1991 and another that has been proposed for environmental statistics.
Most of the background material for this section was provided by the agencies discussed in response to the panel's request for information about their current statutory requirements, policies, and practices. We appreciate the agencies' willingness to help by providing this material.
BUREAU OF THE CENSUS
In contrast to the lax protection of statistical records under the Privacy Act of 1974, the statutory protection of statistical information collected by the Census Bureau under Title 13 of the U.S. Code is extremely rigorous. It allows few opportunities for those outside the agency to use individually identifiable records for statistical purposes. In recent years the Census Bureau has made some effort to develop standards and procedures that would permit greater access to information that it maintains (Gates, 1988). But the statutory standards, and interpretation of those standards by the Supreme Court, still pose a formidable barrier to the release of data outside the Census Bureau in a form that permits the range of statistical analyses desired by sophisticated data users.
The Census Bureau is governed by legislation that requires the bureau to (1) use census information only for statistical purposes (with one limited exception, an agreement with the National Archives, which is discussed in Chapter 6), (2) publish data only in a way that prevents the identification of individuals, and (3) prohibit anyone from examining information that identifies an individual unless they are Census Bureau employees sworn to uphold the confidentiality provisions of Title 13 (Title 13 U.S.C. § 9(a)). In addition, this statute allows the Census Bureau to employ temporary staff to do work authorized by Title 13 provided they are sworn to uphold the confidentiality provisions specified in the legislation (see Title 13 U.S.C. § 23). Such "special sworn employees" may be employed in another government agency. Violation of Title 13 standards by employees or special sworn employees carries a penalty of a fine of up to $5,000, up to five years in prison, or both (Title 13 U.S.C. § 214). The statute also permits the Census Bureau to furnish "tabulations and other statistical materials which do not disclose the information reported by, or on behalf of, any particular respondent" and to "make special statistical compilations and surveys, for … [parties outside the Census Bureau] upon the payment of the actual or estimated cost of such work" (Title 13 U.S.C. § 8(b)).
The statutory standards governing the Census Bureau are among the few that have the benefit of an interpretation by the Supreme Court. In Baldrige v. Shapiro (455 U.S. 345 (1982)) the Court considered the extent to which master address lists, compiled as part of the decennial census, can be made available outside the Census Bureau. Several cities challenged the 1980 census count of their populations, contending that the census had erroneously counted occupied dwellings as vacant, and they sought to compel disclosure of a portion of the address lists used by the Census Bureau in conducting its count in their respective jurisdictions. Although the case involved access to this information for purposes other than research, in ruling on the case the Court also offered an interpretation of Title 13 that clarifies the limits of the discretion of the Census Bureau to release statistical information that is individually identifiable.
The district court had ordered the Census Bureau to make the address register available, reasoning that the confidentiality limitation is "solely to require that census material be used in furtherance of the Bureau's statistical mission and to ensure against disclosure of any particular individual's response." The Supreme Court reversed this decision, reading Title 13 to suggest that the release of any microdata, even microdata not identifiable to an individual, is inconsistent with its standards. The Court cited the constitutional purpose of the census in apportioning representation among the states and the importance of public cooperation in obtaining an accurate census. According to the Court, the confidentiality protections of Title 13 are intended to encourage public cooperation by explicitly providing for the nondisclosure of certain census data, and "[n]o discretion is provided to the Census Bureau on whether or not to disclose the information referred to in §§ 8(b) and 9(a)" [of Title 13] (Baldrige v. Shapiro, 455 U.S. 345, 355 (1982)).
The cities that sought the master address lists had argued that the confidentiality protections were intended to prohibit disclosure of the identities of individuals who provided census data. The Court, however, rejected the contention that the confidentiality provisions protect raw data only if the individual respondent can be identified, thereby raising questions regarding the authority of the Census Bureau to release individual census data even when the identification of individuals is not possible:
[Various parties] vigorously argue that Sections 8(b) and 9(a) of the Census Act are designed to prohibit disclosure of the identities of individuals who provide raw census data; for this reason, they argue, the confidentiality provisions protect raw data only if the individual respondent can be identified. The unambiguous language of the confidentiality provisions, as well as the legislative history of the Act, however, indicates that Congress plainly contemplated that raw data reported by or on behalf of individuals was to be held confidential and not available for disclosure (Baldrige v. Shapiro, 455 U.S. 345, 355 (1982)).
The Census Bureau has not interpreted the Title 13 standards as broadly as this language would permit and has continued to release unidentifiable microdata for statistical purposes. The Court's opinion, while speaking of "data" and "statistical uses," is in fact about the authority of states and municipalities to audit the findings of the Census Bureau, a purpose that was specifically precluded when the statute was passed. Further, access to address lists would imply access to any individuals living at the addresses, so characterization of the research data as "unidentified" seems misplaced. Nevertheless, the language of the Supreme Court suggests that the Census Bureau is limited in its discretion to release data to persons who are not sworn to uphold the confidentiality provisions of Title 13.
The restrictions discussed above extend to all data collections undertaken under the authority of Title 13. The Census Bureau also may undertake survey research under alternative authority, such as Title 15, that allows it to conduct specific studies for other organizations, thereby avoiding the restrictions of Title 13 on release of identifiable information. Growing demand for identifiable information that can be used in conducting follow-up surveys or linked with administrative data has resulted in the Census Bureau's conducting under Title 15 authority an increasing number of reimbursable surveys sponsored by other agencies. There are two primary differences in the way the Census Bureau conducts such surveys. First, the bureau cannot use the decennial census as a sampling frame if identifiable microdata from the survey are to be shared with the sponsoring agency. Second, when seeking the consent of respondents to participate in a Title 15 survey, the bureau makes clear that it is collecting the information as an agent of the sponsoring agency and that the sponsor, not the Census Bureau, will be responsible for maintaining the confidentiality of the information.
NATIONAL CENTER FOR EDUCATION STATISTICS
Statistical data collected by the National Center for Education Statistics are governed by legislation that follows closely the pattern of protection for data gathered by the Census Bureau. In April 1988, Public Law 100–297, the Augustus F. Hawkins-Robert T. Stafford Elementary and Secondary School Improvement Amendments of the General Education Provisions Act (GEPA), was passed. The Hawkins-Stafford amendments (Title 20 U.S.C. § 1221e-1) established a rigorous system of protection of educational statistical data collected and maintained by NCES. The system (1) prohibits the use of "individually identifiable information" for purposes other than the statistical purposes for which it was supplied, (2) prohibits the publication of information that will permit the identification of an individual, (3) permits examination of individually identifiable reports only by persons authorized by the NCES commissioner, and (4) limits access to individually identifiable data to those who take an oath not to disclose such data. The amendments allow NCES to use temporary employees to analyze individually identifiable data for statistical purposes if such persons are sworn to observe the limitations described above.
Information collected as part of the National Assessment of Educational Progress (NAEP), one of the ongoing studies of NCES, is subject to a separate confidentiality requirement under the Hawkins-Stafford amendments. To maintain the confidentiality of NAEP records, the amendments state that
the Commissioner shall ensure that all personally identifiable information about students, their educational performance, and their families and that information with respect to individual schools remain confidential, in accordance with section 552a of title 5, United States Code (emphasis added) (P.L. 100–297, § 3403(a)(i)(4)(B)(i)).
This section of the amendments cites the Privacy Act of 1974 as the standard for maintaining confidentiality of the information, even though the Privacy Act applies only to individuals, not to institutions and organizations such as schools (see Newton and Pullin, 1990). The provision requiring that school information remain confidential poses particular difficulty for those who wish to use NAEP data to study the effects of programs in schools with certain characteristics.
In 1990 the statutory framework governing NCES records was further confused by the passage of amendments to GEPA that specifically exempted from the protections of the Hawkins-Stafford amendments data gathered as part of several longitudinal studies of individuals at the postsecondary level and financial aid surveys (see § 252 of the Excellence in Mathematics, Science and Engineering Act of 1990, P.L. 101–589). It appears that these data are governed now by the comparatively lax protections of the Privacy Act. The resulting exceptions to the general provisions of the Hawkins-Stafford amendments provide a fragmented pattern of regulation that is likely to leave data subjects and providers uncertain of the extent to which their information will remain confidential.
Inadvertent disclosure is an especially difficult problem. The educational research community has developed a number of data sets with identifiable information that are beyond the reach of this legislation, some of which include information gathered by NCES prior to the Hawkins-Stafford amendments. The possibility of matching survey data to existing records, thereby inadvertently disclosing information that can be associated with specific data subjects, is much greater for educational records than for individual Census Bureau records. Thus, when the individual is the unit of analysis and similar surveys exist that would permit the linking of information, an NCES policy statement (National Center for Education Statistics, 1989) directs NCES staff to examine common variables and distributions of responses to minimize the risk of disclosure. Staff of NCES also review non-federal data files to ensure that such files will not present an opportunity for a match with NCES data that would yield individually identifiable information. Other information that NCES staff must take into consideration before releasing a public-use data file is set forth in "Standard for Maintaining Confidentiality" (see Standard IV-01-92 in National Center for Education Statistics, 1992:39–41).
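The kind of review described above, in which staff examine the variables a file shares with other data sets, can be illustrated with a minimal sketch. It is not NCES's actual procedure: the function name, the records, and the variable names are all hypothetical, and the sketch assumes only that a combination of shared variables that occurs exactly once in a file is a candidate for matching against an outside file.

```python
from collections import Counter


def unique_key_combinations(records, key_variables):
    """Return the key-variable combinations that occur exactly once.

    A combination held by a single record is a candidate for an exact
    match against an outside file that carries the same variables.
    """
    counts = Counter(tuple(r[v] for v in key_variables) for r in records)
    return sorted(combo for combo, n in counts.items() if n == 1)


# Hypothetical survey records; the variables are illustrative only.
survey = [
    {"state": "OH", "grade": 8, "sex": "F"},
    {"state": "OH", "grade": 8, "sex": "F"},
    {"state": "OH", "grade": 8, "sex": "M"},
    {"state": "VT", "grade": 12, "sex": "M"},
]

risky = unique_key_combinations(survey, ["state", "grade", "sex"])
```

Here two of the four records carry a combination that no other record shares. A reviewer might respond by coarsening or dropping one of the shared variables and repeating the count.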
Files in which data on individuals are nested within assessments of institutions and organizations (e.g., school, district, state) require additional scrutiny. Many educational research programs rely on such nested research designs to assess the effectiveness of classroom- and school-based programs. If data on individual students or teachers are associated with specific schools, the released data cannot include schools that can be uniquely identified. According to the policy statement,
the assumption is made that school and school district administrators will know which students or teachers were interviewed in the survey, regardless of any procedures used to disguise the identity of these individuals or attempts to keep this information from the administrators. Therefore, if a school or district can be identified in a file, that file cannot be linked to student or teacher records (National Center for Education Statistics, 1989:5).
While the legislation governing release of NCES statistical information is similar to the restrictive legislation governing the Census Bureau, NCES has developed policies that permit greater access to statistical data. For example, the center has disseminated microdata in encrypted CD-ROM format and developed licensing agreements that allow researchers to use data at their own work sites (see Chapter 6 for a discussion of such techniques). The center has more freedom to develop such policies because it is free of the restrictive statutory regulations governing the Census Bureau. But, in other ways its task is even more difficult. Its system of nested surveys of individuals and organizations complicates the protection of schools, teachers, and students. Publicly available information on individual schools and districts, in addition to a network of sophisticated educational researchers who are familiar with existing data sets, increases the opportunity for inadvertent disclosure beyond that faced by the Census Bureau. These differences result in considerable tension when legislative protection patterned after the Census Bureau is extended to NCES.
NATIONAL CENTER FOR HEALTH STATISTICS
Statutes protecting health records collected, maintained, and disseminated by the National Center for Health Statistics offer another example of how the inadequate protection provided by the Privacy Act can be supplemented through specific statutory authority. Section 308(d) of the Public Health Service Act (P.L. 85–58) restricts use of data obtained by NCHS to the purposes for which they were originally obtained (42 U.S.C. § 242m(d)). When NCHS requests information, it informs the person or organization supplying the data of the general anticipated uses, which are usually limited to statistical research and reporting, and subsequent uses of the information are then so limited. Section 308(d) further indicates that such information may not be disclosed outside the agency in identifiable form without the advance, explicit consent of the person or establishment to which they relate.
A number of manuals, policy statements, and publications expand on these requirements (see Mugge, 1984; National Center for Health Statistics, 1978, 1984). For instance, the NCHS Staff Manual on Confidentiality (National Center for Health Statistics, 1984) provides a thorough discussion of how the requirements are to be interpreted. The most noteworthy aspect of NCHS's policy is the explicit recognition that there is some risk of disclosure of individually identifiable information with the release of published tables and public-use data files and that such risks must be balanced against the importance of sharing statistical information. This is one of the very few instances in which explicit recognition of this fact appears in an official agency policy statement.
Showing the center's awareness of the difficulty in ensuring confidentiality of data when publishing tables with small cell sizes, the manual presents guidance for avoiding inadvertent disclosure. It states that "mitigating circumstances in a given situation which may make it acceptable to publish data that, strictly speaking, could result in 'disclosures'" would justify a "special exception" to the guidelines (p. 17). For example, if data are based on a sample that is a small fraction of the universe, it might be assumed that disclosure will not occur through published tables. Errors in the data or incomplete reporting may also reduce the likelihood of disclosures taking place to a point that would justify permitting publication of otherwise revealing tables. Similarly, in discussing the standards for the development of public-use microdata files, the manual recognizes that
the only absolutely sure way to avoid disclosure through microdata tapes is to refrain completely from releasing any microdata tapes, but this would deprive the Nation of a great deal of very important health research. Therefore, the Center must make a determination as to when the public's need is sufficiently great to justify the risk of disclosure. It is the Center's policy to release microdata tapes for purposes of statistical research only when the risk of disclosure is judged to be extremely low (p. 18).
In assessing the acceptability of the risk, NCHS considers the extent to which the data involve a sample of the universe of relevant individuals or establishments, the extent and availability of outside information necessary to identify an individual or establishment, the expense of undertaking such an effort, and the sensitivity of the information provided.
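The manual's concern with small cell sizes in published tables can be illustrated with a minimal sketch that flags cells for review before publication. The threshold, the function name, and the table below are hypothetical; the manual's actual guidelines are not reproduced here, and, as noted above, it allows a "special exception" where mitigating circumstances exist.

```python
def small_cells(table, threshold):
    """Return (row, column) labels of nonzero cells with counts below threshold.

    Flagged cells are candidates for review, not automatic suppression.
    """
    return [
        (row, col)
        for row, cols in table.items()
        for col, count in cols.items()
        if 0 < count < threshold
    ]


# Hypothetical tabulation of survey counts by area and condition.
counts = {
    "Area A": {"Condition X": 2, "Condition Y": 140},
    "Area B": {"Condition X": 31, "Condition Y": 0},
}

flagged = small_cells(counts, threshold=5)
```

Only the Area A, Condition X cell is flagged in this example. Whether a flagged cell is then published would depend on the kinds of considerations the manual lists, such as the sampling fraction and the sensitivity of the item.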
According to the NCHS manual, one exception to the above standards that did not require a "special exception" involved the publication of vital statistics. For example, a table could be published that indicated that within a specific county during a specific period there was one infant death or two deaths from rabies. Such exceptions were permitted because of "a long-standing tradition in the field of vital statistics not to suppress small frequency cells in the tabulation and presentation of data" and because such publication "rarely, if ever, reveals any information about individuals that is not known socially" (p. 17). However, in 1989 NCHS began to reexamine its policy on the release of vital statistics data because the publication of statistical tables by county or the release of detailed public-use microdata files might inadvertently reveal a cause of death, such as acquired immune deficiency syndrome (AIDS), that was not "socially known" in the community. The center and the states were concerned about controlling inadvertent disclosure in the public-use microdata files because the center had been releasing county-level data in its vital statistics.
The center obtains its vital statistics data from state health departments under a contractual arrangement that assures that no information will be released that could permit the identification of a specific individual or institution. However, two court cases (a 1987 case involving the Atlanta Constitutional Journal and the State Health Department of Georgia, and another in 1991 involving the American Civil Liberties Union and the State of Illinois) raised the concern of the states, as well as NCHS, that the center was releasing data in public-use microdata files that might result in the inadvertent disclosure of information about individuals. As part of its policy reexamination, NCHS held a special session at the July 1991 Public Health Conference on Records and Statistics to discuss the issues and a variety of alternative solutions with its data users.
As a result of its reexamination, NCHS changed its policy beginning with the release of the 1989 natality and mortality public-use microdata files, which were released in the fall of 1992 (see National Center for Health Statistics, 1992b). According to this new policy, NCHS will release public-use microdata files in three versions:
Single-year format: This format will be for a single calendar year and include data for cities, counties, and metropolitan areas with a population of 100,000 or more. Certain dates (date of the event and date of birth of mother, father, and/or decedent) will be excluded from the file. This file will be distributed by the National Technical Information Service (NTIS).
Multiyear format: This second format will contain data for a three-year period (e.g., 1987–1989) for all counties and metropolitan areas and for cities with a population of 50,000 or more. Again, certain dates (date and year of the event as well as date of birth of mother, father, and/or decedent) will be excluded from the file. This file will also be distributed by NTIS.
Special formats: NCHS will consider releasing vital statistics microdata files to data users whose needs cannot be met by either the single-year or multiyear format. Such users must contact NCHS and explain their additional data needs. Users who obtain specially formatted data files from NCHS will be required to sign a more extensive agreement on data use than the one that NCHS typically requires, an agreement that provides additional guidelines on avoiding inadvertent disclosure (see Figure 4.1 and the discussion of NCHS's data use agreement in Chapter 4).
NCHS will reevaluate its new policy on the release of vital statistics data after it has been in effect for one year.
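The population thresholds in the first two formats can be expressed as a simple rule. The sketch below is illustrative only: the `Area` type and function name are hypothetical, and it assumes that "data for" an area means the area is geographically identified in the file. It does not model the exclusion of dates or the special formats.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    kind: str  # "city", "county", or "metro"
    population: int


def identified_in(area, file_format):
    """Apply the population thresholds stated for each release format."""
    if file_format == "single-year":
        # Cities, counties, and metropolitan areas of 100,000 or more.
        return area.population >= 100_000
    if file_format == "multiyear":
        # All counties and metropolitan areas; cities of 50,000 or more.
        return area.kind in ("county", "metro") or area.population >= 50_000
    raise ValueError(f"unknown format: {file_format}")


small_county = Area("Small County", "county", 20_000)
mid_city = Area("Mid City", "city", 60_000)
small_city = Area("Small City", "city", 30_000)
```

Under these assumptions, a county of 20,000 and a city of 60,000 would both appear only in the multiyear file, while a city of 30,000 would appear in neither; pooling three years of data is what permits the finer geography.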
The NCHS standards address in a straightforward manner the difficult issue of deductive disclosure of information. Any disclosure is viewed as entailing some degree of risk, and NCHS is frank in acknowledging this in balancing the degree of risk against the benefit to the public that is likely to arise from the research. The NCHS has a number of advantages that may permit it the latitude to develop such policies. First, it is an agency that recognizes its primary role as being research, thereby avoiding the difficulties that arise when trying to design a system for handling records that are used for administrative purposes as well. Further, it collects much of its own information and does not require the cooperation of agencies that may follow more restrictive practices. (The NCHS vital statistics program is an exception to this point; this program must rely on statistical information reported by state registrars and is subject to varying state restrictions. Since NCHS began relying on information the Census Bureau collects for it under the authority of Title 15, rather than Title 13, it has avoided the more onerous restrictions regarding disclosure of information by the Census Bureau.) Finally, NCHS maintains a staff of skilled researchers, which enables it to conduct some of the more sensitive analyses within the agency. Nevertheless, its data collection and dissemination activities provide an opportunity to examine the consequences of having policies that recognize the possibility of an inadvertent disclosure of identifiable information and attempt to minimize that risk while releasing information that permits statistical research and reporting goals to be accomplished.
INTERNAL REVENUE SERVICE
Income information is perceived by many as among the most sensitive and most useful information collected by the federal government. Studies based on levels and changes in levels of reported income address a wide range of economic issues. The staff of the Statistics of Income Division, IRS, uses data from income tax returns and other sources to prepare publications and studies on a wide range of topics (see Wilson and Smith, 1983). When the success of such studies requires the use of individually identifiable information from other agencies, such data can sometimes be made available within the standards of the Privacy Act. However, the policy of the Census Bureau is that identifiable census records cannot be transferred to the IRS for statistical purposes even if the IRS employees who will have access to the records are special sworn employees of the Census Bureau.
Difficulties also arise when outside agencies require income information for the conduct of studies. Although no abuses of income information released for statistical purposes are known to have occurred, the Tax Reform Act of 1976 sharply limited outside access to IRS income information for statistical purposes. The passage of this act and the subsequent restrictions on existing research practices illustrate how reforms directed at abuses of administrative records may inadvertently hamper statistical and research uses of such records.
Prior to 1976 a number of executive orders permitted outside agencies to use IRS income information if they followed certain standards in maintaining the confidentiality of the information. However, in 1976 the Congress, partly in response to the publicized disclosure of income tax information to the White House, passed the Tax Reform Act of 1976 (see Wilson and Smith, 1983; U.S. Code Congressional and Administrative News, 1976:2897–4284, P.L. 94–455). This act denied tax return information to the President and other executive agencies (§ 6103(j) of the Internal Revenue Code) and expanded the definition of tax return information to include a broader range of Social Security earnings reports. As a consequence, access to income information for statistical and research purposes was sharply restricted.
The Internal Revenue Code, as amended by the Tax Reform Act of 1976, provides that tax returns and "return information" are confidential and not to be disclosed except as authorized by law (§ 6103(a)). "Return information" is extensively defined to include virtually every aspect of information filed with the IRS, including a taxpayer's identity and address and the fact that a return was filed (§ 6103(b)(2)). As a result, most releases of filing information to other federal agencies for purposes of statistical research must be in anonymous form. Exceptions are made for specified statistical and research uses by other units of the Treasury Department, the Census Bureau, the Bureau of Economic Analysis, and by a subsequent amendment (P.L. 95–210), the National Institute for Occupational Safety and Health (§ 6103(j) and § 6103(m)(3)). The IRS may undertake special statistical studies at the request of another agency, even merging IRS records with records from the other agencies (§ 6108). Such opportunities, however, may be limited by the resources and interests of the IRS, as well as the need of the requesting agencies for access to detailed and potentially identifiable data used in such studies.
A serious loss of important statistical products, as well as significant increased costs, was imposed on the federal statistical system as a consequence of the Tax Reform Act. For example, external uses of the Social Security Administration's Continuous Work History Sample were substantially curtailed, as was access to IRS name and address information for medical follow-up and epidemiologic studies (see Chapter 6 for details). And, as a result of the act, federal statistical agencies faced a new obstacle in their efforts to promote sharing of business lists (see Chapter 7).
AGENCIES LACKING EXPLICIT STATUTORY PROTECTION OF ESTABLISHMENT DATA
With few exceptions, the statutory protection of information on establishments and other organizations that is collected for statistical purposes is far weaker than the statutory protection of personal data. Although we address the uses of statistical data on organizations in Chapter 7, we mention them here to provide a contrast with the regulatory schemes for personal records. Several agencies that collect statistical information primarily from establishments and organizations do not have statutes that explicitly protect the confidentiality of such information.
The experience of the Energy Information Administration in attempting to maintain the confidentiality of statistical information collected from oil companies illustrates the problem. At the time the data were gathered, the EIA assured the companies involved that the information would be used only for statistical reporting purposes. The EIA relied on its statutory authority to administer this statistical program to develop regulations that were intended to guard against administrative uses of such information (see the Department of Energy policy statement contained in the Federal Register, 45(177):59812–59816). However, conflicting statutory authority requires the disclosure of such information to other federal agencies for official use upon request. Specifically, Title 15 U.S.C. § 771(f) states that "information [referred to in section 1905 of Title 18] shall be disclosed by … the [secretary of energy], in a manner designed to preserve its confidentiality—(1) to other Federal Government departments, agencies, and officials for official use upon request…."
When the Antitrust Division of the Department of Justice sought identifiable information to aid in two investigations of price gouging, the EIA resisted and the matter was referred to the Office of Legal Counsel in the Department of Justice, which ruled that EIA must produce the information requested by the Antitrust Division. Protracted negotiations regarding the precise nature of the data to be released ensued; the Justice Department eventually closed the cases without having received any proprietary data. The Justice Department, however, continues to hold that it is legally entitled to such data (U.S. General Accounting Office, 1993; see Chapter 7 for additional detail).
The Bureau of Labor Statistics also collects extensive statistical information on establishments and other organizations, much of which is reported voluntarily, without specific statutory protection to preserve the confidentiality of identifiable information. However, BLS has been more successful than EIA in fashioning a protective shield for such information. Based on a combination of regulations and lower court decisions, BLS has developed policies that enable it to honor the pledges of confidentiality that are given when it collects establishment data.
The basic philosophy of BLS is to protect the confidentiality of the data that it collects. This fundamental position is articulated in a series of policy directives that set forth procedures for safeguarding sensitive BLS information. The secretary of labor has authorized the commissioner of BLS to develop policies to govern the disclosure of all data collected by BLS (U.S. Department of Labor, 1972). These policies are intended to ensure that "data collected or maintained by, or under the auspices of, the Bureau under a pledge of confidentiality shall be treated in a manner that will assure that individually identifiable data will be accessible only to authorized persons and will be used only for statistical purposes or for other purposes made known in advance to the respondent" (Bureau of Labor Statistics, 1980:6).
This pattern of regulation was upheld by a federal district court when employment data from unemployment insurance (UI) reports that would identify individual establishments were sought under the Freedom of Information Act (see Hufstead v. Norwood, 529 F. Supp. 323 (S.D. Fla. 1981)). The reports had been voluntarily supplied to BLS by a state employment security agency, which was required by state law to keep such reports confidential, and BLS had pledged to keep the reports confidential. The court refused to order disclosure, finding the information to be exempt from the disclosure provisions of the Freedom of Information Act because such disclosure "would impair the Bureau of Labor Statistics' ability to collect that data in the future." More specifically, the court found that BLS met the test for an exemption from disclosure under exemption 4 of the Freedom of Information Act (Title 5 U.S.C. § 552(b)(4)), established in National Parks and Conservation Ass'n v. Morton, 498 F.2d 765, 767 (D.C. Cir. 1974), 529 F. Supp. at 326. In addition, because each state has laws and regulations protecting the confidentiality of its UI data, BLS is given extra protection. That is, if UI data are requested, BLS cannot release the data without conforming to the states' laws and regulations.
While BLS has been successful in honoring pledges of confidentiality without formal statutory protection, it has had relatively few legal tests of its pledge. Will promises of confidentiality be honored if the secretary of labor should revoke the delegation of authority to BLS's commissioner and begin to use this information in enforcement proceedings, or share some of it with the Department of Justice for investigatory purposes? Without clear statutory protection, the adequacy of the protection for confidential information on business establishments may be constrained by the policies of successive administrative officials and varying interpretations of courts of different jurisdiction.
The Bureau of Labor Statistics has, on occasion, sought statutory protection for the confidential statistical information that it collects. For instance, a letter from Secretary of Labor Elizabeth Dole to the Hon. Thomas Foley, Speaker of the House of Representatives, dated June 21, 1990, describes a proposed Labor Statistics Confidentiality Act. As of early 1993, however, no action had been taken on such legislation.
NEW STATISTICAL AGENCIES
Legislation passed late in 1991 established a new statistical agency, the Bureau of Transportation Statistics, in the Department of Transportation. Also in 1991, legislation was considered, but not passed, that would have created a Department of the Environment, with a Bureau of Environmental Statistics as one of its components. The roles of the two agencies, one already authorized and the other being considered, differ somewhat from the common conception of what a federal statistical agency does. Instead of their collecting data on relevant topics directly from persons or organizations, it is apparently intended that each of the two agencies will accomplish its mission primarily by compiling, analyzing, and publishing data collected by other components of its parent department.
The Bureau of Transportation Statistics was created by Section 6006 of the Intermodal Surface Transportation Efficiency Act of 1991 (P.L. 102–240). The only provision of the act that relates to confidentiality and data access is the following short paragraph:
(e) PROHIBITION ON CERTAIN DISCLOSURES.—Information compiled by the Bureau shall not be disclosed publicly in a manner that would reveal the personal identity of any individual, consistent with the Privacy Act of 1974 (5 U.S.C. 552a), or to reveal trade secrets or allow commercial or financial information provided by any person to be identified with such person (§ 111).
The above provision mentions only public disclosure. It does not establish a clear basis for full functional separation of data that might be received by the Bureau of Transportation Statistics from other statistical agencies or from administrative sources and used by it for statistical and research purposes. The Conference Report on the legislation goes somewhat further in this direction, stating that
the conferees intend that the Director establish such procedures as necessary to ensure that all Bureau data are collected and stored in such a way that they cannot be used to prosecute individuals or reveal business information that could harm persons or corporations (102nd Cong. 1st session, Conference Report 102–404:461–462).
At this time, it is difficult to predict what kinds of problems the Bureau of Transportation Statistics is likely to have in managing its confidentiality and data access functions given this less-than-comprehensive statutory basis.
The key portion of the confidentiality provision drafted for the proposed Bureau of Environmental Statistics is a statement almost identical to the one cited above from the transportation legislation. One is left with the overall impression that in drafting confidentiality legislation for new statistical agencies the drafters have given relatively little attention to what can be learned from experience. What statutory provisions are needed to maintain the principle of functional separation of administrative and statistical or research data and to maximize the dissemination of useful data while protecting confidentiality? How can demands for access to individually identifiable statistical data for nonstatistical purposes be successfully resisted? How does the statutory language affect the agency's ability to share identifiable data with other federal agencies for statistical and research uses? A look at the experiences of the statistical agencies whose legislation we have described in this chapter (and the experiences of the Energy Information Administration described in Chapter 7) would have provided valuable insights into what has worked well and what has not.
FINDINGS AND RECOMMENDATIONS
Current legislation impedes the constructive exchange of data for statistical and research purposes while failing to provide adequate protection of the confidentiality of statistical and research records. The panel notes two major inadequacies of the current situation.
First, many agencies not covered by specific confidentiality legislation have to rely on the Privacy Act to protect identifiable data on individuals. There are serious limitations to the protections provided by the Privacy Act for data from individuals, however. The panel notes three related limitations to the Privacy Act:
Although the Privacy Act does provide for separate treatment and disclosure of anonymous statistical data, it does not distinguish between administrative uses of data and statistical uses. Thus, it does not provide for the separate treatment of identifiable statistical data and identifiable data from administrative record systems. Distinctions between statistical and administrative data correspond to the purpose for which the data were collected and the assurances offered to the respondent. (The panel strongly supports the principle of functional separation of statistical and administrative data; see discussion in Chapter 1.)
The Privacy Act does not prohibit administrative and regulatory uses of individually identifiable data obtained for statistical purposes. Exceptions to the informed consent requirement may permit statistical data to be used for administrative and regulatory purposes, in violation of assurances given the individual at the time the information was collected that the information would be used for the exclusive purpose of conducting research or statistical analysis.
The Privacy Act requires agencies to inform individuals of the anticipated routine uses that will be made of their data at the time the data are collected. However, the routine-use exemption permits exchange of identifiable information for an unanticipated purpose if that purpose is "consistent" with the original purposes mentioned when the information was collected. Loose standards for defining a consistent use may diminish the protection afforded statistical records by permitting the release of identifiable research records for administrative, enforcement, and other nonresearch purposes that may jeopardize the interests of data subjects (see Coles, 1991).
Second, there is wide variation among statistical agencies in the degree of confidentiality protection that is afforded by legislation. Among the agencies reviewed in this chapter, protection varies greatly, from the rigorous protection provided for data collected under Title 13 by the Census Bureau, to the currently uncertain protection of data collected by the Energy Information Administration, to the absence of specific statutory protection for the Bureau of Labor Statistics. The panel notes three major effects of this variation:
The degree of data protection often is determined by the statutes governing the specific agency maintaining the information. As a result, the same kind of information may receive greater or lesser protection depending on the agency that maintains it, and without regard to the sensitivity of the information.
Variation in protection among agencies may thwart the exchange of information across agencies. Those agencies that are required by law to maintain an especially high degree of confidentiality are not able to transfer data to agencies that do not have the same level of protection. For example, the National Agricultural Statistics Service (NASS) transfers lists of farms to the Census Bureau for the latter's use in developing a mailing list for the Census of Agriculture. However, NASS cannot obtain the complete mailing list for the Census of Agriculture for use in developing a sampling frame for its own surveys. Instead, it must develop its own sampling frame at considerable additional cost, even though both agencies collect data from essentially the same universe.
Differing levels of statutory protection across agencies may cause confusion for data subjects and providers who do not pay close attention to subtle differences in the assurances of confidentiality that accompany agency requests for information. These assurances often are carefully drafted to correspond with the unique level of protection provided by each agency. But the distinctions may become blurred or lost when data providers respond to requests for information from several agencies. As a result, data providers may place too much trust in agencies with limited means of protecting data used for statistical and research purposes, and they may not know which agencies have a sound legal basis for offering more rigorous protection.
Recommendation 5.1 Statistical records across all federal agencies should be governed by a consistent set of statutes and regulations meeting standards for the maintenance of such records, including the following features of fair statistical information practices:
a definition of statistical data that incorporates the principle of functional separation as defined by the Privacy Protection Study Commission,
a guarantee of confidentiality for data,
a requirement of informed consent or informed choice when participation in a survey is voluntary,
a requirement of strict control on data dissemination,
a requirement to follow careful rules on disclosure limitation,
a provision that permits data sharing for statistical purposes under controlled conditions, and
legal sanctions for those who violate confidentiality requirements (see Recommendation 5.3 for further discussion of this requirement).
The panel believes that such legislation should be crafted so that it does not require renewal in short intervals. The confidentiality legislation for the National Agricultural Statistics Service, for example, is contained in the Food Security Act of 1985 (P.L. 99–198, § 1770), which must be renewed every five years. We believe that all statistical agencies should have permanent confidentiality legislation so as to meet the legitimate expectations of respondents for confidentiality.
The seven features listed above are intended to be an integrated package, not a shopping list of possible options. While the list contains the minimum standards, the panel does not wish to discourage the high standards on confidentiality that are already in place for some agencies. Existing and proposed legislation should balance such competing values as data protection and data dissemination.9
To attain the proposed standard for the maintenance of statistical records, the panel sees two complementary approaches:
The standards might take the form of new government-wide legislation, a "Federal Statistical Records Act," that would protect the confidentiality of records used for statistical purposes across the entire federal government. Such an act would (1) distinguish between administrative and statistical records on the basis of use, (2) establish and mandate functional separation between administrative and statistical records, (3) encourage disclosure of statistical records for statistical uses, and (4) restrict disclosure of statistical records for nonstatistical uses. The legislation would complement the Privacy Act and perhaps be incorporated in it. However, amending the Privacy Act, which extends protection only to individuals, will not be a suitable means of addressing concerns regarding statistical data maintained by federal agencies on establishments and organizations. (See Chapter 7 for the panel's recommendations concerning statistical data on organizations.)
A second approach would be to incorporate the standards in agency-specific confidentiality legislation, adapting them to the specific mission and programs of each statistical agency. For this approach, the panel suggests that specialized legislative provisions be developed that apply the standards outlined above to the specific tasks of each statistical agency. Thus, there would be two levels of protection: first, the Federal Statistical Records Act and, then, specific legislation governing each agency.
These approaches are not mutually exclusive, and some combination of strategies may be appropriate. For instance, the Privacy Act could be amended while model legislative provisions were being developed.
Statutory standards that prohibit disclosure of statistical data if there exists any possibility of association with a specific individual or organization place an unnecessarily strict limitation on exchange of data for research. Although statutory requirements for nondisclosure (other than those in government-wide statutes such as the Privacy Act of 1974) vary considerably from one agency to another, most are stated in absolute terms to the effect that no information can be released if it can be associated with a specific individual or organization. For example, the Census Bureau's Title 13 prohibits the publication of any data from establishments or individuals collected under this statute if the data can be identified. Similarly, legislation governing the National Center for Education Statistics prohibits the publication of data that would permit the identification of an individual.
If taken literally, requirements of "zero disclosure risk" would virtually eliminate the dissemination of statistical data, especially in the form of public-use microdata. Removal of explicit identifiers (such as name, address, and Social Security number) and careful application of statistical techniques to protect confidentiality (see Chapter 6) are not enough to protect individually identifiable data from ever becoming known. There is almost always some residual risk, which cannot be easily quantified, that a diligent "snooper" who wanted to do so might identify one or more data subjects included in a tabulation or microdata file.
In practice, many of the statutory requirements that are expressed in absolute terms have been interpreted to mean that unrestricted access to a particular data set requires that a reasonable effort be made to keep the risk of disclosing individually identifiable information at an acceptably low level. Agency officials exercise judgments to decide what statistical disclosure limitation procedures must be applied in order to keep disclosure risk acceptably low. Judgments are required because no one can have full knowledge of the externally available data sets and other factors that might lead to disclosure and because the concept of what constitutes an acceptably low risk of disclosure is inherently subjective. Also, the incentive for "snooping" will vary greatly across different data sets, thus affecting risk.
The implementing regulations for the Privacy Act allow federally supported researchers to disseminate project findings that do not contain information that could reasonably be expected to be identifiable. A criterion of a "reasonably low disclosure risk" has been used by OMB in its interpretation of the Privacy Act. In providing executive branch agencies with guidelines for implementing the Privacy Act, OMB defines the phrase "not individually identifiable" to mean that "the identity of the individual can not reasonably be deduced by anyone from tabulations or other presentations of the information" (Federal Register, 40(132):28954).
Recommendation 5.2 Zero-risk requirements for disclosure of statistical records are, in practice, impossibly high standards. Regulations and policies under existing statutes should establish standards of reasonable care. New statutes should recognize that almost all uses of information entail some risk of disclosure and should allow release of information for legitimate statistical purposes that entail a reasonably low risk of disclosure of individually identifiable data.
Few sanctions for improper dissemination of research records are designed to extend beyond federal agencies to contractors, grantees, and nonagency researchers. In most of the statistical agencies the panel reviewed, employees take an oath or sign an affidavit to indicate that they are aware of the statutory confidentiality provisions associated with the data and the penalties for violating those provisions.
Among the federal statistical agencies included in our review, only the National Center for Education Statistics and the Bureau of Justice Statistics have legislation that includes penalties for violation of confidentiality provisions by nonemployees. For NCES, such penalties are for data users who have access to data from a subset of surveys that were listed in the 1990 amendment to the Hawkins-Stafford amendments (see § 252 of the Excellence in Mathematics, Science and Engineering Act of 1990, P.L. 101–589). The statute for BJS is much broader—contractors and grantees are subject to a maximum fine of $10,000 for violating the confidentiality provisions of BJS's legislation (42 U.S.C. § 3789g(a)).
Legislative sanctions for violating confidentiality provisions are desirable for all data users, within and outside a federal agency.
Currently, some penalties for violating confidentiality have the force of law, but others do not. Informal penalties, such as declining to provide additional data to users who fail to observe agency confidentiality standards, are among the few options currently available to agencies that do not have statutory provisions for penalizing such users. Such practices could be effective for users whose statistical activities require frequent use of federal data sets. However, they would be unlikely to deter those who want to identify individual units in a data set merely to satisfy their curiosity or to embarrass the agency that released the data.
Legislative sanctions and informal penalties for violating confidentiality have not been widely applied. This could be because there have been few violations, because violations have occurred without agencies' knowledge, or because agencies have been reluctant to impose sanctions given the adverse publicity that might result. Nevertheless, it is clear that sanctions authorized by legislation can play an important role in protecting the confidentiality of statistical data by raising the consciousness of data custodians and data users.
As noted in Chapter 4, data users are pressing agencies to provide greater access to statistical data. Currently, federal agencies are the ones that "suffer" the consequences when confidentiality violations occur. They fear such negative outcomes as lower response rates or decreased cooperation from respondents, and they are understandably cautious about broadening access to confidential statistical data because doing so only increases their risk. Nonemployees who are data users must also share responsibility for maintaining the confidentiality of data. Data users must recognize their responsibilities, and data providers need the assurance that sanctions lend to those responsibilities.
Recommendation 5.3 There should be legal sanctions for all users, both external users and agency employees, who violate requirements to maintain the confidentiality of data.
This recommendation is tied to the panel's position, discussed in Chapter 4, that users must be prepared to accept legal and contractual sanctions for violations.
Contractual sanctions for violating confidentiality might also be broadened to include the data user's organization. One enforcement device that has been used in federally supported survey research is the requirement that the user's organization deposit a specified sum of money before the data user can obtain the microdata. The money is forfeited if the data user fails to live up to the specified provisions of the agreement on data release. This approach has been used by the University of Michigan's Survey Research Center in releasing detailed microdata files from longitudinal surveys conducted with grant support from the National Science Foundation (see Jabine, 1993a).
This discussion of the Privacy Act is taken, in part, from Cecil (1989).
For a more detailed discussion of the origins of the Privacy Act, as well as a discussion of the way in which it is inappropriate to contemporary recordkeeping practices, see Flaherty (1989:306-314, 366-370).
The Privacy Act defines a record as "any item, collection, or grouping of information about an individual that is maintained by an agency, including, but not limited to, his education, financial transactions, medical history, and criminal or employment history and that contains his name, or the identifying number, symbol, or other identifying particular assigned to the individual, such as a finger or voice print or a photograph" (Title 5 U.S.C. § 552a(a)(4)).
The term statistical record is defined in the Privacy Act as "a record in a system of records maintained for statistical research or reporting purposes only and not used in whole or in part in making any determination about an identified individual, except as provided by § 8 of Title 13 (which governs the Census Bureau)" (Title 5 U.S.C. § 552a(a)(6)).
The Privacy Act defines a system of records as "a group of any records under the control of any agency from which information is retrieved by the name of the individual or by some identifying number, symbol, or other identifying particular assigned to the individual" (Title 5 U.S.C. § 552a(a)(5)).
For an example of the differing standards that have been adopted under this exemption for disclosure of names and addresses contained in files held by federal agencies, see Rubin (1990). For a discussion of the evolving nature of this exemption, see Andrussier (1991).
See "Requirements Relating to Confidentiality and Privacy in Data Collection Contracts" and "Requirements Relating to Confidentiality and Privacy in Data Processing Contracts," Appendixes A and B, respectively, in National Center for Health Statistics (1984).
The cases are The Atlanta Journal et al. v. Ledbetter et al., Superior Court of Fulton County, State of Georgia, Civil Action File No. D-40588 (1987) and Jane Doe II et al. v. Lumpkin, United States District Court, Central District of Illinois, Case No. 89-1224 (1991).
One of the requirements of Title 13 is that the Census Bureau not