National Academies Press: OpenBook

Protecting Participants and Facilitating Social and Behavioral Sciences Research (2003)

Chapter: 5 Enhancing Confidentiality Protection

Suggested Citation:"5 Enhancing Confidentiality Protection." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.


Breach of confidentiality, that is, the release of data that permit identifying an individual participant, is often the major source of potential harm to participants in social, behavioral, and economic sciences (SBES) research (see Sieber, 2001). For example, a survey that poses no risk of physical injury and no more than minor psychological annoyance to a respondent may nonetheless obtain data that could harm the respondent if others outside the research team (e.g., neighbors, co-workers, public agency officials) could associate those data with the person. Such information, if known by others, might affect employment, insurability, personal relationships, civil or criminal liabilities, or other activities or situations. In some cases, the simple fact of learning that an individual is a study participant could be harmful (e.g., if police or drug dealers were to learn the names of participants in an ethnographic study of drug markets). Furthermore, if a participant has been assured of confidentiality, then disclosure of identifiable information about the person is a violation of the principle of respect for persons even if the information is not sensitive and would not result in any social, economic, legal, or other harm.

Protection of confidentiality is a concern in SBES research whenever data are collected in identifiable form. Identifiers include not only such overt information as name, address, social security number, telephone number, and e-mail address, but also detailed information about the respondent, such as income and profession, that could permit identification by inference in the absence of an explicit identifier.1 Some SBES research does not collect identifiable information in these terms, for example, observational studies of street-crossing behavior of people who are not photographed or approached by the investigator in any way. However, for much SBES research, confidentiality protection is a necessary and vitally important component of the study plan.

1 Even the assignment of arbitrary identifiers may not protect against re-identification so long as the link between the arbitrary codes and originally collected real identifiers (e.g., name) has not been destroyed.

Breach of confidentiality can occur at any stage of a research project: data collection (including recruitment of participants), processing, storage, and dissemination for secondary use. At the present time, the risk of disclosures that could be embarrassing or damaging to participants (or that could simply violate a pledge of confidentiality) is increasing because of several factors. Most of these factors affect the disclosure risk for dissemination for secondary use, but some also have implications for the disclosure risk as a result of data collection, processing, and storage. They include the following:

• There are growing numbers and variety of publicly available microdata files for secondary analysis. Such files provide information on individuals that have been stripped of obvious (and less obvious) identifiers. Increasingly, microdata sets contain richly detailed content from multiple observations on the same individuals over time (panel surveys), or they contain data on more than one type of entity (e.g., education surveys of students, their parents, teachers, and schools), or they contain both kinds of data. Such rich data sets increase the potential for re-identification of respondents through linkages with other data sources. Panel surveys also pose disclosure risks as a result of data storage because contact information must be retained for respondents for months or years. Generally, disclosure risks for panel surveys increase over time.

• There are growing numbers, variety, and content of administrative records data sets from public and private agencies (e.g., birth and death records) that are readily available on the Internet. Such files can potentially be linked to research data sets and used to re-identify research respondents.

• More broadly, the capabilities to link information across multiple sources on the Internet are increasing.

• There is increased emphasis by funding agencies on data sharing among researchers, not only to permit replication of results, but also to foster additional research at low marginal cost. Such sharing has many benefits, but it also multiplies the number of people with access to the data.

• The speed of data processing and volume of low-cost data storage are increasing, which facilitates efforts to link data sets.

• There is increased use of data collection technologies, such as web surveys, and data transmission methods, such as e-mail and file-sharing procedures, that may not be secure.

In this chapter we provide historical background on confidentiality protection for research data in the United States, beginning with the attention given to protection issues by the U.S. Department of Health and Human Services (DHHS) and the institutional review board (IRB) system. We continue with the history of legislative protection for data collected by the Census Bureau and other federal statistical agencies that are widely used by SBES researchers and others. (For decennial census data, legislative protection goes back to the 1920s.) Until fairly recently, the activities of IRBs and statistical agencies with regard to confidentiality protection have proceeded largely independently.

Next, we provide a fuller explication of the factors that are challenging the adequacy of confidentiality protection measures today and the techniques and procedures that statistical agencies are adopting in response. Our recommendations to IRBs, the Office for Human Research Protections (OHRP), and research funding agencies for enhancing confidentiality protection for different kinds of SBES research follow. To protect participants and facilitate research with existing data, we propose a new system for certifying the confidentiality of data files, built on existing and new data archives in the United States.

HISTORY OF CONFIDENTIALITY PROTECTION IN THE PARTICIPANT PROTECTION SYSTEM

Common Rule and IRB Operations

Surprisingly, the history of human research participant protection policies and regulations shows relatively little attention to issues of confidentiality protection.2 Although the 1966 U.S. Public Health Service policy statement and the 1971 “Yellow Book” guidelines3 for participant protection mentioned the need to protect confidentiality, the 1974 regulations (45 CFR 46) did not require IRBs to determine that study plans adequately address confidentiality issues. Indeed, papers and testimony from social scientists prepared for the 1974-1978 National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research commented that existing regulations did not adequately address issues of confidentiality in SBES research. A provision on confidentiality was added to the 1981 version of the regulations (45 CFR 46.111a): it required IRBs to determine “where appropriate, there are adequate provisions to protect the privacy of subjects and to maintain the confidentiality of data.” The 1981 regulations also specified “a statement describing the extent, if any, to which confidentiality of records identifying the subject will be maintained” as one of the basic elements of informed consent (45 CFR 46.116a). Beyond these two references, however, the Common Rule provides no guidance, even on traditional confidentiality protections for laboratory, survey, ethnographic, and other originally collected data, such as assigning new identifiers and destroying the link to the original identifiers, keeping data records in locked files, and the like.4 Guidance in the IRB Guidebook (Office for Protection from Research Risks, 1993:Ch.III.D) is very general.

2 By “confidentiality,” we mean protecting private information from being revealed to others in a way that could identify an individual research participant. Such protection is distinct from “privacy,” by which we mean the right of an individual to decide whether to share information with the investigator in the first place (e.g., a survey participant could refuse to answer certain questions on grounds that they invaded his or her privacy; see National Research Council, 1993:22-23).

3 The “Yellow Book” was the name given to “The Institutional Guide to DHEW Policy on Protection of Human Subjects” (see Chapter 3).

4 See Sieber (2001) for a critique of the Common Rule’s limited statements on confidentiality, which she asserts do not properly recognize the distinction between confidentiality and privacy.

With regard to IRB attention to confidentiality protection in reviewing protocols, the 1975 Michigan survey found that confidentiality issues were relatively rare as a focus of IRB review: only 3 percent of protocols were required to modify their confidentiality procedures; in comparison, 24 percent of protocols were required to modify their consent forms or procedures (Gray, Cooke, and Tannenbaum, 1978:Table 2). This difference may be understandable given that the challenges to maintaining confidentiality were not as great then as they are today. However, the 1995 Bell survey 20 years later reported similar results: only 3 percent of IRB chairs said that inadequate confidentiality protections were often a problem with the research protocols they reviewed, while problems with consent forms were cited frequently (Bell, Whiton, and Connelly, 1998:Figure 40). Similarly, only 14 percent of investigators reported being required to modify their procedures for protection of privacy and confidentiality, compared with 78 percent who reported being required to modify their consent forms (Bell, Whiton, and Connelly, 1998:Figure 41).

The continued relative lack of emphasis on confidentiality protection may result from determinations by IRBs that proposed protection procedures are adequate. It may also result from continued underestimation by IRBs of the risks of disclosure, which today’s research and computing environment has heightened.

Confidentiality Certificates

Another initiative by federal research funding agencies to protect confidentiality is the long-standing program of the National Institutes of Health (NIH) whereby researchers may obtain certificates of confidentiality for research on sensitive topics, whether the research is funded by NIH or another agency. The National Institute of Justice also makes such certificates available for criminal justice research. These certificates buttress confidentiality protection in specific circumstances; namely, they protect researchers from being compelled to deliver names or identifying characteristics of participants in response to court orders or subpoenas, unless respondents have consented to such release.5 Qualifying studies include those that collect data on such topics as sexual attitudes, preferences or practices; use of alcohol, drugs, or other addictive products; mental health; genetic makeup; illegal conduct; or other topics for which the release of identifiable information might damage an individual’s financial standing, employability, or reputation within the community or might lead to social stigmatization or discrimination. At present, however, the protection afforded by such certificates is prospective; that is, researchers cannot obtain protection for study results after data collection has been completed, and it is not always obvious in advance when a certificate may be needed.

5 The New York Court of Appeals upheld the authority of confidentiality certificates in 1973 (for more information, see http://grants1.nih.gov/grants/policy/coc [4/10/03]).

Medical Records Protection

The Health Insurance Portability and Accountability Act (HIPAA) of 1996 contained a provision that has resulted in the latest initiative by federal research funding agencies to protect confidentiality. HIPAA promotes the use of standard formats for electronic information exchange to simplify the administration of health insurance payments for medical treatment. Recognizing a potential threat to the confidentiality of patient records, HIPAA required DHHS to submit to Congress detailed recommendations on privacy standards for individually identifiable health information. This short provision led to the Privacy Rule, which comprises hundreds of pages of regulations and commentary; it is scheduled to take full effect in April 2003 (see Gunn et al., 2002; Institute of Medicine, 2002:205-211).

The version of the Privacy Rule issued by DHHS in December 2000 drew substantial criticism from the health care community, including researchers, who complained that the provisions for research access to health information were confusing and unnecessarily restrictive. In response, DHHS published a modified Privacy Rule in August 2002.

The Privacy Rule applies to health plans, health care clearinghouses, and health care providers (covered entities) who maintain patient and claims records; it also affects health care researchers who obtain such records from covered entities for analysis purposes. Under the rule, covered entities may make “de-identified” data available for research use, without patient authorization, in one of three ways. First, a covered entity may release a “limited data set,” consisting of patient and claims records stripped of a list of direct identifiers of the individual, relatives, household members, and employers, to researchers who sign a legally binding agreement to safeguard and not disclose the information. The identifiers that must be deleted include names, street addresses, telephone numbers, e-mail addresses, social security numbers, medical record and health plan account numbers, device identifiers, license numbers, vehicle identifiers, full face photos, and finger and voice prints (Gunn et al., 2002:8). However, birth date, 9-digit zip code, and dates of admission and discharge are permissible to include in such a data set. Second, a covered entity may release a “de-identified” data set for research use without requiring the researcher to sign an agreement provided a more comprehensive list of identifiers has been removed. Third, a covered entity may employ a statistician to attest that the risk of re-identification is very small because of the nature of the data (e.g., in cases when the data have been subject to statistical manipulation; see “Protection Methods of Statistical Agencies,” below).
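As a rough illustration of the first of these options (the “limited data set”), the following Python sketch drops direct-identifier fields from a single record while leaving in place the fields that the summary above describes as permissible. The field names, the sample record, and the function name to_limited_data_set are hypothetical and chosen only for illustration. The identifier list paraphrases the summary above rather than the regulation itself, and removing named fields does not by itself establish that a data set satisfies the Privacy Rule.

```python
# Illustrative sketch only: field names below are hypothetical, and the set
# paraphrases the identifier list summarized in the text, not the regulation.
DIRECT_IDENTIFIERS = {
    "name",
    "street_address",
    "telephone",
    "email",
    "ssn",
    "medical_record_number",
    "health_plan_account_number",
    "device_identifier",
    "license_number",
    "vehicle_identifier",
    "full_face_photo",
    "fingerprint",
    "voice_print",
}


def to_limited_data_set(record: dict) -> dict:
    """Return a copy of `record` without any direct-identifier fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS
    }


# Hypothetical patient record, invented for this example.
record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "birth_date": "1950-01-31",        # permissible in a limited data set
    "zip9": "12345-6789",              # permissible in a limited data set
    "admission_date": "2002-03-04",    # permissible in a limited data set
    "discharge_date": "2002-03-09",    # permissible in a limited data set
    "diagnosis_code": "X00",
}

limited = to_limited_data_set(record)
print(sorted(limited))
```

The sketch also shows why the rule pairs this option with a signed agreement: the retained birth date and 9-digit zip code are the kind of detailed information that could still permit identification by inference.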
The Privacy Rule also provides that IRBs or Privacy Boards may issue waivers for research access to data when the research cannot be conducted with de-identified data and when it is not practicable to obtain authorization from research participants.6 The waiver requirements were initially criticized as being inconsistent with the Common Rule; they were simplified and rewritten for consistency. They require that adequate plans are in place to protect identifiers from disclosure and to destroy them at the earliest opportunity. IRBs and Privacy Boards may also allow researchers limited access to identifiable records data in order to identify and recruit prospective participants or to conduct preliminary exploratory research to determine the feasibility of a full-fledged analysis.

6 It is apparently expected that IRBs would handle waivers under the Privacy Rule for federally funded research and that Privacy Boards would handle waivers for other research, although this is not clear (see Institute of Medicine, 2002:209).

We cannot do justice to the Privacy Rule provisions in this brief summary nor anticipate how they may work in practice. We note that these provisions increase the necessity for IRBs, OHRP, and researchers to become cognizant of good practices for confidentiality protection, as discussed below.

CONFIDENTIALITY PROTECTION IN THE FEDERAL STATISTICAL SYSTEM

Census Bureau History

1790 to World War II

The history of confidentiality protection for federal statistical data begins, as for so many data collection and dissemination issues, with the decennial census, first conducted in 1790 pursuant to the U.S. Constitution.7 In the first few decades of the census, the returns were posted in public places for public review. By the middle of the 19th century, the Congress and census directors began to worry about enumerators improperly revealing information and possibly gaining some private benefit. Public posting was discontinued, and enumerators were instructed to keep census information confidential. Yet federal, state, and local agencies and courts not infrequently attempted to obtain individual census returns. Most often the Census Bureau rebuffed these requests, but sometimes it acceded to them. Finally, Public Law 13 (Title 13 of the U.S. Code) was enacted in 1929 to codify various practices that had been emerging in official U.S. statistics. Section 9 explicitly provided for the confidentiality of economic and population census data:

    The information furnished under the provisions of this Act shall be used solely for the statistical purposes for which it is supplied. No publication shall be made by the Census Office whereby the data furnished by any particular establishment or individual can be identified, nor shall the Director of the Census permit anyone other than the sworn employees of the Census Office to examine the individual reports.

Another section provided heavy penalties, which currently include large fines and up to 5 years’ imprisonment, for Census Bureau employees who breach confidentiality. Title 13 also covers household surveys conducted by the Bureau that use the census address list as their sampling frame.

7 This history of confidentiality protection for the U.S. census draws heavily on Gates (2000) and Seltzer and Anderson (2002).

The enactment of Public Law 13 was timely because the Census Bureau was publishing more and more tabulations for smaller and smaller geographic areas, which required careful specification and review to minimize the risk of individual identification. As early as 1910, the Bureau published data for census tracts (locally delineated neighborhoods) in selected cities that paid for the tabulations. By 1940 the Census Bureau was coding and publishing census tract data for 64 cities. Also in 1940 the Bureau introduced a program of statistics for individual blocks in 191 cities.

World War II and Later

At the outbreak of World War II in 1939, the U.S. Attorney General sought legislation to amend Title 13 to allow military and intelligence agencies to have access to individual census records. The Census Bureau adamantly opposed the legislation, and it was withdrawn. However, in June 1941 a newly appointed Census Bureau director, J.C. Capt, obtained the support of the Commerce Department for legislation to authorize periodic surveys for national defense needs and to make census reports for individuals available for use in the “national defense program” with the approval of the president. This legislation passed the Senate in August 1941 with an accompanying report (77th Congress 1st session, Senate Rept 495, June 26, 1941, to accompany S 1627) that said:

    The needs of the defense program are of such a character as to require full and direct information about specific individuals and business establishments. . . . To continue to impose the rigid provisions of the present confidential use law of the Census Bureau. . . would defeat the primary objects of the legislation here proposed.

The Senate legislation did not pass the House, but the Second War Powers Act, enacted March 27, 1942, effectively incorporated its provisions. This act provided that any Department of Commerce data could be provided to any federal agency at the written request of the agency head. It is not known whether individual census reports were ever provided to people other than sworn Census Bureau employees. However, census tract-level tabulations of Japanese Americans from the 1940 census were provided to the Office of Naval Intelligence, and maps of city blocks with counts of Japanese Americans were provided to the Western Defense Command of the War Department, which facilitated internment of legal residents of Japanese origin.

ENHANCING CONFIDENTIALITY PROTECTION 121 The relevant section of the Second War Powers Act was repealed as part of the First Decontrol Act of 1947. In 1947 the Census Bu- reau refused a request by the Attorney General for census information on individuals who were suspected of being communist sympathizers. Since that time, the Bureau has an unblemished record of protecting confidentiality for the data it collects from respondents to censuses and surveys, despite the increasing challenges it faces to such protection.8 Its standing Disclosure Review Board reviews every data product the Bureau makes available for public use to ensure that disclosure risks are minimized. Other Statistical Agencies All federal statistical agencies operate under strong norms to pro- tect the data they collect against disclosure that could identify an in- dividual.9 Some agencies have legal protection against requests from administrative agencies and other bodies to disclose individually iden- tified information. However, other agencies have had to rely on exec- utive orders, court cases, and long-established custom (see Norwood, 1995). For years, the Statistical Policy Division of the U.S. Office of Man- agement and Budget (OMB) endeavored to obtain legislation that would strengthen the statutory basis for protecting the confidentiality of all federal data collected for statistical purposes under a confidentiality pledge. These efforts achieved success when, in November 2002, Congress enacted the E-Government Act of 2002. Title V, the “Confi- dential Information Protection and Statistical Efficiency Act of 2002,” subtitle A, places strict limits on the disclosure of individually identi- fied information collected under a pledge of confidentiality: such dis- closure can occur only with the informed consent of the respondent and the authorization of the agency head and only when the disclosure is not prohibited by any other law (e.g., Title 13). 
Subtitle A also pro- vides penalties for employees who unlawfully disclose information (up to 5 years in prison, up to $250,000 in fines, or both). However, even though confidentiality protection for statistical data is now on a much firmer legal footing across the federal government, a loophole may exist for data from the National Center for Education 8 See Gates (2000) for a summary of post-World War II changes in legislation and court decisions that have upheld the confidentiality protections of Title 13. A legal ex- ception to Title 13 is the provision in Title 44 that allows the National Archives to obtain individually identified census records and make them available for research use 72 years after the census date. 9 See Principles and Practices for a Federal Statistical Agency (National Research Council, 2000b).

122 PROTECTING PARTICIPANTS AND FACILITATING SOCIAL AND BEHAVIORAL SCIENCES RESEARCH Statistics (NCES). NCES has for many years had strong statutory pro- tection for maintaining the confidentiality of its data and stiff penalties for NCES staff who breach confidentiality. The USA Patriot Act of 2001, enacted in October 2001 following the tragic terrorist acts of Septem- ber 11, may have vitiated the legal protections for NCES data. Section 508 of the act amended the National Center for Education Statistics Act of 1994 by allowing the Attorney General (or an assistant attorney general) to apply to a court to obtain any “reports, records, and infor- mation (including individually identifiable information) in the posses- sion” of NCES that are considered relevant to an authorized investiga- tion or prosecution of domestic or international terrorism. Section 508 also removed the penalties for NCES employees who furnish individual records under this section. To date, no requests for such records have been made, but NCES is revising the information it provides to survey respondents about the possibility that their data could be obtained un- der this act. It is not yet clear whether the confidentiality protections in the E-Government Act would take precedence over Section 508 of the Patriot Act. Federal Statistical Agencies and IRBs Most but not all cabinet departments that house federal statistical agencies have formally adopted the Common Rule (exceptions are the U.S. Departments of Labor and Treasury), and agency IRBs review proposed surveys for many statistical agencies. For example, the Na- tional Center for Health Statistics has an IRB, and the IRB for the Department of Education reviews NCES surveys. The Census Bureau, in contrast, does not obtain IRB review on the basis that its surveys are exempt under 45 CFR 46.101(b)(3)(ii). 
That provision exempts research from IRB review when federal law, as in Title 13, requires “without exception that the confidentiality of the personally identifiable information will be maintained throughout the research and thereafter.” Yet there are features of some Census Bureau surveys that might be viewed as requiring IRB review (e.g., the appropriateness of providing financial incentives only to cases that otherwise refuse to participate in the Survey of Program Dynamics).

All federal surveys are subject to clearance by OMB under the provisions of the Paperwork Reduction Act. This review covers not only survey costs and burden for respondents, but also such issues as whether respondents are adequately informed about the purpose of the survey, the use of the information, whether response is voluntary or mandatory, and the nature and extent of confidentiality protection. We are not in a position to recommend whether IRB review is needed in addition to OMB review, but we do suggest it might be useful for OMB and OHRP to discuss their respective jurisdictions. We note that statistical agencies have encountered some of the same problems as SBES researchers with IRB review, such as insistence on requiring signed written consent for minimal-risk surveys when evidence indicates that a signature requirement will deter response from some people who would otherwise be willing to participate (see Chapter 4).

PROTECTING CONFIDENTIALITY TODAY

Increasing Challenges

The development of new data collection and dissemination technologies is arguably the principal factor increasing disclosure risks for research data that are made available by federal statistical agencies and other providers today. Other factors that play a role include increases in the volume and richness of the data collected (in turn made possible by technological advances) and changes in the nature of SBES research, which increasingly involves secondary analysis of data collected by others and sharing of data for validation purposes.

New Technology

Collection and processing technology for large-scale data collection efforts has been under almost continuous development since at least the end of the 19th century, when Herman Hollerith (then a Census Bureau employee, later the founder of a company that became part of IBM) invented a punch-card tabulation machine to edit and tabulate the 1890 census (see Salvo, 2000). At that time and for many years thereafter, the limitations of printing technology constrained the volume of tabulations that the Census Bureau and other agencies could publish for research use, thereby minimizing disclosure risk.

The challenges of protecting data confidentiality began increasing in the 1960s when the Census Bureau first took advantage of computerization to greatly expand the volume and kinds of data it made available to the user community.
From the 1960 census (the first to be processed wholly by computer), the Bureau provided summary files (SFs) for small geographic areas on a reimbursable basis to several business firms. The tabulations on these computer files were much more extensive than those in printed reports. In 1963 the Bureau, with support from the Population Research Council, developed the first public-use microdata sample or PUMS file, which contained 1960 census individual records for 180,000 people (a 1-in-1,000 sample of the U.S. population).10 In the 1970 and subsequent censuses the Bureau greatly expanded the SF and PUMS programs, and, beginning in the late 1970s, the Bureau and other statistical agencies provided public-use microdata files for an increasing number of large household surveys, including the Consumer Expenditure Survey, Current Population Survey, and Health Interview Survey.

Yet throughout the 1970s threats to confidentiality were lessened by the small number of secondary users and the difficulties of acquiring and working with large computer files from the Census Bureau and other statistical agencies. The files were generally expensive to acquire (even though users only had to pay the costs of reproduction); they were also expensive to process, requiring programming support, mainframe computer hardware, and, often, investment in customized software. Users also required considerable training in how to analyze and interpret the data. Hence, the barriers to use were high.

The spread of personal computing in the 1980s and 1990s greatly expanded the number of users who conducted secondary analyses of summary and microdata files from statistical agencies and other sources. However, at least initially, the storage capacity and processing speeds of personal computers were limited, thereby limiting the ability for linkage across data sets or other types of data manipulation that might breach confidentiality.

In the 1990s, the emergence of the world wide web, together with vastly increased computing power and storage capacity of personal computers (often networked to provide yet more capacity), began to markedly increase the potential for breaches of confidentiality to occur. Multiple data sets were made available on the web, including summary and microdata files from statistical agencies and records of various types from public and private agencies.
The volume of easily accessible data, together with sophisticated matching software, increased the likelihood that a determined investigator might re-identify a survey respondent despite the best efforts of individual agencies to minimize disclosure risk.

Paralleling developments in data processing and dissemination technology were developments in technology for data collection from survey respondents. Computer-assisted telephone interviewing (CATI) came into use beginning in the 1980s, followed by computer-assisted personal interviewing (CAPI) in the 1990s. CAPI technology, in particular, in which interviewers record responses on laptop computers in the field and transmit the data over telephone lines to agency headquarters, posed new problems of protecting confidentiality at the stage of interviewing and data transmission. Most recently, survey organizations and individual researchers have experimented with collecting responses on the Internet, which poses yet more challenges for confidentiality protection.

10 We define a public-use microdata file as a computer-readable file that contains individual records for a sample of individuals or households, is intended for research use, is available to any user, and has been processed to minimize the risk of identifying a particular individual by using widely recognized good practices for confidentiality protection.

Data Richness

The kinds of technological developments discussed above, including faster processors, more storage capacity, and more sophisticated data processing and analysis software, made possible spectacular growth over the past three decades in the volume and richness of data sets that are available for secondary analysis by SBES researchers. This growth was also fueled by increasing demands for applied research in such areas as health services, retirement behavior, education, work and welfare, which have been the focus of public attention and policy debate.

In terms of sheer volume of observations, PUMS files containing microdata from the decennial census long-form sample expanded over this period from a 1-in-1,000 sample of the population in 1960, totaling more than 180,000 records, to as large as a 1-in-20 sample of the population in 2000, totaling more than 10 million records. Microdata files from the major household surveys of statistical agencies are also large and complex: for example, the Current Population Survey March Income Supplement contains data for more than 70,000 households and 180,000 people with detailed information on employment, family income, and household composition.

Even more exciting from the perspective of SBES research has been the development of complex longitudinal, multilevel surveys that provide a vast array of information to support in-depth secondary analysis of specific populations.
Some of the major surveys of this type are the Health and Retirement Survey, National Longitudinal Survey of Mature Women, National Longitudinal Survey of Youth, Panel Study of Income Dynamics, Survey of Income and Program Participation, and Survey of Program Dynamics. To illustrate the breadth and depth of information such surveys can contain, Box 5-1 summarizes the structure and content of the Health and Retirement Survey, which is conducted by the University of Michigan Survey Research Center with a grant from the National Institute on Aging.

BOX 5-1 Health and Retirement Survey Design and Content

Design

The first cohort began in 1992 with 12,654 men and women aged 51-61; the second cohort began in 1998 with a smaller sample size. Members of both cohorts are interviewed every 2 years (spouses are also interviewed). New cohorts are to be introduced every 5 years. Linkages are performed or planned with Medicare records; social security earnings records; National Death Index; employer health plans; and employer pension plans (summary plan descriptions).

Content (First Cohort)

Not all questions were necessarily asked in every interview. The questionnaire also includes job history, income, and demographic characteristics, in addition to the topics listed below.

Retirement-Related Expectations (for employed people)
Probability of being laid off in next year
Probability of finding an equally good job if laid off
Whether would accept move to another state or a layoff
Probability of working full-time after age 62, after age 65
Probability that health will limit activity during next 10 years
Expect real earnings to go up, down, or stay the same in next few years
Retirement plans: whether expect to retire completely, never stop work, work fewer hours, change kind of work, work for oneself, haven’t thought about it
How much personal savings expect to have accumulated by time retire
Whether and how much expect living standards to change after retirement

Other Probabilities, Expectations
Whether expect to have to give major financial help to family members in next 10 years
Whether will live to age 75 or more, age 85 or more
Whether housing prices in neighborhood will go up faster than prices in general over next 10 years
Whether Congress will make social security more or less generous
Whether U.S. will experience major depression in next 10 years
Whether U.S. will experience double-digit inflation in next 10 years
When expect to receive social security, how much in today’s dollars, ever had SSA calculate benefits
Looking 2 years ahead, whether expect to be better or worse off financially

Risk Aversion, Time Preference
Whether would take another job with 50-50 chance it would double family income or cut by a third; with 50-50 chance it would double family income or cut in half; with 50-50 chance it would double family income or cut by 20 percent
In deciding how much to spend or save, which time period is most important: next few months, next year, next few years, next 5-10 years, longer than 10 years

Attitudes Toward Bequests
Importance of leaving a bequest
Whether expect to leave a sizable bequest

Self-Reported Pension Coverage on Current Job (Similar questions are posed for the previous job if respondent is not working and for each job in job history section)
If participating, for each plan, whether defined benefit or defined contribution or combination
For each defined contribution, type of plan, how much accumulated, how much employer contributes, how much respondent contributes, how many years in plan in total, whether can choose how money is invested and whether mostly stock or interest-earning assets or evenly split, whether can receive lump sum or installments, youngest age when could start receiving benefits, what age expect to receive benefits and in what form
For each defined benefit, age for full benefits and how much, expected earnings at full retirement age with this employer, age for reduced benefits and how much benefits would be reduced, whether plan benefits depend on social security benefits, whether can take lump sum
If not participating, whether employer offers pension plans, whether respondent eligible and intends to participate in future and whether employer contributes

Health Status
Self-reported health status now and compared with a year ago
Self-reported emotional health status
Difficulty with activities of daily living, including instrumental activities
Self-reported medical conditions indicated by a doctor (high blood pressure, diabetes, cancer, chronic lung disease, strokes, emotional problems, arthritis, other problems, broken bones, pain, poor eyesight, hearing problems)
Self-reports of smoking, drinking, exercise
Cognition battery and mood assessment and clinical depression battery
Self-reported work disabilities and employer accommodations

Health Insurance Coverage
Type of coverage: government, employer, individual, other
If employer coverage, whether employee pays part or all of premium, whether available to retirees and whether employer pays part or all, whether retirees pay the same as other employees, whether spouses can be covered and whether retirees pay the same for spouse coverage as other employees
If individual coverage, type and cost
Whether ever turned down for coverage and why

Health Care Use and Costs
Stays in hospital or nursing home last 12 months
Doctor visits last 12 months
Home health care last 12 months
Itemized medical care deductions
Cost of individual insurance
Total and out-of-pocket medical care expenditures, by category of service

Assets and Debts
Value of house, mobile home and site, farm, ranch
Amount of mortgage, second mortgage, home equity loan
Value of second home, time-share, amount of mortgage
Net value of motor home or recreational vehicle
Net value of other real estate, other vehicles, business
Amount in Individual Retirement Accounts or Keogh accounts
Net value of stocks, mutual funds
Money in checking, saving, and money market accounts
Money in certificates of deposit, government savings bonds, Treasury bills
Money in corporate bonds
Net value of other savings/assets
Amount of other debts
Inheritances, when and from whom received, worth at the time
Value of other transfers of $10,000 or more from relatives
Life insurance settlements, when received, worth at the time, who was insured
Large, unexpected expenses over last 20 years that made it difficult to meet financial goals
Cash value of life insurance
Capital gains component of asset value increases after first interview

Expenditures (see above for medical care)
Mortgage, rent, taxes, utilities, condominium fees
Financial assistance of $500 or more in past 12 months to children or parents
Food per week or month (including value of food stamps) in stores and delivered
Meals eaten out (not counting at work or school)
Itemized medical care deductions
Charitable contributions (if $500 or more)
Support to others outside household
Total expenditures in third wave

SOURCE: National Research Council (1997:Table 4-1, Boxes 4-4 to 4-7).

At the same time, there has been an expansion in the volume and richness of administrative and other data sets collected by public and private agencies (see Sweeney, 2001). For example, birth certificates in many states now contain much more information than previously. The private sector has developed vast files of customer preferences and shopping habits. Although federal and state agencies have developed rules for data access and confidentiality protection for many public sector administrative data sets, some public and private sector data on individuals are accessible on the web.
This development increases the opportunities for linkage with research data sets and increases the need to develop innovative confidentiality protection measures that minimize disclosure risk while not so restricting or altering the data as to undercut their research usefulness.

SBES Research Environment

Changes in the SBES research environment have increased the risks of disclosure and the need to pay heed to confidentiality protection. As a result of technological developments in data processing, dissemination, and analysis, and the increased richness, variety, and volume of microdata sets, large numbers of SBES researchers engage in secondary analysis. These researchers include labor and welfare economists, health services researchers, sociologists and social psychologists, educational researchers, cultural anthropologists, and public opinion researchers. They obtain PUMS and summary files not only directly from source agencies, but also, increasingly, from data archives housed at universities that acquire files for redistribution from federal agencies, researchers, and others. Such archives include the Interuniversity Consortium for Political and Social Research (ICPSR) at the University of Michigan, which provided researchers at member universities access to public-use microdata as far back as the early 1960s (in the form of punchcards); the University of California (Berkeley) Data Archive; the University of Minnesota Population Resource Center; the University of Wisconsin (Madison) Data and Program Library; and many others.

The growth in secondary analysis has whetted appetites for ever richer data sets, including linkages of survey microdata with such administrative data as social security earnings records, vital statistics records, Medicare and Medicaid records, employment and public assistance records from state and local agencies, and employer benefit records (see Hotz et al., 1998). These kinds of linkages can be difficult to achieve, given that custodial agencies (federal and state agencies, employers) generally have their own rules for access, which can differ for the same type of data among agencies. Yet once achieved, such linkages raise disclosure risks if the researcher does not take adequate measures to protect confidentiality.

The interest of researchers in access to rich data sets has often resulted in an adversarial stance with data providers, particularly statistical agencies. Researchers are often impatient with agency restrictions on data access, yet they often do not fully understand the restrictions under which such agencies operate.
Not only would statistical agencies be subject to stiff penalties if researchers were able to re-identify respondents in their data products, but they would also be concerned that confidentiality breaches would destroy trust with respondents and make it harder to obtain high enough response rates for quality results.11

11 Surveys have shown that many people do not trust the confidentiality pledges from statistical agencies and that such mistrust can lead to reduced response (see, e.g., Singer, 2001).

Finally, the policies of research funding agencies and leading academic journals are pushing researchers to share their data with others. For example, the NIH Office of Extramural Research recently updated its policy guidance to expect researchers to share data from NIH-supported studies on a timely basis for use by other researchers. Most investigators submitting an NIH application will be required to include a plan for data sharing or to state why data sharing is not possible (see http://grants.nih.gov/grants/policy/data sharing [4/10/03]). For many years, the National Science Foundation (NSF) economics program has required data underlying an article arising from an NSF grant to be placed in a public archive. Similar expectations exist at the National Institute of Justice and the Robert Wood Johnson Foundation. Moreover, many scientific journals require that authors make available the data included in their publications.

The interest of funding agencies is primarily in leveraging their investment in data collection by encouraging investigators to share data with others for secondary analysis. The interest of journals is primarily in being able to assure the quality and validity of the researcher’s findings by making it possible for others to replicate the results. (See National Research Council, 1986, for a discussion of the benefits of sharing research data.)

Researchers who are conducting secondary analysis of publicly available summary or microdata would have no problem in satisfying requirements for data sharing. However, the many researchers who collect their own data or who conduct analyses with data obtained from a variety of sources (e.g., linked survey and administrative data) could find it difficult to determine how to share data with others in a way that does not increase disclosure risks.

Protection Methods of Statistical Agencies

Because their mission is both to provide data for public use and to ensure that individual respondents are not re-identified, federal statistical agencies are leaders in the development of techniques and policies for confidentiality protection at every stage of data development (see Doyle et al., 2001).
Sometimes they have reacted to the heightened risks of disclosure from the technological and other developments just discussed by curtailing the availability of data. We briefly review some of the confidentiality protection practices for data dissemination of the Census Bureau and other federal statistical agencies to illustrate the range of techniques and policies used. The paper by George Duncan in Appendix E discusses disclosure risk analysis in detail and describes a range of methods for processing summary and microdata files to protect confidentiality.

Census Bureau

The Census Bureau, because it collects and distributes such large volumes of data and because of the strict confidentiality protection provisions of Title 13, has been proactive in developing techniques to minimize the risks of re-identifying respondents from its data products, as well as to ensure confidentiality at the stages of data collection and processing. For example, an option for responding to the 2000 census for households that received the short form in the mail was to answer the questions over the Internet. Respondents had to enter the 23-digit control number on the mail questionnaire to authenticate their response and preclude duplicate responses. Their data were entered through a firewall on the Bureau’s website, encrypted, sent to the Bureau’s main computer center, and put behind a second firewall to protect confidentiality.

Every data product the Bureau makes available for public use must be reviewed by its Disclosure Review Board to ensure that disclosure risks have been minimized. For summary (tabular) data, the Bureau’s measures to protect confidentiality include several steps. The level of detail of tabulations for geographic areas is related to the size of the area and the size of the survey or census sample.12 For census tabulations, the Bureau uses a “data swapping” technique (a type of data blurring), in which a small number of records for individual households that are similar on basic characteristics (number of adults and children and race and ethnic composition) are swapped between adjacent geographic areas so that the resulting tabulations for individual areas are close to but likely not exactly the same as the originally collected data.13 The Bureau also groups reported amounts for such continuous variables as income, rent, and housing value into broad categories, including a top category that is well below the largest individual amount reported.
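
The two tabular safeguards just described, swapping matched households between adjacent areas and grouping amounts into broad categories with an open-ended top category, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the Census Bureau's actual procedure: the record layout, the swap rate, the category bounds, and the function names `swap_key`, `swap_between_areas`, and `income_category` are all hypothetical.

```python
import random


def swap_key(household):
    """Basic characteristics on which swapped households must match (assumed fields)."""
    return (household["adults"], household["children"], household["race_ethnicity"])


def swap_between_areas(households, area_a, area_b, rate, rng):
    """Exchange area codes for a share of household pairs that match on swap_key.

    `rate` is a hypothetical swap probability applied to each matched pair.
    The input list is not modified; a list of copied records is returned.
    """
    out = [dict(h) for h in households]
    by_key_a, by_key_b = {}, {}
    for i, h in enumerate(out):
        if h["area"] == area_a:
            by_key_a.setdefault(swap_key(h), []).append(i)
        elif h["area"] == area_b:
            by_key_b.setdefault(swap_key(h), []).append(i)
    for key, idx_a in by_key_a.items():
        # Pair each household in area A with a matching household in area B.
        for i, j in zip(idx_a, by_key_b.get(key, [])):
            if rng.random() < rate:
                out[i]["area"], out[j]["area"] = area_b, area_a
    return out


def income_category(amount, bounds=(25000, 50000, 100000)):
    """Map a dollar amount to a broad category; the last category is open-ended."""
    for k, upper in enumerate(bounds):
        if amount < upper:
            return k
    return len(bounds)  # top category: everything at or above the last bound
```

Because swapped pairs agree on the matching characteristics, counts of households by those characteristics in each area are unchanged, while tabulations of other variables (income, for example) shift slightly. That is the sense in which swapped tables are close to, but not exactly the same as, the collected data.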
For microdata files of individual records that are made publicly available on the Internet or other media (PUMS files), the Bureau protects confidentiality by taking such steps as stripping off all overt tags, such as name and address; limiting geographic identification to large areas (e.g., states, regions, or metropolitan areas above a certain population size, depending on sample size); top-coding continuous variables (e.g., providing income amounts in dollar increments up to a top category defined as any income that exceeds a specified amount); and assigning such variables as specific occupation, industry, or ancestry to broad categories. Because of increasing concern about the ability to link census and survey microdata files with other data available through the Internet, the Bureau scaled back the data content somewhat on the 2000 census PUMS files in comparison with the 1990 census files.

12 For example, only basic census characteristics collected from everyone are tabulated for city blocks, while data from the census long-form sample (about 1 in 6 households) are tabulated for larger geographic areas. Statistical reliability is another reason besides confidentiality protection to limit the geographic detail of sample tabulations.

13 Data swapping, which was first used in the 1990 census, replaced the previously used technique of cell suppression, in which cell values that were smaller than a specified threshold were blanked out. The suppression method made the data harder for users to work with (see Gates, 2000).

Some microdata files of individual records are viewed as too sensitive and too easily re-identifiable to release in the form of a PUMS. For such data, the Census Bureau provides access to researchers who are sworn in as special census agents. For years such access could only be obtained by researchers who came to the Bureau’s headquarters at Suitland, Maryland, to perform their analyses. In the past decade, the Bureau has begun a program of establishing secure research data centers at major universities, at which researchers may use data files that are not otherwise available for outside use (see Dunne, 2001). At present, there are six such centers: the Bureau’s Boston Regional Office, Carnegie Mellon University, Duke University, the University of Michigan, and jointly managed sites at the Berkeley and Los Angeles campuses of the University of California.

Other Statistical Agencies

Other federal statistical agencies use methods similar to those of the Census Bureau to protect data during the stages of collection, processing, and storage and to minimize disclosure risks for data products that are made publicly available (see Federal Committee on Statistical Methodology, 1994). An additional source of concern for these agencies about disclosure risks during data collection and processing arises from the use of private contractors to conduct many of the household surveys they sponsor. (The Census Bureau uses its own staff for data collection for its surveys and those it conducts under contract to other agencies.)
When contractors are used, agencies must carefully review the confidentiality protection procedures at contractors’ sites.

For researcher access to sensitive data that are at risk for re-identification, some statistical agencies use licensing agreements. For example, NCES has statutory authority to sign licensing agreements that permit researchers to use microdata at their own institutions under specified restrictions (e.g., not sharing the data outside the research group, returning or destroying all copies of the microdata at the end of the project). The agreements must be signed by the researcher’s institution, and they contain penalties for noncompliance. Other agencies use licensing agreements as well. Sometimes agencies audit data users’ protection policies on a random or scheduled basis. (See Seastrom, 2001, for a review of current licensing practices and requirements by federal statistical and program agencies.)

Finally, statistical agencies are investigating the use of new techniques for statistically perturbing sensitive microdata so that it may be possible to make them available in public-use form. Such methods include data swapping with additive noise and creating a synthetic data set through statistical modeling. Determining the net utility of such data sets requires estimating an index of information loss and one of disclosure risk and judging when there is an acceptable balance between the two (see discussion in Appendix E).

THE ROLE OF RESEARCHERS, IRBS, OHRP, AND FUNDING AGENCIES IN PROTECTING CONFIDENTIALITY

The Common Rule requires IRBs to determine that research proposals have adequate plans to protect the confidentiality of data obtained from respondents and to protect their privacy. Such protection is supported by the ethical principles in the Belmont Report. Yet we believe that IRBs, OHRP, and researchers may not be giving as much attention to issues of confidentiality protection as warranted by the increasing risks of disclosure from advances in technology and the volume and richness of available data. We believe it is critical that federal funding agencies support continued research on methods for confidentiality protection.

Recommendation 5.1: Because of increased risks of identification of individual research participants with new methods of data collection and dissemination, the human research participant protection system should continually seek to develop and implement state-of-the-art disclosure protection practices and methods.
Toward this goal:
• researchers should explicitly describe procedures to protect the confidentiality of the data to be collected in protocols they submit to IRBs;
• IRBs should pay close attention to the adequacy of proposed procedures for protecting confidentiality;
• federal funding agencies should support research on techniques to protect the confidentiality of SBES data that are made available for research use; and
• the Office for Human Research Protections should regularly promulgate good practices in analyzing disclosure risks and limiting those risks.
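
As one hedged illustration of what "analyzing disclosure risks and limiting those risks" can mean in practice, the sketch below computes a simple screening measure, the share of records that are unique on a set of indirectly identifying variables, before and after one variable is coarsened. The measure, the variable names, the sample records, and the functions `unique_share` and `coarsen_age` are assumptions chosen for illustration; they are not drawn from this report or from any agency's guidance.

```python
from collections import Counter


def unique_share(records, keys):
    """Share of records whose combination of values on `keys` occurs only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    uniques = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return uniques / len(records)


def coarsen_age(record, width=10):
    """Return a copy with exact age replaced by the lower bound of a `width`-year band."""
    out = dict(record)
    out["age"] = (record["age"] // width) * width
    return out


# Hypothetical records with no names, only indirectly identifying variables.
records = [
    {"state": "MI", "age": 52, "occupation": "teacher"},
    {"state": "MI", "age": 57, "occupation": "teacher"},
    {"state": "MI", "age": 61, "occupation": "farmer"},
    {"state": "OH", "age": 55, "occupation": "teacher"},
]
keys = ("state", "age", "occupation")

before = unique_share(records, keys)
after = unique_share([coarsen_age(r) for r in records], keys)
print(f"unique before coarsening: {before:.2f}, after: {after:.2f}")
```

A lower share of unique records suggests lower re-identification risk, but coarsening also discards detail. A fuller analysis would weigh that information loss against the reduction in risk.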

Researchers have an obligation to provide sufficiently detailed information in their proposal on plans for confidentiality protection so that an IRB can make an informed judgment about the adequacy of those plans. It is not enough to say that confidentiality will be protected; the methods and procedures for doing so at each stage of the research project must be detailed. Similarly, IRBs have an obligation to carefully review proposed plans for confidentiality protection and to evaluate them against recognized good practices that are applicable for the type of research proposed.

Federal funding agencies are increasingly interested in leveraging the dollars they invest in data collection under research grants and hence are requiring investigators to share data. Consequently, they have an interest in and, we believe, an obligation to support research on ways to analyze the risk of disclosure and on new methods for confidentiality protection that minimize disclosure risk and maximize the usefulness of shared data for secondary analysis. Such agencies could also partner with academic statisticians to disseminate information to researchers and IRBs about statistically based methods for disclosure risk analysis and risk minimization.

OHRP has a leadership responsibility for guidance on issues of human research participant protection. Because it is woefully inefficient for every IRB (many of which are overburdened) to take individual responsibility for staying abreast of threats to and state-of-the-art ways for protection of confidentiality, OHRP should regularly assemble and disseminate information on good practices for analyzing disclosure risk and minimizing that risk at every stage of a research project, from data collection to dissemination of results and sharing of data for secondary analysis.
OHRP should also assemble and publish information on the confidentiality and data access guidelines of federal and state agencies with responsibility for administrative records that are of potential use for research. Such information would help researchers navigate the maze of varying agency policies and would also help IRBs evaluate research that proposes to use such data.14

14 The National Human Research Protections Advisory Committee adopted a similar recommendation at its April 29-30, 2002, meeting. The recommendation also urged OHRP to identify federal statutes and regulations that provide confidentiality protection, identify issues or gaps, and develop proposals to address these gaps through “a consensus process involving the scientific and legal communities” (see http://www.ohrp.osophs.dhhs.gov/nhrpac/documents [4/10/03]).

We do not intend that IRBs (or OHRP), in increasing their attention to confidentiality issues, should add bureaucratic impediments to SBES research or waste scarce time and resources in activities that duplicate other efforts. We make four points in this regard and address a fifth point in the next section:
(1) confidentiality protection should be appropriate to disclosure risk and the sensitivity of the data;
(2) adequacy of confidentiality protection should be assessed for each stage of a project involving original data collection, from recruitment to dissemination and archiving;
(3) IRBs should look to other bodies for guidance on good practices for confidentiality protection;
(4) informed consent processes and documentation should address the extent and nature of confidentiality protection; and
(5) IRBs should, as standard procedure, exempt from review studies that propose to use publicly available microdata files from sources that follow good protection practices and obtain informed consent from participants (see “A Confidentiality Protection System for Public-Use Microdata,” below).

Confidentiality Protection Appropriate to Disclosure Risk

We have been stressing the risks of disclosure; however, there are many projects for which confidentiality protection is unnecessary or irrelevant or the needed protections can be very limited. Observational studies of anonymous individuals in public settings (e.g., shoppers at a store who are not approached directly by the investigator and are not photographed or videotaped) need no confidentiality protection at all. Oral history studies in which public officials are interviewed about their public activities may require only limited protection, such as respecting the right of the respondent to refuse to answer a particular question or putting an agreed-upon time restriction on the availability of the full oral history. Small laboratory experiments on stimulus-response behaviors may adequately protect confidentiality simply by not recording names or other identifiers of participants.
Investigators in some participant observation studies may seek the consent of participants to include them individually in the published findings, likely with the use of pseudonyms, although participants must understand that pseudonyms will not necessarily protect them from being identified.

The larger point of all these examples is that for confidentiality protection, as in many other aspects of human research participant protection, there is no single approach that is appropriate for all studies. The risks of disclosure and the need for confidentiality protection should be analyzed for each type of project and confidentiality protections made more or less stringent as appropriate.

Guidance that OHRP develops on analyzing disclosure risk and implementing appropriate confidentiality protections should include examples not only of studies that require stringent confidentiality protection measures, but also of studies for which minimal or no confidentiality protection is needed.

Protection for Every Stage of Research

For projects that involve original data collection, IRBs will need to check that appropriate confidentiality protection procedures are proposed for each project stage, as applicable:
• recruitment of participants: protection practices will vary depending on the method of recruitment (e.g., sending a letter that contains specific information about the prospective participant requires more attention to confidentiality protection than does a random-digit telephone dialing procedure);
• training of research staff, including interviewers, computer processing staff, analysts, and archivists, in confidentiality protection practices;
• collection of data from participants: protection practices will vary depending on whether collection is on paper, by CATI, by CAPI, on the web, or by other techniques, and who is being asked for information (e.g., some studies of families allow individual members to enter their own responses into a computer in such a way that neither other family members nor the interviewer is privy to the responses);
• transfer of data to the research organization, whether by regular mail, e-mail, express mail, or other means;
• data processing (including data entry and editing);
• data linkage (including matching with administrative records or appending neighborhood characteristics);
• data analysis;
• publication of quantitative or qualitative results;
• storage of data for further analysis by the investigator or for re-contacting participants to obtain additional data or both; and
• dissemination of quantitative and qualitative microdata for secondary analysis by other researchers.

For qualitative research, Johnson (1982) has developed advice on “ethical proofreading” of field reports prior to publication so that even if participants and their communities are identified, the harm to them is minimized. Her guidelines include such steps as reviewing language to make it descriptive rather than judgmental, providing context for unflattering descriptions, asking some of the participants to read the manuscript for accuracy and provide feedback, and asking colleagues to read the manuscript critically for ethical concerns.

Use of Authoritative Guidance

Until OHRP begins to promulgate good practices for confidentiality protection for different stages and types of projects, IRBs should seek guidance from reputable sources rather than developing standards for review of projects on their own. For example, many professional associations have developed and published good practices for confidentiality protection for studies in their discipline (see, e.g., Oral History Association Evaluation Guidelines; available at http://www.dickinson.edu/oha [4/10/03]). Major survey organizations also have principles and practices for confidentiality protection (e.g., see Institute for Social Research, 1999). For protection strategies for data that are to be published or shared with other researchers, see the paper we commissioned by George Duncan in Appendix E. See also the following resources: Czajka and Kasprzyk, 2002; relevant chapters in Doyle et al. (2001); Statistical Working Paper 22 (Federal Committee on Statistical Methodology, 1994); guidance from the ICPSR, available at http://www.icpsr.umich.edu/ACCESS [4/10/03]; and links to information resources provided by the American Statistical Association Committee on Privacy and Confidentiality, available at http://www.amstat.org/comm/cmtepc [4/10/03].

Confidentiality Protection and Informed Consent

In reviewing research that involves original data collection, IRBs need to consider the adequacy of the information about confidentiality protection that is provided to participants through the informed consent process (see also Chapter 4). For example, participants should be informed that the data will be made available for research purposes in a form that protects against the risk of re-identification. If identifiers such as social security numbers are requested to permit linkages with administrative records, respondents should be informed about steps that will be taken to prevent misuse of such identifiers and records and whether and when identifiers will be destroyed. The consent process should also make clear that confidentiality protection is never ironclad; rather, disclosure risks are minimized to the extent possible.

For research on illegal behavior (e.g., drug abuse) or sensitive topics (e.g., alcoholism, sexual abuse, or domestic violence), it is vitally important that adequate measures are in place to protect the privacy and confidentiality of research participants. Serious consequences (including social stigmatization, discrimination, loss of employment, emotional harm, civil or criminal liability, and, in some cases, physical injury) may result if there is an intentional or inadvertent breach of confidentiality. Investigators must ensure that the informed consent discussion delineates carefully the procedures for protecting confidentiality, which may include waiving written consent or obtaining a certificate of confidentiality to prevent data from being used in court. In addition, investigators must address the possibility that they may have to report such behaviors as child abuse to authorities.

A CONFIDENTIALITY PROTECTION SYSTEM FOR PUBLIC-USE MICRODATA

Recommendation 5.2: To facilitate secondary analysis of public-use microdata files, the Office for Human Research Protections, working with appropriate federal agencies and interagency groups, should establish a new confidentiality protection system for these data. The new system should build upon existing and new data archives and statistical agencies.

Recommendation 5.3: Participating archives in the new public-use microdata protection system should certify to researchers whether data sets obtained from such an archive are sufficiently protected against disclosure to be acceptable for secondary analysis.
IRBs should exempt such secondary analysis from review on the basis of the certification provided.

We argue that IRB review of secondary analysis with public-use microdata is unnecessary and a misuse of scarce time and resources (see Chapter 6). If the data in a file have been processed to minimize the risk of re-identifying a respondent by using widely recognized good practices for confidentiality protection, then the research is eligible for exemption under the Common Rule (see Box 1-1 in Chapter 1). The issue is how an IRB can be satisfied that a particular public-use microdata file has been processed using good practices for confidentiality protection. To address this concern, we propose that OHRP work with statistical agencies, data archives, and appropriate interagency groups to develop a new system for confidentiality protection and certification for public-use microdata. Such a system would permit IRBs to exempt secondary analysis with such data from review as a matter of standard practice.15

15 Summary data that are provided for small geographic areas or population groups could also be certified; we recommend in Chapter 6 that analysis with such data not be brought to IRBs, as the aggregate data no longer represent human subjects under the regulations.

We have described how federal statistical agencies are at the forefront of efforts to protect the confidentiality of their data. When such agencies release a public-use microdata set (or summary file for small geographic areas), one can be assured that they have followed good practices for confidentiality protection. One can also be assured that they have addressed such aspects of human participant protection as informed consent and minimization of respondent burden because of the requirement that all data collections by federal agencies be cleared by OMB under the Paperwork Reduction Act (some agencies also have an IRB).

We recommend that OHRP work with the Interagency Council on Statistical Policy, which includes 14 statistical agencies and is chaired by the chief statistician in OMB, to develop a certificate that accompanies release of public-use data sets from these agencies on the web or in other media. Such a certificate would attest that the public-use file reflects good practice for confidentiality protection and that the data were collected with appropriate concern for informed consent and other protection issues. With such a certificate, the IRB would exempt from further review any analysis that proposes to use only the data from the certified file.

Public-use microdata are also made available from federal program agencies and from private archives and research organizations. To extend the certification system, OHRP should work with the Interagency Council on Statistical Policy, other public and private data producers, and data archives to develop something like the assurance program that OHRP uses to authorize IRB operations at research institutions under the Common Rule. Under such a confidentiality assurance program, an archive such as ICPSR would document the procedures it uses to protect confidentiality for data sets that researchers deposit with the archive for redistribution to secondary analysts. Once its procedures are approved, the archive would be able to certify that the data files it distributes are appropriately processed for confidentiality protection.16 Similarly, other data producers, such as federal program agencies and private research organizations, could obtain organization-wide certification for their public-use files, or an organization could obtain certification on a case-by-case basis if it rarely develops public-use data.

16 ICPSR has told researchers in member institutions how to obtain, from its website, information on ICPSR confidentiality protection procedures to accompany protocol submissions to IRBs (Erik Austin, 2002, personal communication).

A program of assurance for confidentiality protection procedures and certification of data files for secondary analysis will necessitate that participants in the program (OHRP, federal statistical agencies, other data producers, and archives) keep abreast of disclosure risks and state-of-the-art protection procedures. Continued vigilance, together with sustained investment in disclosure risk analysis and confidentiality protection methods, will be necessary to assure IRBs, researchers, and participants that adequate protections are in place.

CONCLUDING NOTE: MINIMAL DISCLOSURE RISK IS NOT ZERO RISK

At present, there is considerable tension between the SBES research community and data producers, particularly federal statistical agencies, regarding what and how much microdata can be made available for public use. Statistical agencies, in some researchers’ views, are striving for zero disclosure risk, which is not possible, and are unnecessarily restricting the availability of data that were collected with public funds and intended for public use. Researchers, in the view of many statistical agencies, are underestimating the disclosure risks and are not sufficiently cognizant of the legal constraints and penalties under which statistical agencies operate.

We cannot resolve the tensions between these views. Several studies of the Committee on National Statistics have addressed confidentiality issues (National Research Council, 1993, 2000a), and a study is currently under way to address specifically the balance of benefits and costs of data access versus disclosure risk. Some solutions will likely require congressional action, such as legislation that would enable more statistical agencies to use licensing agreements that make the researcher, as well as the statistical agency, responsible for any breach of confidentiality. Other solutions will require accommodation of views. For example, researchers may have to be more accepting of conducting secondary analysis at secure data centers, while the sponsors of data centers may need to interpret more broadly the kinds of studies that are acceptable to be conducted at a secure center. Currently, for example, the Census Bureau approves studies that will help the Census Bureau (e.g., studies of missing data), which seems too narrow a criterion given that the mission of the Bureau is to provide information for public use.

Given the leverage that secondary data analysis provides for the advancement of knowledge in the social, behavioral, and economic sciences, it is clearly important for researchers, data producers, and data archives to work cooperatively to maximize that leverage while appropriately protecting the respondents who supplied the data. IRBs can contribute to such a cooperative effort by encouraging investigators who collect original data to deposit them with archives that will make the data available to others in a form that minimizes the risks of breach of confidentiality.
