The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
OCR for page 113
5
Enhancing Confidentiality Protection

Breach of confidentiality, that is, the release of data that permit identifying an individual participant, is often the major source of potential harm to participants in social, behavioral, and economic sciences (SBES) research (see Sieber, 2001). For example, a survey that poses no risk of physical injury and no more than minor psychological annoyance to a respondent may nonetheless obtain data that could harm the respondent if others outside the research team (e.g., neighbors, co-workers, public agency officials) could associate those data with the person. Such information, if known by others, might affect employment, insurability, personal relationships, civil or criminal liabilities, or other activities or situations. In some cases, the simple fact of learning that an individual is a study participant could be harmful (e.g., if police or drug dealers were to learn the names of participants in an ethnographic study of drug markets). Furthermore, if a participant has been assured of confidentiality, then disclosure of identifiable information about the person is a violation of the principle of respect for persons even if the information is not sensitive and would not result in any social, economic, legal, or other harm.

Protection of confidentiality is a concern in SBES research whenever data are collected in identifiable form. Identifiers include not only such overt information as name, address, social security number, telephone number, and e-mail address, but also detailed information about the respondent, such as income and profession, that could permit identification by inference in the absence of an explicit identifier.1 Some SBES research does not collect identifiable information in these terms, for example, observational studies of street-crossing behavior of people who are not photographed or approached by the investigator in any way.
However, for much SBES research, confidentiality protection is a necessary and vitally important component of the study plan.

1 Even the assignment of arbitrary identifiers may not protect against re-identification so long as the link between the arbitrary codes and originally collected real identifiers (e.g., name) has not been destroyed.

PROTECTING PARTICIPANTS AND FACILITATING SOCIAL AND BEHAVIORAL SCIENCES RESEARCH

Breach of confidentiality can occur at any stage of a research project: data collection (including recruitment of participants), processing, storage, and dissemination for secondary use. At the present time, the risk of disclosures that could be embarrassing or damaging to participants (or that could simply violate a pledge of confidentiality) is increasing because of several factors. Most of these factors affect the disclosure risk for dissemination for secondary use, but some also have implications for the disclosure risk as a result of data collection, processing, and storage. They include the following:

• There are growing numbers and variety of publicly available microdata files for secondary analysis. Such files provide information on individuals that has been stripped of obvious (and less obvious) identifiers. Increasingly, microdata sets contain richly detailed content from multiple observations on the same individuals over time (panel surveys), or they contain data on more than one type of entity (e.g., education surveys of students, their parents, teachers, and schools), or they contain both kinds of data. Such rich data sets increase the potential for re-identification of respondents through linkages with other data sources. Panel surveys also pose disclosure risks as a result of data storage because contact information must be retained for respondents for months or years. Generally, disclosure risks for panel surveys increase over time.

• There are growing numbers, variety, and content of administrative records data sets from public and private agencies (e.g., birth and death records) that are readily available on the Internet. Such files can potentially be linked to research data sets and used to re-identify research respondents. More broadly, the capabilities to link information across multiple sources on the Internet are increasing.

• There is increased emphasis by funding agencies on data sharing among researchers, not only to permit replication of results, but also to foster additional research at low marginal cost. Such sharing has many benefits, but it also multiplies the number of people with access to the data.

• The speed of data processing and volume of low-cost data storage are increasing, which facilitates efforts to link data sets.

• There is increased use of data collection technologies, such as web surveys, and data transmission methods, such as e-mail and file-sharing procedures, that may not be secure.
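
The linkage risk described in the list above can be illustrated with a small sketch. It counts how many records share each combination of indirect identifiers; a record that is unique on such a combination is a candidate for re-identification if an outside file carries the same fields. The field names and toy records are hypothetical, and the count is only a rough screening measure under these assumptions, not a procedure taken from this chapter.

```python
from collections import Counter

# Hypothetical toy microdata: names are removed, but several
# indirect identifiers remain on each record.
records = [
    {"zip": "20001", "birth_year": 1950, "profession": "teacher"},
    {"zip": "20001", "birth_year": 1950, "profession": "teacher"},
    {"zip": "20001", "birth_year": 1962, "profession": "surgeon"},
    {"zip": "20002", "birth_year": 1975, "profession": "clerk"},
]


def unique_on(rows, keys):
    """Return the rows whose combination of `keys` occurs exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if combos[tuple(row[k] for k in keys)] == 1]


# More detail per record means more records that stand alone.
at_risk = unique_on(records, ["zip", "birth_year", "profession"])
coarse_risk = unique_on(records, ["zip"])
print(len(at_risk), len(coarse_risk))
```

In this toy file, two records are unique on the full combination but only one is unique on zip code alone, which mirrors the point that richer content increases the potential for linkage.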

In this chapter we provide historical background on confidentiality protection for research data in the United States, beginning with the attention given to protection issues by the U.S. Department of Health and Human Services (DHHS) and the institutional review board (IRB) system. We continue with the history of legislative protection for data collected by the Census Bureau and other federal statistical agencies that are widely used by SBES researchers and others. (For decennial census data, legislative protection goes back to the 1920s.) Until fairly recently, the activities of IRBs and statistical agencies with regard to confidentiality protection have proceeded largely independently.

Next, we provide a fuller explication of the factors that are challenging the adequacy of confidentiality protection measures today and the techniques and procedures that statistical agencies are adopting in response. Our recommendations to IRBs, the Office for Human Research Protections (OHRP), and research funding agencies for enhancing confidentiality protection for different kinds of SBES research follow. To protect participants and facilitate research with existing data, we propose a new system for certifying the confidentiality of data files, built on existing and new data archives in the United States.

HISTORY OF CONFIDENTIALITY PROTECTION IN THE PARTICIPANT PROTECTION SYSTEM

Common Rule and IRB Operations

Surprisingly, the history of human research participant protection policies and regulations shows relatively little attention to issues of confidentiality protection.2 Although the 1966 U.S. Public Health Service policy statement and the 1971 "Yellow Book" guidelines3 for participant protection mentioned the need to protect confidentiality, the 1974 regulations (45 CFR 46) did not require IRBs to determine that study plans adequately address confidentiality issues.
Indeed, papers and testimony from social scientists prepared for the 1974-1978 National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research commented that existing regulations did not adequately address issues of confidentiality in SBES research. A provision

2 By "confidentiality," we mean protecting private information from being revealed to others in a way that could identify an individual research participant. Such protection is distinct from "privacy," by which we mean the right of an individual to decide whether to share information with the investigator in the first place (e.g., a survey participant could refuse to answer certain questions on grounds that they invaded his or her privacy; see National Research Council, 1993:22-23).

3 The "Yellow Book" was the name given to "The Institutional Guide to DHEW Policy on Protection of Human Subjects" (see Chapter 3).

on confidentiality was added to the 1981 version of the regulations (45 CFR 46.111a): it required IRBs to determine "where appropriate, there are adequate provisions to protect the privacy of subjects and to maintain the confidentiality of data." The 1981 regulations also specified "a statement describing the extent, if any, to which confidentiality of records identifying the subject will be maintained" as one of the basic elements of informed consent (45 CFR 46.116a). Beyond these two references, however, the Common Rule provides no guidance, even on traditional confidentiality protections for laboratory, survey, ethnographic, and other originally collected data, such as assigning new identifiers and destroying the link to the original identifiers, keeping data records in locked files, and the like.4 Guidance in the IRB Guidebook (Office for Protection from Research Risks, 1993:Ch.III.D) is very general.

With regard to IRB attention to confidentiality protection in reviewing protocols, the 1975 Michigan survey found that confidentiality issues were relatively rare as a focus of IRB review: only 3 percent of protocols were required to modify their confidentiality procedures; in comparison, 24 percent of protocols were required to modify their consent forms or procedures (Gray, Cooke, and Tannenbaum, 1978:Table 2). This difference may be understandable given that the challenges to maintaining confidentiality were not as great then as they are today. However, the 1995 Bell survey 20 years later reported similar results: only 3 percent of IRB chairs said that inadequate confidentiality protections were often a problem with the research protocols they reviewed, while problems with consent forms were cited frequently (Bell, Whiton, and Connelly, 1998:Figure 40).
Similarly, only 14 percent of investigators reported being required to modify their procedures for protection of privacy and confidentiality, compared with 78 percent who reported being required to modify their consent forms (Bell, Whiton, and Connelly, 1998:Figure 41).

The continued relative lack of emphasis on confidentiality protection may result from determinations by IRBs that proposed protection procedures are adequate. It may also result from continued underestimation by IRBs of the risks of disclosure, which today's research and computing environment has heightened.

4 See Sieber (2001) for a critique of the Common Rule's limited statements on confidentiality, which she asserts do not properly recognize the distinction between confidentiality and privacy.

Confidentiality Certificates

Another initiative by federal research funding agencies to protect confidentiality is the long-standing program of the National Institutes of Health (NIH) whereby researchers may obtain certificates of confidentiality for research on sensitive topics, whether the research is funded by NIH or another agency. The National Institute of Justice also makes such certificates available for criminal justice research. These certificates buttress confidentiality protection in specific circumstances: namely, they protect researchers from being compelled to deliver names or identifying characteristics of participants in response to court orders or subpoenas, unless respondents have consented to such release.5 Qualifying studies include those that collect data on such topics as sexual attitudes, preferences, or practices; use of alcohol, drugs, or other addictive products; mental health; genetic makeup; illegal conduct; or other topics for which the release of identifiable information might damage an individual's financial standing, employability, or reputation within the community or might lead to social stigmatization or discrimination. At present, however, the protection afforded by such certificates is prospective; that is, researchers cannot obtain protection for study results after data collection has been completed, and it is not always obvious in advance when a certificate may be needed.

Medical Records Protection

The Health Insurance Portability and Accountability Act (HIPAA) of 1996 contained a provision that has resulted in the latest initiative by federal research funding agencies to protect confidentiality. HIPAA promotes the use of standard formats for electronic information exchange to simplify the administration of health insurance payments for medical treatment.
Recognizing a potential threat to the confidentiality of patient records, HIPAA required DHHS to submit to Congress detailed recommendations on privacy standards for individually identifiable health information. This short provision led to the Privacy Rule, which comprises hundreds of pages of regulations and commentary; it is scheduled to take full effect in April 2003 (see Gunn et al., 2002; Institute of Medicine, 2002:205-211).

The version of the Privacy Rule issued by DHHS in December 2000 drew substantial criticism from the health care community, including researchers, who complained that the provisions for research access

5 The New York Court of Appeals upheld the authority of confidentiality certificates in 1973 (for more information, see http://grantsl.nih.gov/grants/policy/coc [4/10/03]).

to health information were confusing and unnecessarily restrictive. In response, DHHS published a modified Privacy Rule in August 2002.

The Privacy Rule applies to health plans, health care clearinghouses, and health care providers (covered entities) who maintain patient and claims records; it also affects health care researchers who obtain such records from covered entities for analysis purposes. Under the rule, covered entities may make "de-identified" data available for research use, without patient authorization, in one of three ways. First, a covered entity may release a "limited data set," consisting of patient and claims records stripped of a list of direct identifiers of the individual, relatives, household members, and employers, to researchers who sign a legally binding agreement to safeguard and not disclose the information. The identifiers that must be deleted include names, street addresses, telephone numbers, e-mail addresses, social security numbers, medical record and health plan account numbers, device identifiers, license numbers, vehicle identifiers, full face photos, and finger and voice prints (Gunn et al., 2002:8). However, birth date, 9-digit zip code, and dates of admission and discharge are permissible to include in such a data set. Second, a covered entity may release a "de-identified" data set for research use without requiring the researcher to sign an agreement provided a more comprehensive list of identifiers has been removed. Third, a covered entity may employ a statistician to attest that the risk of re-identification is very small because of the nature of the data (e.g., in cases when the data have been subject to statistical manipulation; see "Protection Methods of Statistical Agencies," below).
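
As a rough illustration of the first of the three routes above, the sketch below drops a set of direct-identifier fields from a record while leaving birth date, zip code, and admission and discharge dates in place. All field names and the sample record are hypothetical, the identifier set is an abbreviated stand-in for the list in the text, and the code is a minimal sketch, not a statement of what the Privacy Rule requires in practice.

```python
# Hypothetical field names standing in for some of the direct
# identifiers named in the text; this is not a complete or official list.
DIRECT_IDENTIFIERS = {
    "name",
    "street_address",
    "phone",
    "email",
    "ssn",
    "medical_record_number",
    "health_plan_account",
    "device_id",
    "license_number",
    "vehicle_id",
}


def to_limited_data_set(record):
    """Return a copy of `record` without the direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


# Hypothetical patient record used only for illustration.
patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "phone": "555-0100",
    "birth_date": "1950-03-14",
    "zip9": "20001-0000",
    "admitted": "2002-08-01",
    "discharged": "2002-08-05",
    "diagnosis_code": "J18.9",
}

limited = to_limited_data_set(patient)
print(sorted(limited))
```

The retained dates and zip code are exactly the kind of detailed fields that can still support identification by inference, which is why this route is paired with a signed agreement.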
The Privacy Rule also provides that IRBs or Privacy Boards may issue waivers for research access to data when the research cannot be conducted with de-identified data and when it is not practicable to obtain authorization from research participants.6 The waiver requirements were initially criticized as being inconsistent with the Common Rule; they were simplified and rewritten for consistency. They require that adequate plans are in place to protect identifiers from disclosure and to destroy them at the earliest opportunity. IRBs and Privacy Boards may also allow researchers limited access to identifiable records data in order to identify and recruit prospective participants or to conduct preliminary exploratory research to determine the feasibility of a full-fledged analysis.

We cannot do justice to the Privacy Rule provisions in this brief summary nor anticipate how they may work in practice. We note

6 It is apparently expected that IRBs would handle waivers under the Privacy Rule for federally funded research and that Privacy Boards would handle waivers for other research, although this is not clear (see Institute of Medicine, 2002:209).

that these provisions increase the necessity for IRBs, OHRP, and researchers to become cognizant of good practices for confidentiality protection, as discussed below.

CONFIDENTIALITY PROTECTION IN THE FEDERAL STATISTICAL SYSTEM

Census Bureau History

1790 to World War II

The history of confidentiality protection for federal statistical data begins, as for so many data collection and dissemination issues, with the decennial census first conducted in 1790 pursuant to the U.S. Constitution.7 In the first few decades of the census, the returns were posted in public places for public review. By the middle of the 19th century, the Congress and census directors began to worry about enumerators improperly revealing information and possibly gaining some private benefit. Public posting was discontinued, and enumerators were instructed to keep census information confidential. Yet federal, state, and local agencies and courts not infrequently attempted to obtain individual census returns. Most often the Census Bureau rebuffed these requests, but sometimes it acceded to them.

Finally, Public Law 13 (Title 13 of the U.S. Code) was enacted in 1929 to codify various practices that had been emerging in official U.S. statistics. Section 9 explicitly provided for the confidentiality of economic and population census data:

The information furnished under the provisions of this Act shall be used solely for the statistical purposes for which it is supplied. No publication shall be made by the Census Office whereby the data furnished by any particular establishment or individual can be identified, nor shall the Director of the Census permit anyone other than the sworn employees of the Census Office to examine the individual reports.

Another section provided heavy penalties, which currently include large fines and up to 5 years' imprisonment, for Census Bureau employees who breach confidentiality.
Title 13 also covers household surveys conducted by the Bureau that use the census address list as their sampling frame.

7 This history of confidentiality protection for the U.S. census draws heavily on Gates (2000) and Seltzer and Anderson (2002).

The enactment of Public Law 13 was timely because the Census Bureau was publishing more and more tabulations for smaller and smaller geographic areas, which required careful specification and review to minimize the risk of individual identification. As early as 1910, the Bureau published data for census tracts (locally delineated neighborhoods) in selected cities that paid for the tabulations. By 1940 the Census Bureau was coding and publishing census tract data for 64 cities. Also in 1940 the Bureau introduced a program of statistics for individual blocks in 191 cities.

World War II and Later

At the outbreak of World War II in 1939, the U.S. Attorney General sought legislation to amend Title 13 to allow military and intelligence agencies to have access to individual census records. The Census Bureau adamantly opposed the legislation, and it was withdrawn. However, in June 1941 a newly appointed Census Bureau director, J.C. Capt, obtained the support of the Commerce Department for legislation to authorize periodic surveys for national defense needs and to make census reports for individuals available for use in the "national defense program" with the approval of the president. This legislation passed the Senate in August 1941 with an accompanying report (77th Congress 1st session, Senate Rept 495, June 26, 1941, to accompany S 1627) that said:

The needs of the defense program are of such a character as to require full and direct information about specific individuals and business establishments.... To continue to impose the rigid provisions of the present confidential use law of the Census Bureau... would defeat the primary objects of the legislation here proposed.

The Senate legislation did not pass the House, but the Second War Powers Act, enacted March 27, 1942, effectively incorporated its provisions.
This act provided that any Department of Commerce data could be provided to any federal agency at the written request of the agency head. It is not known whether individual census reports were ever provided to people other than sworn Census Bureau employees. However, census tract-level tabulations of Japanese Americans from the 1940 census were provided to the Office of Naval Intelligence, and maps of city blocks with counts of Japanese Americans were provided to the Western Defense Command of the War Department, which facilitated internment of legal residents of Japanese origin.

The relevant section of the Second War Powers Act was repealed as part of the First Decontrol Act of 1947. In 1947 the Census Bureau refused a request by the Attorney General for census information on individuals who were suspected of being communist sympathizers. Since that time, the Bureau has an unblemished record of protecting confidentiality for the data it collects from respondents to censuses and surveys, despite the increasing challenges it faces to such protection.8 Its standing Disclosure Review Board reviews every data product the Bureau makes available for public use to ensure that disclosure risks are minimized.

Other Statistical Agencies

All federal statistical agencies operate under strong norms to protect the data they collect against disclosure that could identify an individual.9 Some agencies have legal protection against requests from administrative agencies and other bodies to disclose individually identified information. However, other agencies have had to rely on executive orders, court cases, and long-established custom (see Norwood, 1995).

For years, the Statistical Policy Division of the U.S. Office of Management and Budget (OMB) endeavored to obtain legislation that would strengthen the statutory basis for protecting the confidentiality of all federal data collected for statistical purposes under a confidentiality pledge. These efforts achieved success when, in November 2002, Congress enacted the E-Government Act of 2002. Title V, the "Confidential Information Protection and Statistical Efficiency Act of 2002," subtitle A, places strict limits on the disclosure of individually identified information collected under a pledge of confidentiality: such disclosure can occur only with the informed consent of the respondent and the authorization of the agency head and only when the disclosure is not prohibited by any other law (e.g., Title 13).
Subtitle A also provides penalties for employees who unlawfully disclose information (up to 5 years in prison, up to $250,000 in fines, or both).

However, even though confidentiality protection for statistical data is now on a much firmer legal footing across the federal government, a loophole may exist for data from the National Center for Education

8 See Gates (2000) for a summary of post-World War II changes in legislation and court decisions that have upheld the confidentiality protections of Title 13. A legal exception to Title 13 is the provision in Title 44 that allows the National Archives to obtain individually identified census records and make them available for research use 72 years after the census date.

9 See Principles and Practices for a Federal Statistical Agency (National Research Council, 2000b).

Statistics (NCES). NCES has for many years had strong statutory protection for maintaining the confidentiality of its data and stiff penalties for NCES staff who breach confidentiality. The USA Patriot Act of 2001, enacted in October 2001 following the tragic terrorist acts of September 11, may have vitiated the legal protections for NCES data. Section 508 of the act amended the National Center for Education Statistics Act of 1994 by allowing the Attorney General (or an assistant attorney general) to apply to a court to obtain any "reports, records, and information (including individually identifiable information) in the possession" of NCES that are considered relevant to an authorized investigation or prosecution of domestic or international terrorism. Section 508 also removed the penalties for NCES employees who furnish individual records under this section. To date, no requests for such records have been made, but NCES is revising the information it provides to survey respondents about the possibility that their data could be obtained under this act. It is not yet clear whether the confidentiality protections in the E-Government Act would take precedence over Section 508 of the Patriot Act.

Federal Statistical Agencies and IRBs

Most but not all cabinet departments that house federal statistical agencies have formally adopted the Common Rule (exceptions are the U.S. Departments of Labor and Treasury), and agency IRBs review proposed surveys for many statistical agencies. For example, the National Center for Health Statistics has an IRB, and the IRB for the Department of Education reviews NCES surveys. The Census Bureau, in contrast, does not obtain IRB review on the basis that its surveys are exempt under 45 CFR 46.101(b)(3)(ii).
That provision exempts research from IRB review when federal law, as in Title 13, requires "without exception that the confidentiality of the personally identifiable information will be maintained throughout the research and thereafter." Yet there are features of some Census Bureau surveys that might be viewed as requiring IRB review (e.g., the appropriateness of providing financial incentives only to cases that otherwise refuse to participate in the Survey of Program Dynamics).

All federal surveys are subject to clearance by OMB under the provisions of the Paperwork Reduction Act. This review covers not only survey costs and burden for respondents, but also such issues as whether respondents are adequately informed about the purpose of the survey, the use of the information, whether response is voluntary or mandatory, and the nature and extent of confidentiality protection. We are not in a position to recommend whether IRB review is needed in addition to OMB review, but we do suggest it might be useful for OMB and OHRP to discuss their respective jurisdictions. We note that statistical agencies have encountered some of the same problems as SBES researchers with IRB review, such as insistence on requiring signed written consent for minimal-risk surveys when evidence indicates that a signature requirement will deter response from some people who would otherwise be willing to participate (see Chapter 4).

PROTECTING CONFIDENTIALITY TODAY

Increasing Challenges

The development of new data collection and dissemination technologies is arguably the principal factor increasing disclosure risks for research data that are made available by federal statistical agencies and other providers today. Other factors that play a role include increases in the volume and richness of the data collected (in turn made possible by technological advances) and changes in the nature of SBES research, which increasingly involves secondary analysis of data collected by others and sharing of data for validation purposes.

New Technology

Collection and processing technology for large-scale data collection efforts has been under almost continuous development since at least the end of the 19th century, when Herman Hollerith (then a Census Bureau employee, later, the founder of IBM) invented a punch-card tabulation machine to edit and tabulate the 1890 census (see Salvo, 2000). At that time and for many years thereafter, the limitations of printing technology constrained the amount of tabulations that the Census Bureau and other agencies could publish for research use, thereby minimizing disclosure risk.

The challenges of protecting data confidentiality began increasing in the 1960s when the Census Bureau first took advantage of computerization to greatly expand the volume and kinds of data it made available to the user community.
From the 1960 census (the first to be processed wholly by computer), the Bureau provided summary files (SFs) for small geographic areas on a reimbursable basis to several business firms. The tabulations on these computer files were much more extensive than those in printed reports. In 1963 the Bureau, with support from the Population Research Council, developed the first public-use microdata sample or PUMS file, which contained 1960 census individual records for 180,000 people (a 1-in-1,000 sample of the U.S. popu-

cestry to broad categories. Because of increasing concern about the ability to link census and survey microdata files with other data available through the Internet, the Bureau scaled back the data content somewhat on the 2000 census PUMS files in comparison with the 1990 census files.

Some microdata files of individual records are viewed as too sensitive and too easily re-identifiable to release in the form of a PUMS. For such data, the Census Bureau provides access to researchers who are sworn in as special census agents. For years such access could only be obtained by researchers who came to the Bureau's headquarters at Suitland, Maryland, to perform their analyses. In the past decade, the Bureau has begun a program of establishing secure research data centers at major universities, at which researchers may use data files that are not otherwise available for outside use (see Dunne, 2001). At present, there are six such centers: the Bureau's Boston Regional Office, Carnegie Mellon University, Duke University, the University of Michigan, and jointly managed sites at the Berkeley and Los Angeles campuses of the University of California.

Other Statistical Agencies

Other federal statistical agencies use similar methods to those of the Census Bureau to protect data during the stages of collection, processing, and storage and to minimize disclosure risks for data products that are made publicly available (see Federal Committee on Statistical Methodology, 1994). An additional source of concern for these agencies about disclosure risks during data collection and processing arises from the use of private contractors to conduct many of the household surveys they sponsor. (The Census Bureau uses its own staff for data collection for its surveys and those it conducts under contract to other agencies.)
When contractors are used, agencies must carefully review the confidentiality protection procedures at contractors' sites.

For researcher access to sensitive data that are at risk for re-identification, some statistical agencies use licensing agreements. For example, NCES has statutory authority to sign licensing agreements that permit researchers to use microdata at their own institutions under specified restrictions (e.g., not sharing the data outside the research group, returning or destroying all copies of the microdata at the end of the project, etc.). The agreements must be signed by the researcher's institution, and they contain penalties for noncompliance. Other agencies use licensing agreements as well. Sometimes agencies audit data users' protection policies on a random or scheduled basis. (See

OCR for page 113
Seastrom, 2001, for a review of current licensing practices and requirements by federal statistical and program agencies.)

Finally, statistical agencies are investigating the use of new techniques for statistically perturbing sensitive microdata so that it may be possible to make them available in public-use form. Such methods include data swapping with additive noise and creating a synthetic data set through statistical modeling. Determining the net utility of such data sets requires estimating an index of information loss and one of disclosure risk and judging when there is an acceptable balance between the two (see discussion in Appendix E).

THE ROLE OF RESEARCHERS, IRBS, OHRP, AND FUNDING AGENCIES IN PROTECTING CONFIDENTIALITY

The Common Rule requires IRBs to determine that research proposals have adequate plans to protect the confidentiality of data obtained from respondents and to protect their privacy. Such protection is supported by the ethical principles in the Belmont Report. Yet we believe that IRBs, OHRP, and researchers may not be giving as much attention to issues of confidentiality protection as warranted by the increasing risks of disclosure from advances in technology and the volume and richness of available data. We believe it is critical that federal funding agencies support continued research on methods for confidentiality protection.

Recommendation 5.1: Because of increased risks of identification of individual research participants with new methods of data collection and dissemination, the human research participant protection system should continually seek to develop and implement state-of-the-art disclosure protection practices and methods. Toward this goal:
• researchers should explicitly describe procedures to protect the confidentiality of the data to be collected in protocols they submit to IRBs;
• IRBs should pay close attention to the adequacy of proposed procedures for protecting confidentiality;
• federal funding agencies should support research on techniques to protect the confidentiality of SBES data that are made available for research use; and
• the Office for Human Research Protections should regularly promulgate good practices in analyzing disclosure risks and limiting those risks.
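
As a rough illustration of the balance discussed earlier in this section between information loss and disclosure risk for perturbed microdata, the following Python sketch adds random noise to a small set of income values and computes one simple index of each kind. The function names, the two indexes, and the sample incomes are assumptions made purely for illustration; they are not the methods of any statistical agency, and real disclosure risk analysis is considerably more elaborate.

```python
import random
import statistics


def add_noise(values, scale, rng):
    """Return a perturbed copy: each value plus zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, scale) for v in values]


def information_loss(original, perturbed):
    """Hypothetical index: mean absolute change, relative to the original mean."""
    changes = [abs(o - p) for o, p in zip(original, perturbed)]
    return statistics.fmean(changes) / statistics.fmean(original)


def disclosure_risk(original, perturbed):
    """Hypothetical index: share of records that an intruder who knows the
    true value would match to the correct perturbed record by picking the
    nearest perturbed value."""
    hits = 0
    for i, true_value in enumerate(original):
        nearest = min(range(len(perturbed)),
                      key=lambda j: abs(perturbed[j] - true_value))
        if nearest == i:
            hits += 1
    return hits / len(original)


if __name__ == "__main__":
    incomes = [21000, 34000, 52000, 75000, 310000]  # made-up sample values
    rng = random.Random(2003)  # fixed seed so the demo is repeatable
    for scale in (0.0, 5000.0, 50000.0):
        noisy = add_noise(incomes, scale, rng)
        print(scale,
              round(information_loss(incomes, noisy), 3),
              round(disclosure_risk(incomes, noisy), 3))
```

With a noise scale of zero the sketch reports no information loss and a risk of 1.0; larger scales raise the first index and tend to lower the second, which is the trade-off a data producer would have to judge.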

Researchers have an obligation to provide sufficiently detailed information in their proposal on plans for confidentiality protection so that an IRB can make an informed judgment about the adequacy of those plans. It is not enough to say that confidentiality will be protected; the methods and procedures for doing so at each stage of the research project must be detailed. Similarly, IRBs have an obligation to carefully review proposed plans for confidentiality protection and to evaluate them against recognized good practices that are applicable for the type of research proposed.

Federal funding agencies are increasingly interested in leveraging the dollars they invest in data collection under research grants and hence are requiring investigators to share data. Consequently, they have an interest in and, we believe, an obligation to support research on ways to analyze the risk of disclosure and on new methods for confidentiality protection that minimize disclosure risk and maximize the usefulness of shared data for secondary analysis. Such agencies could also partner with academic statisticians to disseminate information to researchers and IRBs about statistically based methods for disclosure risk analysis and risk minimization.

OHRP has a leadership responsibility for guidance on issues of human research participant protection. Because it is woefully inefficient for every IRB (many of which are overburdened) to take individual responsibility for staying abreast of threats to and state-of-the-art ways for protection of confidentiality, OHRP should regularly assemble and disseminate information on good practices for analyzing disclosure risk and minimizing that risk at every stage of a research project, from data collection to dissemination of results and sharing of data for secondary analysis.
OHRP should also assemble and publish information on the confidentiality and data access guidelines of federal and state agencies with responsibility for administrative records that are of potential use for research. Such information would help researchers navigate the maze of varying agency policies and would also help IRBs evaluate research that proposes to use such data.¹⁴

In increasing their attention to confidentiality issues, we do not intend that IRBs (or OHRP) should add bureaucratic impediments to SBES research or waste scarce time and resources in activities that duplicate other efforts. We make four points in this regard and address a fifth point in the next section:

(1) confidentiality protection should be appropriate to disclosure risk and the sensitivity of the data;
(2) adequacy of confidentiality protection should be assessed for each stage of a project involving original data collection from recruitment to dissemination and archiving;
(3) IRBs should look to other bodies for guidance on good practices for confidentiality protection;
(4) informed consent processes and documentation should address the extent and nature of confidentiality protection; and
(5) IRBs should, as standard procedure, exempt from review studies that propose to use publicly available microdata files from sources that follow good protection practices and obtain informed consent from participants (see "A Confidentiality Protection System for Public-Use Microdata," below).

¹⁴The National Human Research Protections Advisory Committee adopted a similar recommendation at its April 29-30, 2002, meeting. The recommendation also urged OHRP to identify federal statutes and regulations that provide confidentiality protection, identify issues or gaps, and develop proposals to address these gaps through "a consensus process involving the scientific and legal communities" (see http://www.ohrp.osophs.dhhs.gov/nhrpac/documents [4/10/03]).

Confidentiality Protection Appropriate to Disclosure Risk

We have been stressing the risks of disclosure; however, there are many projects for which confidentiality protection is unnecessary or irrelevant or the needed protections can be very limited. Observational studies of anonymous individuals in public settings (e.g., shoppers at a store who are not approached directly by the investigator and are not photographed or videotaped) need no confidentiality protection at all. Oral history studies in which public officials are interviewed about their public activities may require only limited protection, such as respecting the right of the respondent to refuse to answer a particular question or putting an agreed-upon time restriction on the availability of the full oral history. Small laboratory experiments on stimulus-response behaviors may adequately protect confidentiality simply by not recording names or other identifiers of participants.
Investigators in some participant observation studies may seek the consent of participants to include them individually in the published findings, likely with the use of pseudonyms, although participants must understand that pseudonyms will not necessarily protect them from being identified.

The larger point of all these examples is that for confidentiality protection, as in many other aspects of human research participant protection, there is no single approach that is appropriate for all studies. The risks of disclosure and the need for confidentiality protection should be
analyzed for each type of project and confidentiality protections made more or less stringent as appropriate.

Guidance that OHRP develops on analyzing disclosure risk and implementing appropriate confidentiality protections should include examples not only of studies that require stringent confidentiality protection measures, but also of studies for which minimal or no confidentiality protection is needed.

Protection for Every Stage of Research

For projects that involve original data collection, IRBs will need to check that appropriate confidentiality protection procedures are proposed for each project stage, as applicable:

• recruitment of participants: protection practices will vary depending on the method of recruitment (e.g., sending a letter that contains specific information about the prospective participant requires more attention to confidentiality protection than does a random-digit telephone dialing procedure);
• training of research staff, including interviewers, computer processing staff, analysts, and archivists, in confidentiality protection practices;
• collection of data from participants: protection practices will vary depending on whether collection is on paper, by CATI, by CAPI, on the web, or by other techniques, and who is being asked for information (e.g., some studies of families allow individual members to enter their own responses into a computer in such a way that neither other family members nor the interviewer is privy to the responses);
• transfer of data to the research organization, whether by regular mail, e-mail, express mail, or other means;
• data processing (including data entry and editing);
• data linkage (including matching with administrative records or appending neighborhood characteristics);
• data analysis;
• publication of quantitative or qualitative results;
• storage of data for further analysis by the investigator or for recontacting participants to obtain additional data or both; and
• dissemination of quantitative and qualitative microdata for secondary analysis by other researchers.

For qualitative research, Johnson (1982) has developed advice on "ethical proofreading" of field reports prior to publication so that even if participants and their communities are identified, the harm to them is minimized. Her guidelines include such steps as reviewing language to make it descriptive rather than judgmental, providing context for unflattering descriptions, asking some of the participants to read the manuscript for accuracy and provide feedback, and asking colleagues to read the manuscript critically for ethical concerns.

Use of Authoritative Guidance

Until OHRP begins to promulgate good practices for confidentiality protection for different stages and types of projects, IRBs should seek out guidance from reputable sources rather than developing standards for review of projects on their own. For example, many professional associations have developed and published good practices for confidentiality protection for studies in their discipline (see, e.g., Oral History Association Evaluation Guidelines; available at http://www.dickinson.edu/oha [4/10/03]). Major survey organizations also have principles and practices for confidentiality protection (e.g., see Institute for Social Research, 1999). For protection strategies for data that are to be published or shared with other researchers, see the paper we commissioned by George Duncan in Appendix E. See also the following resources: Czajka and Kasprzyk, 2002; relevant chapters in Doyle et al.
(2001); Statistical Working Paper 22 (Federal Committee on Statistical Methodology, 1994); guidance from the ICPSR, available at http://www.icpsr.umich.edu/ACCESS [4/10/03]; and links to information resources provided by the American Statistical Association Committee on Privacy and Confidentiality, available at http://www.amstat.org/comm/cmtepc [4/10/03].

Confidentiality Protection and Informed Consent

In reviewing research that involves original data collection, IRBs need to consider the adequacy of the information about confidentiality protection that is provided to participants through the informed consent process (see also Chapter 4). For example, participants should be informed that the data will be made available for research purposes in a form that protects against the risk of re-identification. If identifiers such as social security numbers are requested to permit linkages with administrative records, respondents should be informed about steps
that will be taken to prevent misuse of such identifiers and records and whether and when identifiers will be destroyed. The consent process should also make clear that confidentiality protection is never ironclad; rather, disclosure risks are minimized to the extent possible.

For research on illegal behavior (e.g., drug abuse) or sensitive topics (e.g., alcoholism, sexual abuse, or domestic violence), it is vitally important that adequate measures are in place to protect the privacy and confidentiality of research participants. Serious consequences may result if there is an intentional or inadvertent breach of confidentiality (including social stigmatization, discrimination, loss of employment, emotional harm, civil or criminal liability, and, in some cases, physical injury). Investigators must ensure that the informed consent discussion delineates carefully the procedures for protecting confidentiality, which may include waiving written consent or obtaining a certificate of confidentiality to prevent data from being used in court. In addition, investigators must address the possibility that they may have to report such behaviors as child abuse to authorities.

A CONFIDENTIALITY PROTECTION SYSTEM FOR PUBLIC-USE MICRODATA

Recommendation 5.2: To facilitate secondary analysis of public-use microdata files, the Office for Human Research Protections, working with appropriate federal agencies and interagency groups, should establish a new confidentiality protection system for these data. The new system should build upon existing and new data archives and statistical agencies.

Recommendation 5.3: Participating archives in the new public-use microdata protection system should certify to researchers whether data sets obtained from such an archive are sufficiently protected against disclosure to be acceptable for secondary analysis.
IRBs should exempt such secondary analysis from review on the basis of the certification provided.

We argue that IRB review of secondary analysis with public-use microdata is unnecessary and a misuse of scarce time and resources (see Chapter 6). If the data in a file have been processed to minimize the risk of re-identifying a respondent by using widely recognized good practices for confidentiality protection, then the research is eligible for exemption under the Common Rule (see Box 1-1 in Chapter 1). The
issue is how an IRB can be satisfied that a particular public-use microdata file has been processed using good practices for confidentiality protection. To address this concern, we propose that OHRP work with statistical agencies, data archives, and appropriate interagency groups to develop a new system for confidentiality protection and certification for public-use microdata. Such a system would permit IRBs to exempt secondary analysis with such data from review as a matter of standard practice.¹⁵

We have described how federal statistical agencies are in the forefront of efforts to protect the confidentiality of their data. When such agencies release a public-use microdata set (or summary file for small geographic areas), one can be assured that they have followed good practices for confidentiality protection. One can also be assured that they have addressed such aspects of human participant protection as informed consent and minimization of respondent burden because of the requirement that all data collections by federal agencies be cleared by OMB under the Paperwork Reduction Act (some agencies also have an IRB).

We recommend that OHRP work with the Interagency Council on Statistical Policy, which includes 14 statistical agencies and is chaired by the chief statistician in OMB, to develop a certificate that accompanies release of public-use data sets from these agencies on the web or in other media. Such a certificate would attest that the public-use file reflects good practice for confidentiality protection and that the data were collected with appropriate concern for informed consent and other protection issues. With such a certificate, the IRB would exempt from further review any analysis that proposes to use only the data from the certified file.

Public-use microdata are also made available from federal program agencies and from private archives and research organizations.
To extend the certification system, OHRP should work with the Interagency Council on Statistical Policy, other public and private data producers, and data archives to develop something like the assurance program that OHRP uses to authorize IRB operations at research institutions under the Common Rule. Under such a confidentiality assurance program, an archive such as ICPSR would document the procedures it uses to protect confidentiality for data sets that researchers deposit with the archive for redistribution to secondary analysts. Once its procedures are approved, the archive would be able to certify that the data files it distributes are appropriately processed for confidentiality protection. Similarly, other data producers, such as federal program agencies and private research organizations, could obtain organization-wide certification for their public-use files, or an organization could obtain certification on a case-by-case basis if it rarely develops public-use data.

¹⁵Summary data that are provided for small geographic areas or population groups could also be certified; we recommend in Chapter 6 that analysis with such data not be brought to IRBs, as the aggregate data no longer represent human subjects under the regulations.

A program of assurance for confidentiality protection procedures and certification of data files for secondary analysis will necessitate that participants in the program (OHRP, federal statistical agencies, other data producers, and archives) keep abreast of disclosure risks and state-of-the-art protection procedures. Continued vigilance, together with sustained investment in disclosure risk analysis and confidentiality protection methods, will be necessary to assure IRBs, researchers, and participants that adequate protections are in place.

CONCLUDING NOTE: MINIMAL DISCLOSURE RISK IS NOT ZERO RISK

At present, there is considerable tension between the SBES research community and data producers, particularly federal statistical agencies, regarding what and how much microdata can be made available for public use. Statistical agencies, in some researchers' views, are striving for zero disclosure risk, which is not possible, and are unnecessarily restricting the availability of data that were collected with public funds and intended for public use. Researchers, in the view of many statistical agencies, are underestimating the disclosure risks and are not sufficiently cognizant of the legal constraints and penalties under which statistical agencies operate.

We cannot resolve the tensions between these views.
Several studies of the Committee on National Statistics have addressed confidentiality issues (National Research Council, 1993, 2000a), and a study is currently under way to address specifically the balance of benefits and costs of data access versus disclosure risk. Some solutions will likely require congressional action, such as legislation that would enable more statistical agencies to use licensing agreements that make the researcher, as well as the statistical agency, responsible for any breach of confidentiality. Other solutions will require accommodation of views. For example, researchers may have to be more accepting of conducting secondary analysis at secure data centers, while the sponsors of data centers may need to interpret more broadly the kinds of studies that are acceptable to be conducted at a secure center. Currently, for example, the Census Bureau approves studies that will help the Census Bureau (e.g., studies of missing data), which seems too narrow a criterion given that the mission of the Bureau is to provide information for public use.

¹⁶ICPSR has informed researchers in member institutions on how to obtain information from its website on ICPSR confidentiality protection procedures to accompany protocol submissions to IRBs (Erik Austin, 2002, personal communication).

Given the leverage that secondary data analysis provides for the advancement of knowledge in the social, behavioral, and economic sciences, it is clearly important for researchers, data producers, and data archives to work cooperatively to maximize that leverage while appropriately protecting the respondents who supplied the data. IRBs can contribute to such a cooperative effort by encouraging investigators who collect original data to deposit it with archives that will make the data available to others in a form that minimizes the risks of breach of confidentiality.

disclosure risks have been minimized.

For summary (tabular) data, the Bureau's measures to protect confidentiality include several steps. The level of detail of tabulations for geographic areas is made inverse to the size of the area and the size of the survey or census sample.¹² For census tabulations, the Bureau uses a "data swapping" technique (a type of data blurring), in which a small number of records for individual households that are similar on basic characteristics (number of adults and children and race and ethnic composition) are swapped between adjacent geographic areas so that the resulting tabulations for individual areas are close to but likely not exactly the same as the originally collected data.¹³ The Bureau also groups reported amounts for such continuous variables as income, rent, and housing value into broad categories, including a top category that is well below the largest individual amount reported.

For microdata files of individual records that are made publicly available on the Internet or other media (PUMS files), the Bureau protects confidentiality by taking such steps as stripping off all overt tags, such as name and address; limiting geographic identification to large areas (e.g., states, regions, or metropolitan areas above a certain population size, depending on sample size); top-coding continuous variables (e.g., providing income amounts in dollar increments up to a top category defined as any income that exceeds a specified amount); and assigning such variables as specific occupation, industry, or ancestry to broad categories. Because of increasing concern about the ability to link census and survey microdata files with other data available through the Internet, the Bureau scaled back the data content somewhat on the 2000 census PUMS files in comparison with the 1990 census files.
Some microdata files of individual records are viewed as too sensitive and too easily re-identifiable to release in the form of a PUMS. For such data, the Census Bureau provides access to researchers who are sworn in as special census agents. For years such access could only be obtained by researchers who came to the

¹²For example, only basic census characteristics collected from everyone are tabulated for city blocks, while data from the census long-form sample (about 1-in-6 households) are tabulated for larger geographic areas. Statistical reliability is another reason besides confidentiality protection to limit the geographic detail of sample tabulations.

¹³Data swapping, which was first used in the 1990 census, replaced the previously used technique of cell suppression, in which cell values that were smaller than a specified threshold were blanked out. The suppression method made the data harder for users to work with (see Gates, 2000).
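
The protection steps described above for tabulations and PUMS files (top-coding, assignment to broad categories, and swapping of similar household records between adjacent areas) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Census Bureau's procedure: the function names, the record layout (dictionaries with an "area" field), the matching keys, and the sample values are all hypothetical.

```python
import random


def top_code(amount, ceiling):
    """Cap a continuous value at a ceiling so extreme amounts are not released."""
    return min(amount, ceiling)


def coarsen(value, mapping, default="other"):
    """Assign a detailed value (e.g., a specific occupation) to a broad category."""
    return mapping.get(value, default)


def swap_between_areas(records, area_a, area_b, match_keys, n_swaps, rng):
    """Exchange the area codes of up to n_swaps pairs of records, one from
    each of two areas, that agree on every key in match_keys.
    Returns new records; the input list is left unchanged."""
    out = [dict(r) for r in records]
    by_profile = {}
    for idx, rec in enumerate(out):
        if rec["area"] in (area_a, area_b):
            profile = tuple(rec[k] for k in match_keys)
            groups = by_profile.setdefault(profile, {area_a: [], area_b: []})
            groups[rec["area"]].append(idx)
    # Candidate pairs: one record from each area with the same profile.
    pairs = []
    for groups in by_profile.values():
        pairs.extend(zip(groups[area_a], groups[area_b]))
    rng.shuffle(pairs)
    for i, j in pairs[:n_swaps]:
        out[i]["area"], out[j]["area"] = out[j]["area"], out[i]["area"]
    return out


if __name__ == "__main__":
    households = [  # made-up records for illustration
        {"area": "A", "adults": 2, "children": 1, "income": 40000},
        {"area": "B", "adults": 2, "children": 1, "income": 250000},
        {"area": "A", "adults": 1, "children": 0, "income": 30000},
    ]
    swapped = swap_between_areas(households, "A", "B",
                                 ("adults", "children"), 1, random.Random(0))
    for rec in swapped:
        rec["income"] = top_code(rec["income"], 100000)
    print(swapped)
```

Because swapped records agree on the matching characteristics, counts by area for those characteristics are unchanged, while other variables (here, income) move between areas; that is the sense in which the resulting tabulations are close to, but not exactly, the originally collected data.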