
Protecting Participants and Facilitating Social and Behavioral Sciences Research (2003)

Chapter: Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards

Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 235
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 236
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 237
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 238
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 239
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 240
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 241
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 242
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 243
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 244
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 245
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 246
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 247
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 248
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 249
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 250
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 251
Suggested Citation:"Appendix E: Confidentiality and Data Access Issues for Institutional Review Boards." National Research Council. 2003. Protecting Participants and Facilitating Social and Behavioral Sciences Research. Washington, DC: The National Academies Press. doi: 10.17226/10638.
×
Page 252


—E—
Confidentiality and Data Access Issues for Institutional Review Boards

George T. Duncan
Carnegie Mellon University

INTRODUCTION

Accepted principles of information ethics (see National Research Council, 1993) require that promises of confidentiality be preserved and that the data collected in surveys and studies adequately serve their purposes. A compromise of the confidentiality pledge could harm the research organization, the subject, or the funding organization. A statistical disclosure occurs when a data dissemination allows data snoopers to gain information about subjects by which they can isolate individual respondents and the corresponding sensitive attribute values (Duncan and Lambert, 1989; Lambert, 1993). Policies and procedures are needed to reconcile the need for confidentiality with the demand for data (Dalenius, 1988).

Under a body of regulation known as the Federal Policy for the Protection of Human Subjects, the National Institutes of Health Office of Human Subjects Research (OHSR) mandates that institutional review boards (IRBs) determine that research protocols assure the privacy and confidentiality of subjects. Specifically, it requires IRBs to ascertain whether (a) personally identifiable research data will be protected to the extent possible from access or use and (b) any special privacy and confidentiality issues, e.g., the use of genetic information, are properly addressed. This standard directs an IRB's attention, but without elaboration and clarification it does not provide IRBs with operational criteria for evaluating research protocols, nor does it give researchers guidance on how to establish research protocols that can merit IRB approval. The Office for Human Research Protections (OHRP) is responsible for interpreting and overseeing implementation of the regulations regarding the Protection of Human Subjects (45 CFR 46) promulgated by the Department of Health and Human Services (DHHS). OHRP is responsible for providing guidance to researchers and IRBs on ethical issues in biomedical and behavioral research.

As IRBs respond to their directive to ethically oversee the burgeoning research on human subjects, they require systematic ways of examining protocols for compliance with best practice for confidentiality and data access. Clearly, the task of an IRB is lightened if researchers are fully aware of such practices and how they can be implemented. This paper identifies key confidentiality and data access issues that IRB members must consider when reviewing protocols. It provides both a conceptual framework for such reviews and a discussion of a variety of administrative procedures and technical methods that researchers can use to simultaneously assure confidentiality protection and appropriate access to data.

CRITICAL ISSUES

Reason for Concern

Most generally, an ethical perspective requires researchers to maximize the benefits of their research while minimizing the risk and harm to their subjects. This beneficence notion is often interpreted to mean, first, that "one ought not to inflict harm" and, second, that "one ought to do or promote good." In the context of assuring data quality from research studies, this means first assuring an adequate degree of confidentiality protection and then maximizing the value of the data generated by the research. Confidentiality is afforded for reasons of ethical treatment of research subjects, on pragmatic grounds of assuring subject cooperation, and, in some cases, because of legal requirements.

Aspects of Concern

Data pose a serious risk of disclosure when (a) disclosure would have negative consequences, (b) a data snooper is motivated—both psychologically and pragmatically—to seek disclosure (Elliot, 2001), and (c) the data are vulnerable to disclosure attack. Based on their confidentiality pledges, researchers must protect certain sensitive objects from a data snooper. Sensitive objects can be any of a variety of variables associated with a subject entity (person, household, enterprise, etc.). Examples include the values of numerical variables, such as household income; an X-ray of a patient's lung; and a subject's report of sexual history.

Data with particular characteristics pose substantial risk of disclosure and suggest vulnerability:

• geographical detail, e.g., census block (Elliot, Skinner, and Dale, 1998; Greenberg and Zayatz, 1992);
• longitudinal or panel structure, e.g., criminal histories (Abowd and Woodcock, 2001);
• outliers likely to be unique in the population, such as a 16-year-old widow (Dalenius, 1986; Greenberg, 1990);
• attributes with a high level of detail, e.g., income to the nearest dollar (Elliot, 2001);
• many attribute variables, e.g., a medical record (Sweeney, 2001);
• population data, as in a census, rather than a survey with a small sampling fraction (Elliot, 2001); and
• databases that are publicly available, identified, and share individual respondents and attribute variables (key variables—Elliot and Dale, 1999) with the subject data, e.g., marketing and credit databases.

Data with geographical detail, such as census tract data, may be easily linked to known characteristics of respondents. This concern suggests setting minimum population levels for geographical identifiers; for particular geographical regions, this can mean specifying the minimum size of a region that can be reported. Longitudinal data, which track entities over time, also pose substantial disclosure risk: many individuals had coronary bypass surgery in the Chicago area in 1998, and many had bypass surgery in Phoenix in 1999, but few did both. Outliers, say on variables like weight, height, or cholesterol level, can lead to identifiable respondents. Data with many attribute variables allow easier linkage with known attributes of identified entities, and entities that are unique in the sample are more likely to be unique in the population. Population data pose more disclosure risk than data from a survey having a small sampling fraction. Finally, special concern must be shown when other databases available to the data snooper are identified and share with the subject data both individual respondents and certain attribute variables. Record linkage may then be possible between the subject data and the external database; the shared attribute variables provide the key.
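To make the key-variable vulnerability concrete, here is a minimal sketch (in Python, with hypothetical variable names and records; nothing in it comes from the appendix itself) that screens microdata for records that are unique in the sample on a chosen set of key variables:

```python
from collections import Counter

def sample_uniques(records, keys):
    """Return the records whose combination of key-variable values occurs
    only once in the sample -- a standard first screen for disclosure risk."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if combos[tuple(r[k] for k in keys)] == 1]

# Hypothetical microdata: age, sex, and ZIP code act as key variables.
data = [
    {"age": 34, "sex": "F", "zip": "15213", "income": 52000},
    {"age": 34, "sex": "F", "zip": "15213", "income": 61000},
    {"age": 16, "sex": "F", "zip": "15213", "income": 8000},  # likely unique
]
risky = sample_uniques(data, keys=("age", "sex", "zip"))
print(len(risky), "sample-unique record(s)")  # -> 1
```

Sample uniqueness does not by itself establish population uniqueness, but, as the text notes, sample-unique records are the natural starting point for a disclosure review.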

Disclosure

The legitimate objects of inquiry for research involving human subjects are statistical aggregates over the records of individuals, for example, the median number of serious infections sustained by patients receiving a drug for treatment of arthritis. The investigators seek to provide the research community with data that will allow accurate inference about such population characteristics. At the same time, to respect confidentiality, the investigators must thwart the data snooper who might seek to use the disseminated data to draw accurate inferences about, say, the infection history of a particular patient. Such a capability by a data snooper would constitute a statistical disclosure.

There are two major types of disclosure: identity disclosure and attribute disclosure. Identity disclosure occurs with the association of a respondent's identity with a disseminated data record (Paass, 1988; Spruill, 1983; Strudler et al., 1986). Attribute disclosure occurs with the association of either an attribute value in the disseminated data, or an estimated attribute value based on the disseminated data, with the respondent (Duncan and Lambert, 1989; Lambert, 1993). In the case of identity disclosure, the association is assumed exact; in the case of attribute disclosure, the association can be approximate. Many investigators emphasize limiting the risk of identity disclosure, perhaps because of its substantial equivalence to the inadvertent release of an identified record. An attribute disclosure, even though it invades the privacy of a respondent, may not be so easily traceable to the actions of an agency. An IRB in its oversight capacity should be concerned that investigators limit the risk of both attribute and identity disclosures.

Risk of Disclosure

Measures of disclosure risk are required (Elliot, 2001). In the context of identity disclosure, disclosure risk can arise because a data snooper may be able to use the disseminated data product to reidentify some deidentified records. Spruill (1983) proposed a measure of disclosure risk for microdata: (1) for each "test" record in the masked file, compute the Euclidean distance between the test record and each record in the source file; (2) determine the percentage of test records that are closer to their parent source record than to any other source record. She defines the risk of disclosure to be the percentage of test records that match the correct parent record multiplied by the sampling fraction (the fraction of source records released).
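Spruill's measure is straightforward to compute. The sketch below is one reading of her two-step procedure, assuming numeric microdata in which each masked record sits at the same index as its parent source record (an assumption of this illustration, not a detail given in the text):

```python
import math

def spruill_risk(masked, source, sampling_fraction):
    """Spruill's disclosure risk: the share of masked 'test' records lying
    closer (Euclidean distance) to their own parent source record than to
    any other source record, scaled by the sampling fraction."""
    matches = 0
    for i, test in enumerate(masked):
        d_parent = math.dist(test, source[i])  # parent assumed at same index
        if all(math.dist(test, s) > d_parent
               for j, s in enumerate(source) if j != i):
            matches += 1
    return (matches / len(masked)) * sampling_fraction

# Hypothetical two-variable microdata and a noise-masked release of it.
source = [[52.0, 34.0], [61.0, 35.0], [8.0, 16.0]]
masked = [[53.1, 33.0], [60.2, 36.0], [9.5, 17.0]]
print(spruill_risk(masked, source, sampling_fraction=1.0))  # -> 1.0 here
```

In this toy case every masked record still matches its parent, so the measure signals that the mask is far too weak.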

More generally, and consistent with Duncan and Lambert (1986, 1989), an agency will have succeeded in protecting the confidentiality of a released data product if the data snooper remains sufficiently uncertain about a protected target value after data release. From this perspective, a measure of disclosure risk is built on measures of uncertainty. Furthermore, an agency can model the decision making of the data snooper as a basis for using disclosure limitation to deter inferences about a target. Data snoopers are deterred from publicly making inferences about a target when their uncertainty is sufficiently high. Mathematically, uncertainty functions provide a workable framework for this analysis. Examples include Shannon entropy, which has found use in categorizing continuous microdata and coarsening categorical data (Domingo-Ferrer and Torra, 2001; Willenborg and de Waal, 1996:138).

Generally, a data snooper has a priori knowledge about a target, often in the form of a database with identified records (Adam and Wortmann, 1989). Certain variables may be in common with the subject database; these variables are called key or identifying variables (De Waal and Willenborg, 1996; Elliot, 2001). When a single record matches on the key variables, the data snooper has a candidate record for identification. That candidacy is promoted to an actual identification if the data snooper is convinced that the individual is in the target database. This would be the case either if the data snooper has auxiliary information to that effect or if the data snooper is convinced that the individual is unique in the population. The data snooper may find from certain key variables that a sample record is unique; the question then is whether the individual is also unique on these key variables in the population. Bethlehem, Keller, and Pannekoek (1990) have examined detection of records agreeing on simple combinations of keys based on discrete variables in the files. Record linkage methodologies have been examined by Domingo-Ferrer and Torra (2001), Fuller (1993), and Winkler (1998).
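As one illustration of the uncertainty-function perspective, the sketch below applies Shannon entropy to a data snooper's posterior probabilities over candidate records; the framing as a posterior, and the particular numbers, are invented for this illustration rather than taken from the sources cited above:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of the snooper's distribution over candidate
    identities; higher entropy means more residual uncertainty about the target."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# After matching on key variables, suppose three candidate records remain,
# with the snooper's (hypothetical) posterior probabilities:
print(shannon_entropy([0.90, 0.05, 0.05]))   # ~0.57 bits: little uncertainty left
print(shannon_entropy([1/3, 1/3, 1/3]))      # ~1.58 bits: maximal uncertainty
```

An agency taking this view would choose disclosure limitation parameters so that the snooper's entropy stays above some policy threshold.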

Deidentification

Deidentification is the process of removing apparent identifiers (name, e-mail address, social security number, phone number, address, etc.) from a data record. Deidentification does not necessarily make a record anonymous, as it may well be possible to reidentify the record using external information. In a letter to DHHS, the American Medical Informatics Association (2000) noted:

    However, in discussions with a broad range of healthcare stakeholders, we have found the concept of "deidentified information" can be misleading, for it implies that if the 19 data elements are removed, the problem of reidentification has been solved. The information security literature suggests otherwise. Additionally, with the continuing and dramatic increase in computer power that is ubiquitously available, personal health data items that currently would be considered "anonymous" may lend themselves to increasingly easy reidentification in the future. For these reasons, we believe the regulations would be better served by adopting the conventions of personal health data as being of "High Reidentification Potential" (e.g., the 19 data elements listed in the current draft), and "Low Reidentification Potential." Over time, some elements currently considered of low potential may migrate to the high potential classification. More importantly, this terminology conveys the reality that virtually all personal health data has some confidentiality risk associated with it, and helps to overcome the mistaken impression that the confidentiality problem is solved by removing the 19 specified elements.

Most health care information, such as hospital discharge data, cannot be anonymized through deidentification alone. The reason that removing identifiers does not assure sufficient anonymity of respondents is that, today, a data snooper can get inexpensive access to databases with names attached to records; marketing and credit information databases and voter registration lists are exemplars. Having this external information, the data snooper can employ sophisticated, but readily available, record linkage techniques. Attempts to link an identified record from the public database to a deidentified record are often successful (Winkler, 1998). With such a linkage, the record is reidentified.
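A toy version of such a linkage attack makes the point. The sketch below performs exact matching on shared key variables between a deidentified research file and an identified external file; the field names and records are invented, and real attacks would use probabilistic linkage methods (Winkler, 1998) rather than exact matching:

```python
def link(deidentified, external, keys):
    """Deterministic record linkage: join a deidentified research file to an
    identified external file (e.g., a voter list) on shared key variables."""
    index = {}
    for rec in external:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    hits = []
    for rec in deidentified:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match is a reidentification candidate
            hits.append((rec, candidates[0]["name"]))
    return hits

# Hypothetical files: a "deidentified" health record and an identified list.
research = [{"zip": "02138", "birth": "1957-07-31", "sex": "F", "dx": "asthma"}]
voters = [{"zip": "02138", "birth": "1957-07-31", "sex": "F", "name": "J. Doe"}]
print(link(research, voters, keys=("zip", "birth", "sex")))
```

Even this naive exact-match join succeeds whenever the key-variable combination is unique in both files, which is exactly the vulnerability the text describes.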

New Areas of Concern

Technological developments continue to raise new issues that must be addressed in the ethical direction of research involving human subjects. Of burgeoning importance in recent years are developments in information technology, especially the Internet, and in biotechnology, especially human genetics research.

The Internet

A good discussion of some of the issues involved in providing remote access to data through the web is provided by Blakemore (2001). These include security assurances against hacker attack and fears of record linkage. A prominent example of web access to data is American FactFinder, maintained by the U.S. Census Bureau (http://factfinder.census.gov). American FactFinder provides access to population, housing, economic, and geographic data. The site gives a good description of the elaborate procedures followed to ensure confidentiality through statistical disclosure limitation (see also American Association for the Advancement of Science, 1999).

Genetic Research

The American Society of Human Genetics published the following statement on this issue:

    Studies that maintain identified or identifiable specimens must maintain subjects' confidentiality. Information from these samples should not be provided to anyone other than the subjects and persons designated by the subjects in writing. To ensure maximum privacy, it is strongly recommended that investigators apply to the Department of Health and Human Services for a Certificate of Confidentiality. . . . Investigators should indicate to the subject that they cannot guarantee absolute confidentiality.

A statement by the Health Research Council of New Zealand (1998) is more specific:

    Researchers must ensure the confidentiality and privacy of stored genetic information, genetic material, or results of the research which relate to identified or identifiable participants. In particular, the research protocol must specify whether genetic information or genetic material, and any information derived from studying the genetic material, will be stored in identified, deidentified, or anonymous form. Researchers should consider carefully the consequences of storing information and material in anonymous form for the proposed research, future research, and communication of research results to participants. Researchers should disclose where storage is to be and to whom their tissues will be accessible. Tissue or DNA should only be sent abroad if this is acceptable to the consenting individual.

TENSION BETWEEN DISCLOSURE RISK AND DATA UTILITY

Data Quality Audit

The process of assuring confidentiality through statistical disclosure limitation while maintaining data utility has the following components:

• a data quality audit that, beginning with the original, collected data, assesses disclosure risk and data utility;
• a determination of the adequacy of confidentiality protection;
• if confidentiality protection is inadequate, the implementation of a restricted access or restricted data procedure; and
• a return to the data quality audit.

A quality audit of collected data evaluates the utility of the data and assesses disclosure risk. Typically, with good research design and implementation, the data utility is high. But the risk of disclosure through the release of the original, collected data is also too high, even when the data have been deidentified, i.e., when apparent identifiers (name, e-mail address, phone number, etc.) have been removed. Reidentification techniques have become too sophisticated for deidentification alone to assure confidentiality protection (Winkler, 1998). A confidentiality audit will include identification of (1) sensitive objects and (2) characteristics of the data that make them susceptible to attack.

R-U Confidentiality Map

A measure of statistical disclosure risk, R, is a numerical assessment of the risk of unintended disclosures following dissemination of the data. A measure of data utility, U, is a numerical assessment of the usefulness of the released data for legitimate purposes. Illustrative results using particular specifications for R and U have been developed. The R-U confidentiality map was initially presented by Duncan and Fienberg (1999) and further explored for categorical data by Duncan et al. (2001). As more fully developed by Duncan, Keller-McNulty, and Stokes (2002), the R-U confidentiality map provides a quantified link between R and U directly through the parameters of a disclosure limitation procedure. With an explicit representation of how the parameters of the disclosure limitation procedure affect R and U, the tradeoff between disclosure risk and data utility becomes apparent. With the R-U confidentiality map, data-holding groups have a workable new tool to frame decision making about data dissemination under confidentiality constraints.
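The idea can be demonstrated by tracing (R, U) points for a simple additive-noise mask. The risk and utility measures below are crude stand-ins chosen for brevity, not the formal measures of Duncan, Keller-McNulty, and Stokes (2002): R is taken as the fraction of masked values still within one unit of their originals, and U as the ratio of original to masked variance.

```python
import random
import statistics

random.seed(1)
x = [random.gauss(50, 10) for _ in range(500)]  # original confidential values

def ru_point(noise_sd):
    """One (R, U) point: mask x with N(0, noise_sd^2) noise, then compute a
    crude closeness-based risk proxy R and a variance-ratio utility proxy U."""
    masked = [v + random.gauss(0, noise_sd) for v in x]
    r = sum(abs(m - v) < 1.0 for m, v in zip(masked, x)) / len(x)
    u = statistics.variance(x) / statistics.variance(masked)
    return r, u

# Sweeping the mask parameter traces out the R-U curve: both risk and
# utility fall as the noise grows, which is the tradeoff the map displays.
for sd in (0.5, 2.0, 5.0, 10.0):
    r, u = ru_point(sd)
    print(f"noise_sd={sd:5.1f}  R={r:.2f}  U={u:.2f}")
```

A data-holding group would pick the parameter value whose (R, U) point keeps risk below its tolerance while sacrificing the least utility.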

Restricted Access Procedures

Restricted access procedures are administrative controls on who can access data and under what conditions. These controls may include use of sworn agent status, licensing, and secure research sites. Each of these restricted access procedures requires examination of its structure and careful monitoring to ensure that it provides both confidentiality protection and appropriate access to data. Licensing systems, for example, require periodic inspections and a tracking database to monitor restricted-use data files (Seastrom, 2001). Even in secure research sites, only restricted data may be made available, say deidentified data files. Secure sites require a trained staff who can impart a "culture of confidentiality" (Dunne, 2001).

Restricted Data Procedures: Disclosure Limitation Methods

Restricted data procedures are methods for disclosure limitation that require a disseminated data product to be some transformation of the original data. A variety of disclosure limitation methods have been proposed by researchers on confidentiality protection. Generally, these methods are tailored either to tabular data or to microdata. These procedures are widely applied by government statistical agencies, since agencies face confidentiality issues directly in producing data products for their users. The most commonly used methods for tabular data are cell suppression based on minimum cell count or dominance rules; recoding variables; rounding; and geographic or minimum population thresholds. The most commonly used methods for microdata are microaggregation, deletion of data items, deletion of sensitive records, recoding data into broad categories, top and bottom coding, sampling, and geographic or minimum population thresholds (see Felsö, Theeuwes, and Wagner, 2001).

Direct transformations of data for confidentiality purposes are called disclosure-limiting masks (Jabine, 1993a, 1993b). With masked data sets, there is a specific functional relationship, possibly a function of multiple records and possibly stochastic, between masked values and the original data. Because of this relationship, the possibilities of both identity and attribute disclosures continue to exist, even though the risk of disclosure may be substantially reduced. The idea is to provide a response that, while useful for statistical analysis purposes, has sufficiently low disclosure risk. As a general classification, disclosure-limiting masks can be categorized as suppressions, recodings, or samplings.

Whether for microdata or tabular data, many of these transformations can be represented as matrix masks (Duncan and Pearson, 1991), M = AXB + C, where X is a data matrix, say n × p. In general, the defining matrices A, B, and C can depend on the values of X and can be stochastic. The matrix A, since it operates on the rows of X, is a record-transforming mask; the matrix B, since it operates on the columns of X, is a variable-transforming mask; and the matrix C is a displacing mask (noise addition).

Methods for Tabular Data

A variety of disclosure limitation methods for tabular data are identified or developed, and then analyzed, by Duncan et al. (2001). The discussion below describes some of the more important of these methods.

Suppression

A suppression is a refusal to provide a data instance. For microdata, this can involve the deletion of all values of some particularly sensitive variable. In principle, certain record values could also be suppressed, but this is usually handled through recoding. For tabular data, the values of table cells that pose confidentiality problems are suppressed; these are the primary suppressions. Often, a cell is considered unsafe for publication according to the (n, p) dominance rule, i.e., if a few (n), say three, contributing entities represent a percentage p, say 70 percent, or more of the total. Additionally, enough other cells are suppressed so that the values of the primary suppressions cannot be inferred from released table margins. These additional cells are called secondary suppressions. Even tables of realistic dimensionality with only a few primary suppressions present a multitude of possible configurations for the secondary cell suppressions. This raises computational difficulties that can be formulated as combinatorial optimization problems. Typical techniques that are used include mathematical programming (especially integer programming) and graph theory (Chowdhury et al., 1999).
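As an illustration, the (n, p) dominance rule for primary suppression can be written in a few lines; the sketch below only flags unsafe cells, leaving aside the harder combinatorial choice of secondary suppressions, and the contributor figures are hypothetical:

```python
def dominant(cell_contributions, n=3, p=70.0):
    """(n, p) dominance rule: flag a table cell as unsafe if its n largest
    contributors account for at least p percent of the cell total."""
    total = sum(cell_contributions)
    if total == 0:
        return False
    top_n = sum(sorted(cell_contributions, reverse=True)[:n])
    return 100.0 * top_n / total >= p

# A cell built from five establishments' (hypothetical) revenue figures:
print(dominant([900, 50, 30, 15, 5]))   # True: top 3 hold 98% -> suppress
print(dominant([60, 55, 50, 45, 40]))   # False: top 3 hold only 66%
```

Cells flagged this way become the primary suppressions; the secondary suppressions must then be chosen, by the optimization techniques just mentioned, so that the margins do not give the flagged values away.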

Recoding

A disclosure-limiting mask for recoding creates a set of data in which some or all of the attribute values have been altered. Recoding can be applied to microdata or to tabular data. Some common methods of recoding for tabular data are global recoding and rounding; a newer method is Markov perturbation.

• Under global recoding, categories are combined. This represents a coarsening of the data through combining rows or combining columns of the table.
• Under rounding, every cell entry is rounded to some base b. The controlled rounding problem is to find some perturbation of the original entries that will satisfy (typically marginal) constraints and that is "close" to the original entries (Cox, 1987). Multidimensional tables present special difficulties; methods for dealing with them are given by Kelley, Golden, and Assad (1990).
• Markov perturbation (Duncan and Fienberg, 1999) makes use of stochastic perturbation through entity moves according to a Markov chain. Because of the cross-classified constraints imposed by the fixing of marginal totals, moves must be coupled. This coupling is consistent with a Gröbner basis structure (Fienberg, Makov, and Steele, 1998). In a graphical representation, it is consistent with data flows corresponding to an alternating cycle, as discussed by Cox (1987).

Disclosure-Limitation Methods for Microdata

Examples of recoding as applied to microdata include data swapping; adding noise; and global recoding with local suppression. In data swapping (Dalenius and Reiss, 1982; Reiss, 1980; Spruill, 1983), some fields of a record are swapped with the corresponding fields in another record. Concerns have been raised that while data swapping lowers disclosure risk, it may excessively distort the statistical structure of the original data (Adam and Wortmann, 1989). A combination of data swapping with additive noise has been suggested by Fuller (1993). Masking through the introduction of additive or multiplicative noise has also been investigated (e.g., Fuller, 1993). A disclosure limitation method for microdata that is used in the µ-Argus software is a combination of global recoding and local suppression. Global recoding combines several categories of a variable to form less specific categories; topcoding is a specific example. Local suppression suppresses certain values of individual variables (Willenborg and de Waal, 1996). The aim is to reduce the set of records in which only a few agree on particular combinations of key values. Both methods make the data less specific and so result in some information loss to legitimate researchers.
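The following sketch illustrates topcoding, global recoding, and local suppression on a single hypothetical record (the 16-year-old widow of the earlier example); it mimics the style of µ-Argus operations but is not µ-Argus code:

```python
def topcode(value, cap):
    """Topcoding: report values above the cap only as the cap
    (e.g., incomes above $150,000 become '150000+')."""
    return f"{cap}+" if value > cap else str(value)

def global_recode_age(age):
    """Global recoding: collapse exact ages into five-year bands."""
    lo = (age // 5) * 5
    return f"{lo}-{lo + 4}"

def locally_suppress(record, rare_keys):
    """Local suppression: blank individual key values flagged as rare,
    so that few records agree on dangerous key combinations."""
    return {k: (None if k in rare_keys else v) for k, v in record.items()}

rec = {"age": 16, "marital": "widowed", "income": 420000}
rec["age"] = global_recode_age(16)                   # -> '15-19'
rec["income"] = topcode(420000, cap=150000)          # -> '150000+'
rec = locally_suppress(rec, rare_keys={"marital"})   # widowhood suppressed
print(rec)
```

Each operation trades a little specificity, and hence utility, for a reduction in the number of records that are rare on key-variable combinations.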

Sampling

Sampling, as a disclosure-limiting mask, creates an appropriate statistical sample of the original data. Alternatively, if the original data are themselves a sample, the data may be considered self-masked. The mere fact that the data are a sample, however, may not lower disclosure risk enough to permit dissemination; in that case, subsampling may be required to obtain a data product with adequately low disclosure risk.

Synthetic, Virtual, or Model-Based Data

The methods described so far have involved perturbations or masking of the original data; these are called data-conditioned methods by Duncan and Fienberg (1999). Another approach, while less studied, should be conceptually familiar to statisticians. Consider the original data to be a realization according to some statistical model, and replace the original data with samples (the synthetic data) generated according to that model. Synthetic data sets consist of records of individual synthetic units rather than records the agency holds for actual units.

Rubin (1993) suggested constructing synthetic data through a multiple imputation method. The effect of imputing an entire microdata set on data utility is an open research question. Rubin (1993) asserts that the risk of identity disclosure can be eliminated through the dissemination of synthetic data and proposes the release of synthetic microdata sets for public use. His reasoning is that synthetic data carry no direct functional link between the original data and the disseminated data. So while there can be substantial identity disclosure risk with (inadequately) masked data, identity disclosure is, in a strict sense, impossible with the release of synthetic data. However, the release of synthetic data may still involve risk of attribute disclosure (Fienberg, Makov, and Steele, 1998).

Rubin (1993) cogently argues that the release of synthetic data has advantages over other data dissemination strategies, because

• masked data can require special software for its proper analysis for each combination of analysis, masking method, and database type (Fuller, 1993);
• release of aggregates, e.g., summary statistics or tables, is inadequate because of the difficulty of anticipating, at the data release stage, what analysts might want to do with the data; and
• mechanisms for the release of microdata under restricted access conditions, e.g., user-specific administrative controls, can never fully satisfy the demands for publicly available microdata.
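A minimal parametric caricature of this approach: fit a model to the confidential data and release draws from the fitted model instead of the data themselves. Here a single Gaussian stands in for the hierarchical and mixture models discussed below, and every value in the sketch is simulated rather than real:

```python
import random
import statistics

random.seed(7)
# Confidential microdata (simulated here), e.g., household incomes.
confidential = [random.gauss(52000, 9000) for _ in range(1000)]

# Fit a simple model to the confidential data...
mu = statistics.fmean(confidential)
sigma = statistics.stdev(confidential)

# ...then release synthetic draws from the fitted model. No released record
# corresponds to any actual respondent, so identity disclosure is ruled out,
# though attribute inference from the model itself remains conceivable.
synthetic = [random.gauss(mu, sigma) for _ in range(1000)]

print(round(statistics.fmean(synthetic)), round(statistics.stdev(synthetic)))
```

The synthetic file reproduces the fitted model's mean and spread, but any analysis the model does not capture, such as a relationship between income and another variable, is lost, which is why richer model families matter in practice.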

The methodology for the release of synthetic data is simple in concept but complex in implementation. Conceptually, the data-holding research group would use the original data to determine a model to generate the synthetic data. But the purpose of this model is not the usual prediction, control, or scientific understanding that argues for parsimony through Occam's Razor. Instead, its purpose is to generate synthetic data useful to a wide range of users. The agency must recognize uncertainty in both model form and the values of model parameters. This argues for the relevance of hierarchical and mixture models to generate the synthetic data.

CONCLUSIONS

IRBs must examine protocols for human subjects research carefully to ensure both that confidentiality protection is afforded and that appropriate data access is provided. Promising procedures are available based on restricted access, through means such as licensing and secure research sites, and on restricted data, through statistical disclosure limitation.

REFERENCES AND BIBLIOGRAPHY

Abowd, J.M., and S.D. Woodcock
2001 Disclosure limitation in longitudinal linked data. Pp. 215-277 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Adam, N.R., and J.C. Wortmann
1989 Security-control methods for statistical databases: A comparative study. ACM Computing Surveys 21:515-556.

Agrawal, R., and R. Srikant
2000 Privacy-preserving data mining. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 15-18, Dallas, Tex.

American Association for the Advancement of Science
1999 Ethical and Legal Aspects of Human Subjects Research on the Internet. Workshop Report. Available: http://www.aaas.org/spp/dspp/sfrl/projects/intres/report.pdf [4/12/02].

American Medical Informatics Association
2000 Letter to the U.S. Department of Health and Human Services. Available: http://www.amia.org/resource/policy/nprm_response.html [4/1/03].

Blakemore, M.
2001 The potential and perils of remote access. Pp. 315-340 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Chowdhury, S.D., G.T. Duncan, R. Krishnan, S.F. Roehrig, and S. Mukherjee
1999 Disclosure detection in multivariate categorical databases: Auditing confidentiality protection through two new matrix operators. Management Science 45:1710-1723.

Cox, L.H.
1980 Suppression methodology and statistical disclosure control. Journal of the American Statistical Association 75:377-385.
1987 A constructive procedure for unbiased controlled rounding. Journal of the American Statistical Association 82:38-45.

Dalenius, T.
1986 Finding a needle in a haystack. Journal of Official Statistics 2:329-336.
1988 Controlling Invasion of Privacy in Surveys. Department of Development and Research, Statistics Sweden.

Dalenius, T., and S.P. Reiss
1982 Data-swapping: A technique for disclosure control. Journal of Statistical Planning and Inference 6:73-85.

De Waal, A.G., and L.C.R.J. Willenborg
1996 A view on statistical disclosure for microdata. Survey Methodology 22:95-103.

Domingo-Ferrer, J., and V. Torra
2001 A quantitative comparison of disclosure control methods for microdata. Pp. 111-134 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Duncan, G.T.
2001 Confidentiality and statistical disclosure limitation. In International Encyclopedia of the Social and Behavioral Sciences, N.J. Smelser and P.B. Baltes, eds. Oxford, England: Elsevier Science.

Duncan, G.T., and S.E. Fienberg
1999 Obtaining information while preserving privacy: A Markov perturbation method for tabular data. Pp. 351-362 in Statistical Data Protection '98. Lisbon: Eurostat.

Duncan, G.T., and S. Kaufman
1996 Who should manage information and privacy conflicts?: Institutional design for third-party mechanisms. The International Journal of Conflict Management 7:21-44.

Duncan, G.T., and D. Lambert
1986 Disclosure-limited data dissemination (with discussion). Journal of the American Statistical Association 81:10-28.
1989 The risk of disclosure of microdata. Journal of Business and Economic Statistics 7:207-217.

Duncan, G.T., and S. Mukherjee
2000 Optimal disclosure limitation strategy in statistical databases: Deterring tracker attacks through additive noise. Journal of the American Statistical Association 95:720-729.

Duncan, G.T., and R. Pearson
1991 Enhancing access to microdata while protecting confidentiality: Prospects for the future (with discussion). Statistical Science 6:219-239.

Duncan, G.T., S.E. Fienberg, R. Krishnan, R. Padman, and S.F. Roehrig
2001 Disclosure limitation methods and information loss for tabular data. Pp. 135-166 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Duncan, G.T., S. Keller-McNulty, and S.L. Stokes
2002 Disclosure risk vs. data utility: The R-U confidentiality map. Technical Report, Statistical Sciences Group, Los Alamos National Laboratory, and Heinz School of Public Policy and Management, Carnegie Mellon University.

Dunne, T.
2001 Issues in the establishment and management of secure research sites. Pp. 297-314 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Elliot, M.
2001 Disclosure risk assessment. Pp. 135-166 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Elliot, M., and A. Dale
1999 Scenarios of attack: The data intruder's perspective on statistical disclosure risk. Netherlands Official Statistics 14:6-10.

Elliot, M., C. Skinner, and A. Dale
1998 Special uniques, random uniques and sticky populations: Some counterintuitive effects of geographical detail on disclosure risk. Research in Official Statistics 1:53-68.

Eurostat
1996 Manual on Disclosure Control Methods. Luxembourg: Office for Publications of the European Communities.

Federal Committee on Statistical Methodology
1994 Statistical Policy Working Paper 22: Report on Statistical Disclosure Limitation Methodology. Washington, DC: U.S. Office of Management and Budget.

Felsö, F., J. Theeuwes, and G. Wagner
2001 Disclosure limitation methods in use: Results of a survey. Pp. 17-42 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Fienberg, S.E.
1994 Conflicts between the needs for access to statistical information and demands for confidentiality. Journal of Official Statistics 10:115-132.

Fienberg, S.E., U.E. Makov, and R.J. Steele
1998 Disclosure limitation using perturbation and related methods for categorical data. Journal of Official Statistics 14:347-360.

Fuller, W.A.
1993 Masking procedures for microdata disclosure limitation. Journal of Official Statistics 9:383-406.

Greenberg, B.
1990 Disclosure avoidance research at the Census Bureau. Pp. 144-166 in Proceedings of the U.S. Census Bureau Annual Research Conference. Washington, DC.

Greenberg, B., and L. Zayatz
1992 Strategies for measuring risk in public use microdata files. Statistica Neerlandica 46:33-48.

Health Research Council of New Zealand
1998 Statement. Available: http://www.hrc.govt.nz/genethic.htm.

Jabine, T.B.
1993a Procedures for restricted data access. Journal of Official Statistics 9:537-589.
1993b Statistical disclosure limitation practices of United States statistical agencies. Journal of Official Statistics 9:427-454.

Kelley, J., B. Golden, and A. Assad
1990 Controlled rounding of tabular data. Operations Research 38:760-772.

Kim, J.J.
1986 A method for limiting disclosure in microdata based on random noise and transformation. Pp. 370-374 in Proceedings of the Survey Research Methods Section, American Statistical Association.

Kim, J.J., and W. Winkler
1995 Masking microdata files. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Kooiman, P., J. Nobel, and L. Willenborg
1999 Statistical data protection at Statistics Netherlands. Netherlands Official Statistics 14:21-25.

Lambert, D.
1993 Measures of disclosure risk and harm. Journal of Official Statistics 9:313-331.

Little, R.J.A.
1993 Statistical analysis of masked data. Journal of Official Statistics 9:407-426.

Marsh, C., C. Skinner, S. Arber, B. Penhale, S. Openshaw, J. Hobcraft, D. Lievesley, and N. Walford
1991 The case for samples of anonymized records from the 1991 census. Journal of the Royal Statistical Society, Series A 154:305-340.

Marsh, C., A. Dale, and C.J. Skinner
1994 Safe data versus safe settings: Access to microdata from the British Census. International Statistical Review 62:35-53.

Mokken, R.J., P. Kooiman, J. Pannekoek, and L.C.R.J. Willenborg
1992 Disclosure risks for microdata. Statistica Neerlandica 46:49-67.

Mood, A.M., F.A. Graybill, and D.C. Boes
1963 Introduction to the Theory of Statistics. New York: McGraw-Hill.

Moore, R.A.
1996 Controlled data-swapping techniques for masking public use microdata sets. Statistical Research Division Report Series, RR 96-04. Washington, DC: U.S. Bureau of the Census.

National Research Council
1993 Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics. Panel on Confidentiality and Data Access, G.T. Duncan, T.B. Jabine, and V.A. de Wolf, eds. Committee on National Statistics and Social Science Research Council. Washington, DC: National Academy Press.
2000 Improving Access to and Confidentiality of Research Data: Report of a Workshop. Committee on National Statistics, C. Mackie and N. Bradburn, eds. Washington, DC: National Academy Press.

Paass, G.
1988 Disclosure risk and disclosure avoidance for microdata. Journal of Business and Economic Statistics 6:487-500.

Rubin, D.B.
1993 Satisfying confidentiality constraints through the use of synthetic multiply-imputed microdata. Journal of Official Statistics 9:461-468.

Seastrom, M.M.
2001 Licensing. Pp. 279-296 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Skinner, C.J.
1990 Statistical Disclosure Issues for Census Microdata. Paper presented at the International Symposium on Statistical Disclosure Avoidance, Voorburg, The Netherlands, December 13.

Spruill, N.L.
1983 The confidentiality and analytic usefulness of masked business microdata. Pp. 602-607 in Proceedings of the Section on Survey Research Methods, American Statistical Association.

Sweeney, L.
2001 Information explosion. Pp. 43-74 in Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.

Willenborg, L., and T. de Waal
1996 Statistical Disclosure Control in Practice. Lecture Notes in Statistics #111. New York: Springer.

Winkler, W.E.
1998 Re-identification methods for evaluating the confidentiality of analytically valid microdata. Research in Official Statistics 1:87-104.

Zayatz, L.V., P. Massell, and P. Steel
1999 Disclosure limitation practices and research at the U.S. Census Bureau. Netherlands Official Statistics 14:26-29.


