Copyright © 2009. National Academy of Sciences. All rights reserved.
Confidentiality and Data Access
Issues for Institutional Review
Boards
George T. Duncan
Carnegie Mellon University
INTRODUCTION
ACCEPTED PRINCIPLES of information ethics (see National Re-
search Council, 1993) require that promises of confidentiality
be preserved and that the data collected in surveys and stud-
ies adequately serve their purposes. A compromise of the confiden-
tiality pledge could harm the research organization, the subject, or the
funding organization. A statistical disclosure occurs when the data dis-
semination allows data snoopers to gain information about subjects by
which the snooper can isolate individual respondents and correspond-
ing sensitive attribute values (Duncan and Lambert, 1989; Lambert,
1993). Policies and procedures are needed to reconcile the need for
confidentiality and the demand for data (Dalenius, 1988).
Under a body of regulation known as the Federal Policy for the Pro-
tection of Human Subjects, the National Institutes of Health Office of
Human Subjects Research (OHSR) mandates that institutional review
boards (IRBs) determine that research protocols assure the privacy
and confidentiality of subjects. Specifically, it requires IRBs to ascer-
tain whether (a) personally identifiable research data will be protected
to the extent possible from access or use and (b) any special privacy and
confidentiality issues are properly addressed, e.g., use of genetic infor-
mation. This standard directs an IRB's attention, but without elabora-
tion and clarification it does not provide IRBs with operational crite-
ria for evaluation of research protocols. Nor does it provide guidance
to researchers in how to establish research protocols that can merit
IRB approval. The Office for Human Research Protection (OHRP) is
responsible for interpreting and overseeing implementation of the
regulations regarding the Protection of Human Subjects (45 CFR 46)
promulgated by the Department of Health and Human Services (DHHS).
OHRP is responsible for providing guidance to researchers and IRBs
on ethical issues in biomedical and behavioral research.
PROTECTING PARTICIPANTS AND FACILITATING SOCIAL AND BEHAVIORAL SCIENCES RESEARCH
As IRBs respond to their directive to ethically oversee the burgeon-
ing research on human subjects, they require systematic ways of ex-
amining protocols for compliance with best practice for confidentiality
and data access. Clearly, the task of an IRB is lightened if researchers
are fully aware of such practices and how they can be implemented.
This paper identifies key confidentiality and data access issues that
IRB members must consider when reviewing protocols. It provides
both a conceptual framework for such reviews and a discussion of a
variety of administrative procedures and technical methods that can be
used by researchers to simultaneously assure confidentiality protection
and appropriate access to data.
CRITICAL ISSUES
Reason for Concern
Most generally, an ethical perspective requires researchers to maxi-
mize the benefits of their research while minimizing the risk and harm
to their subjects. Beneficence is often interpreted to mean, first, that
"one ought not to inflict harm" and, second, that "one ought to do or
promote good." In the context of assuring data quality from research
studies, this means first assuring an adequate degree of confidentiality
protection and then maximizing the value of the data generated by the
research. Confidentiality is afforded for reasons of ethical treatment of
research subjects, pragmatic grounds of assuring subject cooperation,
and, in some cases, legal requirements.
Aspects of Concern
Data have a serious risk of disclosure when (a) disclosure would
have negative consequences, (b) a data snooper is motivated, both
psychologically and pragmatically, to seek disclosure (Elliot, 2001),
and (c) the data are vulnerable to disclosure attack. Based on their
confidentiality pledges, researchers must protect certain sensitive
objects from a data snooper. Sensitive objects can be any of a variety
of variables associated with a subject entity (person, household,
enterprise, etc.).
Examples include the values of numerical variables, such as household
income, an X-ray of a patient's lung, and a subject's report of their sex-
ual history. Data with particular characteristics pose substantial risk
of disclosure and suggest vulnerability:
· geographical detail, such as census block (Elliot, Skinner, and Dale,
1998; Greenberg and Zayatz, 1992);
· longitudinal or panel structure, such as criminal histories (Abowd and
Woodcock, 2001);
· outliers likely to be unique in the population, such as a 16-year-old
widow (Dalenius, 1986; Greenberg, 1990);
· attributes with a high level of detail, such as income to the nearest
dollar (Elliot, 2001);
· many attribute variables, such as a medical record (Sweeney, 2001);
· population data, as in a census, rather than a survey with a small
sampling fraction (Elliot, 2001);
· databases that are publicly available, identified, and share
individual respondents and attribute variables (key variables; Elliot
and Dale, 1999) with the subject data, such as marketing and credit
databases.
Data with geographical detail, such as census tract data, may be
easily linked to known characteristics of respondents. Concern for this
suggests placing minimum population levels for geographical identi-
fiers. For particular geographical regions, this can mean specifying
the minimum size of a region that can be reported. Longitudinal data,
which tracks entities over time, also poses substantial disclosure risk.
Many individuals had coronary bypass surgery in the Chicago area in
1998 and many had bypass surgery in Phoenix in 1999, but few did
both. Outliers, say on variables like weight, height, or cholesterol
level, can lead to identifiable respondents. Data with many attribute
variables allow easier linkage with known attributes of identified
entities, and entities that are unique in the sample are more likely to be
unique in the population. Population data pose more disclosure risk
than data from a survey having a small sampling fraction. Finally, spe-
cial concern must be shown when other databases are available to the
data snooper and these databases are both identified and share with
the subject data both individual respondents and certain attribute vari-
ables. Record linkage may then be possible between the subject data
and the external database. The shared attribute variables provide the
key.
Disclosure
The legitimate objects of inquiry for research involving human sub-
jects are statistical aggregates over the records of individuals, for ex-
ample, the median number of serious infections sustained by patients
receiving a drug for treatment of arthritis. The investigators seek to
provide the research community with data that will allow accurate in-
ference about such population characteristics. At the same time, to
respect confidentiality, the investigators must thwart the data snooper
who might seek to use the disseminated data to draw accurate infer-
ences about, say, the infection history of a particular patient. Such a
capability by a data snooper would constitute a statistical disclosure.
There are two major types of disclosure: identity disclosure and
attribute disclosure. Identity disclosure occurs with the association of
a respondent's identity and a disseminated data record (Paass, 1988;
Spruill, 1983; Strudler et al., 1986). Attribute disclosure occurs with
the association of either an attribute value in the disseminated data
or an estimated attribute value based on the disseminated data with
the respondent (Duncan and Lambert, 1989; Lambert, 1993). In the
case of identity disclosure, the association is assumed exact. In the
case of attribute disclosure, the association can be approximate. Many
investigators emphasize limiting the risk of identity disclosure, perhaps
because of its substantial equivalence to the inadvertent release of an
identified record. An attribute disclosure, even though it invades the
privacy of a respondent, may not be so easily traceable to the actions
of an agency. An IRB in its oversight capacity should be concerned that
investigators limit the risk of both attribute and identity disclosures.
Risk of Disclosure
Measures of disclosure risk are required (Elliot, 2001). In the
context of identity disclosure, disclosure risk can arise because a data
snooper may be able to use the disseminated data product to reiden-
tify some deidentified records. Spruill (1983) proposed a measure of
disclosure risk for microdata: (1) for each "test" record in the masked
file, compute the Euclidean distance between the test record and each
record in the source file; (2) determine the percentage of test records
that are closer to their parent source record than to any other source
record. She defines the risk of disclosure to be the percentage of test
records that match the correct parent record multiplied by the sam-
pling fraction (fraction of source records released).
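The two steps of this measure can be sketched in Python. This is a
minimal illustration and not Spruill's own implementation: the function
name spruill_risk, the parent_index argument (the position in the source
file of the record each masked record was derived from), and the toy
age and income values are assumptions introduced here. Variables are
left unscaled for brevity; in practice they would probably be
standardized before distances are computed.

```python
import math


def spruill_risk(source, masked, parent_index, sampling_fraction):
    """Percentage of masked test records that are strictly closer to their
    parent source record than to any other source record, multiplied by
    the sampling fraction."""
    if not masked:
        return 0.0
    hits = 0
    for test_record, parent in zip(masked, parent_index):
        # Step 1: Euclidean distance from the test record to every source record.
        distances = [math.dist(test_record, s) for s in source]
        nearest = min(distances)
        # Step 2: a match requires the parent to be the unique nearest record.
        if distances[parent] == nearest and distances.count(nearest) == 1:
            hits += 1
    percent_matched = 100.0 * hits / len(masked)
    return percent_matched * sampling_fraction


# Toy data (made up): (age, income) for four source records.
source = [(30, 50000), (45, 82000), (52, 61000), (38, 75000)]
# Two released records, masked versions of source records 0 and 2.
masked = [(31, 50500), (50, 74000)]
risk = spruill_risk(source, masked, parent_index=[0, 2], sampling_fraction=2 / 4)
```

In this toy case the first masked record is still closest to its
parent, while the second has been perturbed far enough that a different
source record is closer, so only one of the two test records matches.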
More generally, and consistent with Duncan and Lambert (1986,
1989), an agency will have succeeded in protecting the confidential-
ity of a released data product if the data snooper remains sufficiently
uncertain about a protected target value after data release. From this
perspective, a measure of disclosure risk is built on measures of uncer-
tainty. Furthermore, an agency can model the decision making of the
data snooper as a basis for using disclosure limitation to deter infer-
ences about a target. Data snoopers are deterred from publicly making
inferences about a target when their uncertainty is sufficiently high.
Mathematically, uncertainty functions provide a workable framework
for this analysis. Examples include Shannon entropy, which has found
use in categorizing continuous microdata and coarsening of categor-
ical data (Domingo-Ferrer and Torra, 2001; Willenborg and de Waal,
1996:138).
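As a small illustration of an uncertainty function, the sketch below
computes the Shannon entropy of a data snooper's beliefs about a
target value before and after a release. The probabilities and the
four-bracket setup are hypothetical values chosen for the example;
the text does not prescribe them, and an agency would have to choose
its own threshold for "sufficiently uncertain."

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical snooper beliefs about which of four income brackets
# a target respondent falls in.
before_release = [0.25, 0.25, 0.25, 0.25]
after_release = [0.85, 0.05, 0.05, 0.05]

uncertainty_before = shannon_entropy(before_release)
uncertainty_after = shannon_entropy(after_release)
```

A release that leaves the snooper's entropy close to its prior value
has disclosed little about the target; a sharp drop signals elevated
risk of attribute disclosure.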
Generally, a data snooper has a priori knowledge about a target,
often in the form of a database with identified records (Adam and
Wortmann, 1989). Certain variables may be in common with the subject
database. These variables are called key or identifying (De Waal and
Willenborg, 1996; Elliot, 2001). When a single record matches on the
key variables, the data snooper has a candidate record for identifica-
tion. That candidacy is promoted to an actual identification if the data
snooper is convinced that the individual is in the target database. This
would be the case either if the data snooper has auxiliary information
to that effect or if the data snooper is convinced that the individual is
unique in the population. The data snooper may find from certain key
variables that a sample record is unique. The question then is whether
the individual is also unique on these key variables in the population.
Bethlehem, Keller, and Pannekoek (1990) have examined detection of
records agreeing on simple combinations of keys based on discrete
variables in the files. Record linkage methodologies have been exam-
ined by Domingo-Ferrer and Torra (2001), Fuller (1993), and Winkler
(1998).
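A first screening step implied by this discussion is to find the
records that are unique in the sample on a chosen set of key
variables. The sketch below is one simple way to do so; the function
name sample_uniques, the field names, and the records are all made up
for illustration. It says nothing about population uniqueness, which
requires outside information or a statistical model.

```python
from collections import Counter


def sample_uniques(records, key_variables):
    """Return the records that are the only ones in the file with their
    particular combination of key-variable values."""
    keys = [tuple(record[k] for k in key_variables) for record in records]
    counts = Counter(keys)
    return [record for record, key in zip(records, keys) if counts[key] == 1]


# Made-up deidentified sample; field names are illustrative only.
sample = [
    {"age_group": "30-39", "sex": "F", "region": "North", "diagnosis": "A"},
    {"age_group": "30-39", "sex": "F", "region": "North", "diagnosis": "B"},
    {"age_group": "16-19", "sex": "F", "region": "South", "diagnosis": "C"},
    {"age_group": "40-49", "sex": "M", "region": "North", "diagnosis": "A"},
]
at_risk = sample_uniques(sample, ["age_group", "sex", "region"])
```

The two records that share the same age group, sex, and region shield
each other; the remaining two are sample uniques and would merit
closer review.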
Deidentification
Deidentification of data is the process of removing apparent iden-
tifiers (name, e-mail address, social security number, phone number,
address, etc.) from a data record. Deidentification does not necessar-
ily make a record anonymous, as it may well be possible to reidentify
the record using external information. In a letter to DHHS, the Amer-
ican Medical Informatics Association (2000) noted:
However, in discussions with a broad range of healthcare
stakeholders, we have found the concept of "deidentified
information" can be misleading, for it implies that if the
19 data elements are removed, the problem of reidentifica-
tion has been solved. The information security literature
suggests otherwise. Additionally, with the continuing and
dramatic increase in computer power that is ubiquitously
available, personal health data items that currently would
be considered 'anonymous' may lend themselves to increas-
ingly easy reidentification in the future. For these reasons,
we believe the regulations would be better served by adopt-
ing the conventions of personal health data as being of "High
Reidentification Potential" (e.g., the 19 data elements listed
in the current draft), and "Low Reidentification Potential."
Over time, some elements currently considered of low po-
tential may migrate to the high potential classification. More
importantly, this terminology conveys the reality that virtu-
ally all personal health data has some confidentiality risk
associated with it, and helps to overcome the mistaken im-
pression that the confidentiality problem is solved by remov-
ing the 19 specified elements.
Most health care information, such as hospital discharge data,
cannot be anonymized through deidentification. The reason that
removing identifiers does not assure sufficient anonymity of respondents is
that, today, a data snooper can get inexpensive access to databases
with names attached to records. Marketing and credit information
databases and voter registration lists are exemplars. Having this exter-
nal information, the data snooper can employ sophisticated, but readily
available, record linkage techniques. The resultant attempts to link an
identified record from the public database to a deidentified record are
often successful (Winkler, 1998). With such a linkage, the record would
be reidentified.
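The linkage attack described above can be illustrated with a toy
exact-match sketch. It is a deliberate simplification: practical record
linkage is usually probabilistic and tolerates errors in the key
variables. The function name link_records, the field names, and every
record below are hypothetical.

```python
from collections import defaultdict


def link_records(deidentified, external, key_variables, id_field="name"):
    """Link each deidentified record to an identified external record when
    exactly one external record agrees on all key variables."""
    index = defaultdict(list)
    for person in external:
        index[tuple(person[k] for k in key_variables)].append(person)

    links = []
    for record in deidentified:
        candidates = index.get(tuple(record[k] for k in key_variables), [])
        if len(candidates) == 1:  # a single match is a candidate identification
            links.append((candidates[0][id_field], record))
    return links


# Made-up deidentified discharge records (names already removed).
discharges = [
    {"birth_year": 1961, "sex": "F", "zip": "15213", "diagnosis": "X"},
    {"birth_year": 1975, "sex": "M", "zip": "15217", "diagnosis": "Y"},
]
# Made-up identified external list, such as a voter registration file.
voters = [
    {"name": "Person A", "birth_year": 1961, "sex": "F", "zip": "15213"},
    {"name": "Person B", "birth_year": 1975, "sex": "M", "zip": "15217"},
    {"name": "Person C", "birth_year": 1975, "sex": "M", "zip": "15217"},
]
links = link_records(discharges, voters, ["birth_year", "sex", "zip"])
```

The first discharge record matches a single voter and is a candidate
for reidentification; the second matches two voters, so the snooper
remains uncertain about it.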
New Areas of Concern
Technological developments continue to raise new issues that must
be addressed in the ethical direction of research involving human sub-
jects. Of burgeoning importance in recent years are developments in
information technology, especially the Internet, and in biotechnology,
especially human genetics research.
The Internet
A good discussion of some of the issues involved in providing
remote access to data through the web is provided by Blakemore (2001).
These include security assurances against hacker attack and fears of
record linkage. A prominent example of web access to data is American
FactFinder, maintained by the U.S. Census Bureau (http://factfinder.
census.gov). American FactFinder provides access to population, hous-
ing, economic, and geographic data. The site gives a good description
of the elaborate procedures followed to ensure confidentiality through
statistical disclosure limitation (see also American Association for the
Advancement of Science, 1999).
Genetic Research
The American Society of Human Genetics published the following
statement on this issue:
Studies that maintain identified or identifiable specimens
must maintain subjects' confidentiality. Information from
these samples should not be provided to anyone other than
the subjects and persons designated by the subjects in writ-
ing. To ensure maximum privacy, it is strongly recommended
that investigators apply to the Department of Health and
Human Services for a Certificate of Confidentiality.... In-
vestigators should indicate to the subject that they cannot
guarantee absolute confidentiality.
A statement by the Health Research Council of New Zealand (1998) is
more specific:
Researchers must ensure the confidentiality and privacy of
stored genetic information, genetic material or results of
the research which relate to identified or identifiable par-
ticipants. In particular, the research protocol must specify
whether genetic information or genetic material and any in-
formation derived from studying the genetic material, will
be stored in identified, deidentified or anonymous form. Re-
searchers should consider carefully the consequences of stor-
ing information and material in anonymous form for the
proposed research, future research and communication of
research results to participants. Researchers should dis-
close where storage is to be and to whom their tissues will
be accessible. Tissue or DNA should only be sent abroad if
this is acceptable to the consenting individual.
TENSION BETWEEN DISCLOSURE RISK AND DATA UTILITY
Data Quality Audit
The process of assuring confidentiality through statistical disclosure
limitation while maintaining data utility has the following components:
· a data quality audit that, beginning with the original, collected
data, assesses disclosure risk and data utility;
· a determination of adequacy of confidentiality protection;
· if confidentiality protection is inadequate, the implementation of
a restricted access or restricted data procedure; and
· a return to the data quality audit.
A quality audit of collected data evaluates the utility of the data and
assesses disclosure risk. Typically, with good research design and im-
plementation, the data utility is high. But, also, the risk of disclosure
through the release of the original, collected data is too high, even
when the data collected have been deidentified, i.e., apparent identi-
fiers (name, e-mail address, phone number, etc.) have been removed.
Reidentification techniques have become too sophisticated to assure
confidentiality protection (Winkler, 1998). A confidentiality audit will
include identification of (1) sensitive objects and (2) characteristics of
the data that make it susceptible to attack.
R-U Confidentiality Map
A measure of statistical disclosure risk, R, is a numerical
assessment of the risk of unintended disclosures following dissemination of
the data. A measure of data utility, U, is a numerical assessment of
the usefulness of the released data for legitimate purposes. Illustrative
results using particular specifications for R and U have been
developed. The R-U confidentiality map was initially presented by Duncan
and Fienberg (1999) and further explored for categorical data by
Duncan et al. (2001). As it is more fully developed by Duncan,
Keller-McNulty, and Stokes (2002), the R-U confidentiality map provides a
quantified link between R and U directly through the parameters of a
disclosure limitation procedure. With an explicit representation of how
the parameters of the disclosure limitation procedure affect R and U,
the tradeoff between disclosure risk and data utility is apparent. With
the R-U confidentiality map, data-holding groups have a workable new
tool to frame decision making about data dissemination under confi-
dentiality constraints.
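To make the idea concrete, the sketch below traces an R-U map for one
disclosure limitation procedure, additive noise, as its single
parameter (the noise variance) changes. The specifications of R and U
used here are assumptions chosen for illustration and are not claimed
to be those of the cited papers: R is taken as the reciprocal of the
snooper's mean squared error when the released value is used as an
estimate of a target's true value, and U as the reciprocal of the
variance of the sample mean computed from the released values.

```python
def ru_point(noise_variance, data_variance, n_records):
    """Return (R, U) for additive zero-mean noise with the given variance,
    under the illustrative definitions described above."""
    # Snooper's squared error on one released value equals the noise variance.
    risk = 1.0 / noise_variance
    # The mean of n noisy values has variance (data + noise variance) / n.
    utility = n_records / (data_variance + noise_variance)
    return risk, utility


def ru_map(noise_variances, data_variance, n_records):
    """Trace the R-U map as (noise_variance, R, U) triples."""
    return [(v, *ru_point(v, data_variance, n_records)) for v in noise_variances]


# Hypothetical setting: 100 records whose attribute has variance 16.
curve = ru_map([1.0, 4.0, 16.0, 64.0], data_variance=16.0, n_records=100)
```

Reading along the curve, more noise lowers both R and U. A data
holder would look for the parameter value at which R falls below its
acceptable threshold while U remains as high as possible.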
Restricted Access Procedures
Restricted access procedures are administrative controls on who
can access data and under what conditions. These controls may in-
clude use of sworn agent status, licensing, and secure research sites.
Each of these restricted access procedures requires examination of its
structure and careful monitoring to ensure that it provides both con-
fidentiality protection and appropriate access to data. Licensing sys-
tems, for example, require periodic inspections and a tracking database
to monitor restricted-use data files (Seastrom, 2001). Even in secure
research sites, only restricted data may be made available, say with
deidentified data files. Secure sites require a trained staff who can impart
a "culture of confidentiality" (Dunne, 2001).
Restricted Data Procedures: Disclosure Limitation Methods
Restricted data procedures are methods for disclosure limitation
that require a disseminated data product to be some transformation
of the original data. A variety of disclosure limitation methods have
been proposed by researchers on confidentiality protection. Gener-
ally, these methods are tailored either to tabular data or to microdata.
These procedures are widely applied by government statistical agen-
cies since they face confidentiality issues directly in producing data
products for their users. The most commonly used methods for tab-
ular data are cell suppression based on minimum cell count or dom-
inance rules; recoding variables; rounding; and geographic or mini-
mum population thresholds. The most commonly used methods for
microdata are microaggregation, deletion of data items, deletion of
sensitive records, recoding data into broad categories, top and bottom
coding, sampling, and geographic or minimum population thresholds
(see Felso, Theeuwes, and Wagner, 2001).
Direct transformations of data for confidentiality purposes are called
disclosure-limiting masks (Jabine, 1993a, 1993b). With masked data
sets, there is a specific functional relationship, possibly as a function of
multiple records and possibly as a stochastic function, between masked
values and the original data. Because of this relationship, the possibil-
ities of both identity and attribute disclosures continue to exist, even
though the risk of disclosure may be substantially reduced. The idea
is to provide a response that, while useful for statistical analysis pur-
poses, has sufficiently low disclosure risk. As a general classification,
disclosure-limiting masks can be categorized as suppressions,
recodings, or samplings.
Whether for microdata or tabular data, many of these transforma-
tions can be represented as matrix masks (Duncan and Pearson, 1991),
M = AXB + C, where X is a data matrix, say n × p. In general, the
defining matrices A, B, and C can depend on the values of X and be
stochastic. The matrix A (since it operates on the rows of X) is a record-
transforming mask, the matrix B (since it operates on the columns of
X) is a variable-transforming mask, and the matrix C is a displacing
mask (noise addition).
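A small numerical sketch of M = AXB + C may help fix ideas. The
particular matrices are assumptions chosen for illustration: A keeps
two of three records, B drops the middle variable, and C adds a fixed
displacement (a stochastic mask would draw C at random). The helper
names matmul and matrix_mask are introduced here and use plain Python
lists so that the example is self-contained.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def matrix_mask(A, X, B, C):
    """Apply the matrix mask M = AXB + C."""
    AXB = matmul(matmul(A, X), B)
    return [[v + c for v, c in zip(row_v, row_c)] for row_v, row_c in zip(AXB, C)]


# Made-up data matrix X (3 records x 3 variables: age, income, visits).
X = [
    [34, 52000, 2],
    [51, 87000, 5],
    [29, 43000, 1],
]
A = [[1, 0, 0],   # record-transforming mask: release records 0 and 2 only
     [0, 0, 1]]
B = [[1, 0],      # variable-transforming mask: drop the income column
     [0, 0],
     [0, 1]]
C = [[1, 0],      # displacing mask: fixed perturbation of the age column
     [-1, 0]]

M = matrix_mask(A, X, B, C)
```

Here M has two rows and two columns: the released file contains a
perturbed age and the original visit count for the first and third
records.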
Methods for Tabular Data
A variety of disclosure limitation methods for tabular data are
identified or developed and then analyzed by Duncan et al. (2001). The
discussion below covers some of the more important of these methods.
Suppression
A suppression is a refusal to provide a data instance. For microdata,
this can involve the deletion of all values of some particularly sensitive
variable. In principle, certain record values could also be suppressed,
but this is usually handled through recoding. For tabular data, the
values of table cells that pose confidentiality problems are suppressed.
These are the primary suppressions. Often, a cell is considered un-
safe for publication according to the (n, p) dominance rule, i.e., if a
few (n), say three, contributing entities represent a percentage p, say
70 percent, or more of the total. Additionally, enough other cells are
suppressed so that the values of the primary suppressions cannot be
inferred from released table margins. These additional cells are called
secondary suppressions. Even tables of realistic dimensionality with
only a few primary suppressions present a multitude of possible config-
urations for the secondary cell suppressions. This raises computational
difficulties that can be formulated as combinatorial optimization prob-
lems. Typical techniques that are used include mathematical program-
ming (especially integer programming) and graph theory (Chowdhury
et al., 1999).
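The (n, p) dominance rule for primary suppressions can be sketched as
a short check on the contributions to a single cell. The function name
is_unsafe_cell and the example contributions are hypothetical, and
contributions are assumed to be nonnegative. Choosing the secondary
suppressions is the hard combinatorial part and is not attempted here.

```python
def is_unsafe_cell(contributions, n=3, p=70.0):
    """Return True if the n largest contributors account for at least
    p percent of the cell total (the (n, p) dominance rule)."""
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return 100.0 * top_n / total >= p


def publish_cell(contributions, n=3, p=70.0):
    """Return the cell total, or None to mark a primary suppression."""
    if is_unsafe_cell(contributions, n, p):
        return None
    return sum(contributions)


# Made-up cells: revenue contributions of the enterprises in each cell.
dominated = [500, 300, 100, 50, 50]   # top three hold 90 percent
balanced = [100] * 10                 # top three hold 30 percent
```

With the defaults of n = 3 and p = 70, the first cell would be
suppressed and the second published.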
Recoding
A disclosure-limiting mask for recoding creates a set of data for
which some or all of the attribute values have been altered. Recoding
can be applied to microdata or to tabular data. Some common methods
of recoding for tabular data are global recoding and rounding. A new
method of recoding is Markov perturbation.
· Under global recoding, categories are combined. This represents
a coarsening of the data through combining rows or combining
columns of the table.
· Under rounding, every cell entry is rounded to some base b. The
controlled rounding problem is to find some perturbation of the
original entries that will satisfy (marginal, typically) constraints
and that is "close" to the original entries (Cox, 1987).
Multidimensional tables present special difficulties. Methods for dealing
with them are given by Kelley, Golden, and Assad (1990).
· Markov perturbation (Duncan and Fienberg, 1999) makes use of
stochastic perturbation through entity moves according to a
Markov chain. Because of the cross-classified constraints
imposed by the fixing of marginal totals, moves must be coupled.
This coupling is consistent with a Gröbner basis structure
(Fienberg, Makov, and Steele, 1998). In a graphical representation, it is
consistent with data flows corresponding to an alternating cycle,
as discussed by Cox (1987).
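The need for coupled moves can be seen in a two-way table: adding to
one cell changes a row total and a column total unless compensating
changes are made in three other cells. The sketch below shows only
this elementary margin-preserving move. It is not the Markov
perturbation method itself, which also specifies the probabilities
with which moves are made; the function names and the table are
assumptions for illustration.

```python
def margins(table):
    """Return (row totals, column totals) of a two-way table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols


def coupled_move(table, r1, r2, c1, c2, delta=1):
    """Move delta entities into cells (r1, c1) and (r2, c2) and out of
    cells (r1, c2) and (r2, c1), leaving all marginal totals fixed.
    Returns None if the move would create a negative count."""
    new = [row[:] for row in table]
    new[r1][c1] += delta
    new[r2][c2] += delta
    new[r1][c2] -= delta
    new[r2][c1] -= delta
    if min(new[r1][c1], new[r2][c2], new[r1][c2], new[r2][c1]) < 0:
        return None
    return new


# Made-up 2 x 2 table of counts.
table = [[5, 3],
         [2, 7]]
moved = coupled_move(table, 0, 1, 0, 1)
```

The perturbed table differs from the original in all four cells, yet
its row and column totals are unchanged, so published margins remain
consistent with the released interior.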
Disclosure-Limitation Methods for Microdata
Examples of recoding as applied to microdata include data swap-
ping; adding noise; and global recoding and local suppression. In data
swapping (Dalenius and Reiss, 1982; Reiss, 1980; Spruill, 1983), some
fields of a record are swapped with the corresponding fields in an-
other record. Concerns have been raised that while data swapping
lowers disclosure risk, it may excessively distort the statistical
structure of the original data (Adam and Wortmann, 1989). A
combination of data swapping with additive noise has been suggested by Fuller
(1993). Masking through the introduction of additive or multiplicative
noise has been investigated (e.g., Fuller, 1993). A disclosure limitation
method for microdata that is used in the μ-Argus software is a
combination of global recoding and local suppression. Global recoding
combines several categories of a variable to form less specific categories.
Topcoding is a specific example of global recoding. Local suppression
suppresses certain values of individual variables (Willenborg and de
Waal, 1996). The aim is to reduce the set of records where only a few
agree on particular combinations of key values. Both methods make
the data less specific and so result in some information loss to legiti-
mate researchers.
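Global recoding and topcoding are simple to express in code. The
sketch below is an illustration only and is not drawn from the
μ-Argus software; the function names, the ten-year age bands, the
income ceiling of 150,000, and the example record are all assumptions.

```python
def topcode(value, ceiling):
    """Topcoding: report any value above the ceiling as the ceiling."""
    return min(value, ceiling)


def recode_age(age, width=10):
    """Global recoding: replace an exact age by a band of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def mask_record(record, income_ceiling=150000, age_width=10):
    """Return a copy of the record with age recoded and income topcoded."""
    masked = dict(record)
    masked["income"] = topcode(record["income"], income_ceiling)
    masked["age"] = recode_age(record["age"], age_width)
    return masked


# Made-up microdata record.
original = {"age": 37, "income": 420000, "county": "Allegheny"}
released = mask_record(original)
```

The released record no longer reveals the exact age or the unusually
high income, at the cost of the information loss noted above.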
Sampling
Sampling, as a disclosure-limiting mask, creates an appropriate sta-
tistical sample of the original data. Alternatively, if the original data is
itself a sample, the data may be considered self-masked. Just the fact
that the data are a sample may not result in disclosure risk sufficiently
low to permit data dissemination. In that case, subsampling may be
required to obtain a data product with adequately low disclosure risk.
Synthetic, Virtual, or Model-Based Data
The methods described so far have involved perturbations or mask-
ing of the original data. These are called data-conditioned methods by
Duncan and Fienberg (1999). Another approach, while less studied,
should be conceptually familiar to statisticians. Consider the original
data to be a realization according to some statistical model. Replace
the original data with samples (the synthetic data) according to the
model. Synthetic data sets consist of records of individual synthetic
units rather than records the agency holds for actual units.
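In its simplest form the idea can be sketched as follows: fit a model
to the original values and release draws from the fitted model in
place of the data. The single normal model used here is an assumption
made purely for brevity, as are the function name synthesize and the
example values; it ignores uncertainty in the model form and in the
parameter estimates, which a serious application would need to
address.

```python
import random
import statistics


def synthesize(values, n_synthetic, seed=0):
    """Fit a normal model to the original values and return n_synthetic
    draws from it. Requires at least two original values."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n_synthetic)]


# Made-up original measurements (e.g., cholesterol levels).
original = [182.0, 205.0, 199.0, 240.0, 176.0, 221.0]
synthetic = synthesize(original, n_synthetic=5, seed=1)
```

No synthetic record corresponds to an actual unit, yet aggregate
features such as the mean and spread are approximately preserved to
the extent that the assumed model is adequate.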
Rubin (1993) suggested synthetic data construction through a mul-
tiple imputation method. The effect of imputation of an entire micro-
data set on data utility is an open research question. Rubin (1993)
asserts that the risk of identity disclosure can be eliminated through
the dissemination of synthetic data and proposes the release of syn-
thetic microdata sets for public use. His reasoning is that the synthetic
data carries no direct functional link between the original data and
the disseminated data. So while there can be substantial identity dis-
closure risk with (inadequately) masked data, identity disclosure is, in
a strict sense, impossible with the release of synthetic data. However,
the release of synthetic data may still involve risk of attribute disclosure
(Fienberg, Makov, and Steele, 1998).
Rubin (1993) cogently argues that the release of synthetic data has
advantages over other data dissemination strategies, because
· masked data can require special software for its proper analysis
for each combination of analysis, masking method, and database
type (Fuller, 1993);
· release of aggregates, e.g., summary statistics or tables, is
inadequate because of the difficulty in contemplating at the data release
stage what analysts might like to do with the data; and
· mechanisms for the release of microdata under restricted access
conditions, e.g., user-specific administrative controls, can never
fully satisfy the demands for publicly available microdata.
The methodology for the release of synthetic data is simple in con-
cept, but complex in implementation. Conceptually, the data-holding
research group would use the original data to determine a model to
generate the synthetic data. But the purpose of this model is not the
usual prediction, control, or scientific understanding that argues for
parsimony through Occam's Razor. Instead, its purpose is to gener-
ate synthetic data useful to a wide range of users. The agency must
recognize uncertainty in both model form and the values of model pa-
rameters. This argues for the relevance of hierarchical and mixture
models to generate the synthetic data.
CONCLUSIONS
IRBs must examine protocols for human subjects research carefully
to ensure both that confidentiality protection is afforded and that
appropriate data access is provided. Promising procedures are available
based on restricted access, through means such as licensing and se-
cure research sites, and restricted data, through statistical disclosure
limitation.
REFERENCES AND BIBLIOGRAPHY
Abowd, J.M., and S.D. Woodcock
2001 Disclosure limitation in longitudinal linked data. Pp. 215-277 in Confiden-
tiality, Disclosure, and Data Access: Theory and Practical Applications for Sta-
tistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds.
Amsterdam: North-Holland/Elsevier.
Adam, N.R., and J.C. Wortmann
1989 Security-control methods for statistical databases: A comparative study. ACM
Computing Surveys 21:515-556.
Agarwal, R., and R. Srikant
2000 Privacy-preserving data mining. Proceedings of the 2000 ACM SIGMOD on
Management of Data, May 15-18, Dallas, Tex.
American Association for the Advancement of Science
1999 Ethical and Legal Aspects of Human Subjects Research on the Internet.
Workshop Report. Available:
http://www.aaas.org/spp/dspp/sfrl/projects/intres/report.pdf [4/12/02].
American Medical Informatics Association
2000 Letter to the U.S. Department of Health and Human Services. Available: http:
//www.amia.org/resource/policy/nprm response.html [4/1/03].
Blakemore, M.
2001 The potential and perils of remote access. Pp. 315-340 in Confidentiality,
Disclosure, and Data Access: Theory and Practical Applications for Statistical
Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amster-
dam: North-Holland/Elsevier.
Chowdhury, S.D., G.T. Duncan, R. Krishnan, S.F. Roehrig, and S. Mukherjee
1999 Disclosure detection in multivariate categorical databases: Auditing confiden-
tiality protection through two new matrix operators. Management Science
45:1710-1723.
Cox, L.H.
1980 Suppression methodology and statistical disclosure control. Journal of the
American Statistical Association 75:377-385.
1987 A constructive procedure for unbiased controlled rounding. Journal of the
American Statistical Association 82:38-45.
Dalenius, T.
1986 Finding a needle in a haystack. Journal of Official Statistics 2:329-336.
1988 Controlling Invasion of Privacy in Surveys. Department of Development and
Research. Statistics Sweden.
Dalenius, T., and S.P. Reiss
1982 Data-swapping: A technique for disclosure control. Journal of Statistical Planning
and Inference 6:73-85.
De Waal, A.G., and L.C.R.G. Willenborg
1996 A view on statistical disclosure for microdata. Survey Methodology 22:95-103.
Domingo-Ferrer, J., and V. Torra
2001 A quantitative comparison of disclosure control methods for microdata. Pp.
111-134 in Confidentiality, Disclosure, and Data Access: Theory and Practical
Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and
L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.
Duncan, G.T.
2001 Confidentiality and statistical disclosure limitation. In N.J. Smelser and P.B.
Baltes, eds., International Encyclopedia of the Social and Behavioral Sciences.
Oxford, England: Elsevier Science.
Duncan, G.T., and S.E. Fienberg
1999 Obtaining information while preserving privacy: A Markov perturbation
method for tabular data. Eurostat. Statistical Data Protection '98 Lisbon 351-
362.
Duncan, G.T., and S. Kaufman
1996 Who should manage information and privacy conflicts?: Institutional design
for third-party mechanisms. The International Journal of Conflict Management
7:21-44.
Duncan, G.T., and D. Lambert
1986 Disclosure-limited data dissemination (with discussion). Journal of the American
Statistical Association 81:10-28.
1989 The risk of disclosure of microdata. Journal of Business and Economic Statistics
7:207-217.
Duncan, G.T., and S. Mukherjee
2000 Optimal disclosure limitation strategy in statistical databases: Deterring tracker
attacks through additive noise. Journal of the American Statistical Association
95:720-729.
Duncan, G.T., and R. Pearson
1991 Enhancing access to microdata while protecting confidentiality: Prospects for
the future (with discussion). Statistical Science 6:219-239.
Duncan, G.T., S.E. Fienberg, R. Krishnan, R. Padman, and S.F. Roehrig
2001 Disclosure limitation methods and information loss for tabular data. Pp. 135-
166 in Confidentiality, Disclosure, and Data Access: Theory and Practical Ap-
plications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and
L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.
Duncan, G.T., S. Keller-McNulty, and S.L. Stokes
2002 Disclosure risk vs. data utility: The R-U confidentiality map. Technical Re-
ports: Statistical Sciences Group, Los Alamos National Laboratory and Heinz
School of Public Policy and Management, Carnegie Mellon University.
Dunne, T.
2001 Issues in the establishment and management of secure research sites. Pp.
297-314 in Confidentiality, Disclosure, and Data Access: Theory and Practical
Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and
L.V. Zayatz, eds. Amsterdam: North-Holland/Elsevier.
Elliot, M.
2001 Disclosure risk assessment. Pp. 135-166 in Confidentiality, Disclosure, and
Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle,
J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/
Elsevier.
Elliot, M., and A. Dale
1999 Scenarios of attack: The data intruder's perspective on statistical disclosure
risk. Netherlands Official Statistics 14:6-10.
Elliot, M., C. Skinner, and A. Dale
1998 Special uniques, random uniques and sticky populations: Some counterintuitive
effects of geographical detail on disclosure risk. Research in Official
Statistics 1:53-68.
Eurostat
1996 Manual on Disclosure Control Methods. Luxembourg: Office for Publications
of the European Communities.
Federal Committee on Statistical Methodology
1994 Statistical Policy Working Paper 22: Report on Statistical Disclosure Limita-
tion Methodology. Washington, DC: U.S. Office of Management and Budget.
Felso, F., J. Theeuwes, and G. Wagner
2001 Disclosure limitation methods in use: Results of a survey. Pp. 17-42 in Confidentiality,
Disclosure, and Data Access: Theory and Practical Applications for
Statistical Agencies, P. Doyle, J.I. Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds.
Amsterdam: North-Holland/Elsevier.
Fienberg, S.E.
1994 Conflicts between the needs for access to statistical information and demands
for confidentiality. Journal of Official Statistics 10:115-132.
Fienberg, S.E., U.K. Makov, and R.J. Steele
1998 Disclosure limitation using perturbation and related methods for categorical
data. Journal of Official Statistics 14:347-360.
Fuller, W.A.
1993 Masking procedures for microdata disclosure limitation. Journal of Official
Statistics 9:383-406.
Greenberg, B.
1990 Disclosure avoidance research at the Census Bureau. Pp. 144-166 in Proceedings
of the U.S. Census Bureau Annual Research Conference, Washington,
DC.
Greenberg, B., and L. Zayatz
1992 Strategies for measuring risk in public use microdata files. Statistica Neerlandica
46:33-48.
Health Research Council of New Zealand
1998 Statement. Available: http://www.hrc.govt.nz/genethic.htm.
Jabine, T.B.
1993a Procedures for restricted data access. Journal of Official Statistics 9:537-589.
1993b Statistical disclosure limitation practices of United States statistical agencies.
Journal of Official Statistics 9:427-454.
Kelley, J., B. Golden, and A. Assad
1990 Controlled rounding of tabular data. Operations Research 38:760-772.
Kim, J.J.
1986 A method for limiting disclosure in microdata based on random noise and
transformation. Pp. 370-374 in Proceedings of the Survey Research Methods
Section, American Statistical Association.
Kim, J.J., and W. Winkler
1995 Masking microdata files. In Proceedings of the Section on Survey Research
Methods, American Statistical Association.
Kooiman, P., J. Nobel, and L. Willenborg
1999 Statistical data protection at Statistics Netherlands. Netherlands Official Statistics
14:21-25.
Lambert, D.
1993 Measures of disclosure risk and harm. Journal of Official Statistics 9:313-331.
Little, R.J.A.
1993 Statistical analysis of masked data. Journal of Official Statistics 9:407-426.
Marsh, C., C. Skinner, S. Arber, B. Penhale, S. Openshaw, J. Hobcraft, D. Lievesley, and
N. Walford
1991 The case for samples of anonymized records from the 1991 census. Journal of
the Royal Statistical Society, Series A 154:305-340.
Marsh, C., A. Dale, and C.J. Skinner
1994 Safe data versus safe settings: Access to microdata from the British Census.
International Statistical Review 62:35-53.
Mokken, R.J., P. Kooiman, J. Pannekoek, and L.C.R.J. Willenborg
1992 Disclosure risks for microdata. Statistica Neerlandica 46:49-67.
Mood, A.M., F.A. Graybill, and D.C. Boes
1963 Introduction to the Theory of Statistics. New York: McGraw-Hill.
Moore, R.A.
1996 Controlled data-swapping techniques for masking public use microdata sets.
Statistical Research Division Report Series, RR 96-04. Washington, DC: U.S.
Bureau of the Census.
National Research Council
1993 Private Lives and Public Policies: Confidentiality and Accessibility of Government
Statistics. Panel on Confidentiality and Data Access, G.T. Duncan, T.B.
Jabine, and V.A. de Wolf, eds. Committee on National Statistics and Social
Science Research Council. Washington, DC: National Academy Press.
2000 Improving Access to and Confidentiality of Research Data: Report of a Workshop.
Committee on National Statistics, C. Mackie and N. Bradburn, eds. Washington,
DC: National Academy Press.
Paass, G.
1988 Disclosure risk and disclosure avoidance for microdata. Journal of Business
and Economic Statistics 6:487-500.
Rubin, D.B.
1993 Satisfying confidentiality constraints through the use of synthetic multiply-
imputed microdata. Journal of Official Statistics 9:461-468.
Seastrom, M.M.
2001 Licensing. Pp. 279-296 in Confidentiality, Disclosure, and Data Access: Theory
and Practical Applications for Statistical Agencies, P. Doyle, J.I. Lane, J.J.M.
Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/
Elsevier.
Skinner, C.J.
1990 Statistical Disclosure Issues for Census Microdata. Paper presented at Interna-
tional Symposium on Statistical Disclosure Avoidance, Voorburg, The Nether-
lands, December 13.
Spruill, N.L.
1983 The confidentiality and analytic usefulness of masked business microdata. Pp.
602-607 in Proceedings of the Section on Survey Research Methods, American
Statistical Association.
Sweeney, L.
2001 Information explosion. Pp. 43-74 in Confidentiality, Disclosure, and Data
Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J.I.
Lane, J.J.M. Theeuwes, and L.V. Zayatz, eds. Amsterdam: North-Holland/
Elsevier.
Willenborg, L., and T. de Waal
1996 Statistical Disclosure Control in Practice. Lecture Notes in Statistics #111.
New York: Springer.
Winkler, W.E.
1998 Re-identification methods for evaluating the confidentiality of analytically valid
microdata. Research in Official Statistics 1:87-104.
Zayatz, L.V., P. Massell, and P. Steel
1999 Disclosure limitation practices and research at the U.S. Census Bureau. Netherlands
Official Statistics 14:26-29.
The brief interview covered general attitudes toward medical research;
understanding of such terms as clinical trial and medical
experiment; beliefs about research participation; reasons for participating
or not participating in research (when applicable); and
demographic and other background information. The overall response
rate was 95 percent.
Nearly 40 percent of patients had been research participants or
invited to be participants. The attitudes of these patients were generally
favorable to research; most felt free to decline or to leave the
project.
Bell, James, John Whiton, and Sharon Connelly, 1998, "Evaluation
of NIH Implementation of Section 491 of the Public Health Service
Act, Mandating a Program of Protection for Research Subjects"
This is the most recent major study of IRBs. The study universe
was defined as 491 IRBs that in 1995 operated with multiple project
assurances under 45 CFR 46 and that had conducted more than 10
initial reviews of human participant research protocols in the previous
year. Five groups received questionnaires: IRB chairs and
institution officials at all 491 institutions; IRB administrators at 300
institutions; four investigators at each of the 300 institutions (1,200
investigators); four IRB members at each of 160 IRBs (640 members).
Response rates were 80 percent or higher for IRB chairs (394),
administrators (245), and institutional officials (400); rates were 68
percent for IRB members (435) and 53 percent for investigators
(632).
Topics covered included:
· Person-time effort (total person-time of all IRB personnel,
chair effort, member effort, administrator effort, institutional
official effort, investigator effort on initial review);
· Effort per review (per initial review, per continuing review);
· Other information on effort (meeting time per review, duration
of initial review, unimplemented protocols, multiple IRB
reviews);
· Opinions about burden (overall efficiency, getting into inappropriate
areas);
Representative terms from entire chapter:
disclosure limitation