Currently Skimming:

5 Reconciling the Benefits and Risks of Expanded Data Access
Pages 63-84

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 63...
... As noted in Chapter 4, breaches of confidentiality can occur in a variety of ways. The work of this panel has focused primarily on statistical disclosure -- the re-identification of individual respondents (or their attributes)
From page 64...
... Longitudinal surveys that obtain data for analyzing the determinants and consequences of social and economic behaviors have been a major positive development for research and policy in the past 30 years. Linking survey and administrative data can create particularly rich datasets that, in some cases, can substitute for additional surveys, thus reducing respondent burden as well as government costs.
From page 65...
... Records of such requests for confidential data may also be useful to agencies in monitoring confidentiality protection procedures and actual breaches of confidentiality (see "Research on Breaches of Confidentiality," in this chapter)
From page 66...
... Instead, data are made available either in the form of confidential, restricted-access data files or in the form of anonymized data products, including published tables and microdata files. Confidential files delete direct identifiers such as names and addresses but retain the observational structure of the original data and include all of the value added by an agency to generate its published statistics (such as analysis weights, imputation for unit and item nonresponse, data quality edits, geocoding, industry coding, occupation coding)
From page 67...
... For individual variables, cost-benefit modeling might identify specific items that could be moved from restricted access to public-use data without impairing confidentiality protection and, conversely, items that should be moved from public-use products to restricted access modes. At a broader level, cost-benefit modeling could be used to evaluate the tradeoffs among the various forms of restricted access -- research data centers, remote access, licensing -- as well as among different ways of restricting data (through various masking techniques and various ways of producing synthetic data)
From page 68...
... Although most federal statistical agencies also have outside advisory groups, they rarely focus on data access programs for particular data sets. However, one example of focused user involvement for a statistical agency microdata collection is the Association of Public Data Users Working Group on SIPP Data Products, which was active from 1989 to 1994.
From page 69...
... PUBLIC-USE DATA. Improving Quality. Public-use files, introduced in the 1960s, are the most widely available form of research data. As described above, information generated from investigating confidential data in a restricted access environment can be used to improve their quality and relevance.
From page 70...
... That work might determine that such disclosure limitation methods as calculating summary measures or average values from confidential data and attaching these measures to other commonly used microdata could result in highly useful, fully protected public-use microdata for research and policy analysis on such important topics as taxation and retirement income security. Currently, studies of the probabilities of disclosure include estimates of the technical possibility of matching public-use survey data with other widely available information, but they do not include estimates of the likelihood that such matching would be attempted.
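The summary-measure approach mentioned in this excerpt can be illustrated with a minimal sketch. It is not the panel's or any agency's procedure. The field names (`age_band`, `earnings`, `mean_earnings`), the helper functions, and the minimum cell size of 3 are all hypothetical, chosen only to show group averages computed from confidential records and attached to public-use records, with small groups suppressed.

```python
from collections import defaultdict


def group_means(confidential, key, value, min_cell=3):
    """Average `value` within each `key` group of the confidential records.

    Groups with fewer than `min_cell` records are dropped, so that a mean
    never summarizes only one or two respondents (an assumed threshold).
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for rec in confidential:
        entry = totals[rec[key]]
        entry[0] += rec[value]
        entry[1] += 1
    return {g: s / n for g, (s, n) in totals.items() if n >= min_cell}


def attach(public, means, key, new_field):
    """Copy each public record and add its group mean (None if suppressed)."""
    return [{**rec, new_field: means.get(rec[key])} for rec in public]


# Hypothetical confidential records (for example, administrative earnings).
confidential = [
    {"age_band": "60-64", "earnings": 40000.0},
    {"age_band": "60-64", "earnings": 50000.0},
    {"age_band": "60-64", "earnings": 60000.0},
    {"age_band": "65-69", "earnings": 30000.0},
]

# Hypothetical public-use records that share only the coarse grouping key.
public = [{"id": 1, "age_band": "60-64"}, {"id": 2, "age_band": "65-69"}]

means = group_means(confidential, "age_band", "earnings")
enriched = attach(public, means, "age_band", "mean_earnings")
```

Only the group-level average reaches the public file. Individual earnings values stay in the confidential source, and the single-record "65-69" group gets no average at all.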
From page 71...
... Methods of disclosure limitation based on synthetic or virtual data, which are constructed from confidential data through partial or complete multiple imputation techniques, show promise in safeguarding confidentiality and permitting the estimation of complex models; they should continue to be explored as an alternative to other disclosure limitation methods (see, e.g., Abowd and Woodcock, 2001; Doyle et al., 2001; Raghunathan, 2003)
From page 72...
... estimating and improving the utility-disclosure limitation tradeoffs of alternative disclosure limitation methods, including synthetic data; and (4) developing disclosure limitation methods for establishment data.
From page 73...
... Recommendation 6: To enhance access to public-use files for secondary analysis, we endorse the recommendations of the Panel on Institutional Review Boards, Surveys, and Social Science concerning establishment of a new system of confidentiality protection for public-use microdata based on existing and new data archives and statistical agencies. Statistical agencies and participating archives would certify that public-use data sets obtained from them were sufficiently protected against statistical disclosure to be acceptable for secondary analysis, and IRBs would exempt such analyses from review on the basis of the certification provided.
From page 74...
... FACILITATING ACCESS TO RESEARCH DATA CENTERS. One key way to provide researcher access to confidential data is through research data centers (RDCs), including the eight centers maintained by the Census Bureau and those maintained at the headquarters of
From page 75...
... Yet such stringency arguably does not enhance confidentiality protection nor forward the mission of the Census Bureau to facilitate data use. In particular, the panel concludes that the first criterion has been interpreted in a way that actually impedes furtherance of the agency's mission.
From page 76...
... In 2004, the Census Bureau implemented a continuous review process in which reviews are being conducted on an "on demand basis." Although it is too early to assess the effects of this change, it is an important step in improving access to confidential data under the RDC program. However, the Internal Revenue Service, because of staff limitations, continues to have only three review cycles each year for projects that propose to use data from Social Security earnings records or income tax returns. As noted in Chapter 2, the Census Bureau has indicated openness to other ideas for streamlining the application and review process for RDC projects, and some ideas may also be relevant for RDC operations at other agencies.
From page 77...
... In addition, the Census Bureau and other statistical agencies should explore ways to house confidential data from as many agencies as possible in a single supervised location in a number of host institutions in order to add to their value for research use. The 2002 Confidential Information Protection and Statistical Efficiency Act (CIPSEA)
From page 78...
... Recommendation 10: Statistical agencies and other agencies that sponsor data collection should conduct or sponsor research on cost-effective means of providing secure access to confidential data by means of a remote access mechanism, consistent with their confidentiality assurance protocols. LICENSING AGREEMENTS. An alternative to research data centers, one that reduces burden to users because it does not require them to travel to a different location, is a licensing agreement.
From page 79...
... For some agencies, such a mechanism may require new legislation. Recommendation 12: Statistical and other agencies that provide data for research should work with data users to develop flexible, consistent standards for licensing agreements and implementation procedures for access to confidential data.
From page 80...
... , contributing to violations; however, most if not all of the violations resulted from carelessness or not following proper procedures, rather than from willful misuse of data, and, again, there was no evidence of disclosure of individual data. Although the panel recognizes that broadening access through licensing agreements may increase the risk of disclosure by increasing the number of people with access to confidential data, we believe that the risk is outweighed by the benefits of wider access.
From page 81...
... Recommendation 15: Statistical and funding agencies should support continuing research to monitor the views of data providers and the general public about research risks and benefits, including such topics as the sensitivity of questions, data sharing for statistical purposes, methods of obtaining consent for survey participation, the importance of privacy and confidentiality, and similar topics. SAFEGUARDING CONFIDENTIALITY: TRAINING, MONITORING, AND EDUCATION. So far, we have discussed ways of expanding research access while protecting confidentiality, focusing mainly on risks of statistical disclosure and how to measure and safeguard against them.
From page 82...
... Data collection agencies that have such guidelines and training should regularly review their procedures to ensure that they are up to date and systematically enforced. Recommendation 16: Statistical agencies and survey organizations that collect individually identifiable data should provide written guidelines for confidentiality protection, as well as training in confidentiality practices and data management that guard against disclosure, for all staff who work with or have access to such data.
From page 83...
... Such education programs should deal with ethical, legal, and data quality issues, as well as with administrative and technical procedures for confidentiality protection, data security, disclosure limitation, and informed consent. Statistical agencies could make important contributions to the devel
From page 84...
... There is also an important role in education for professional associations, many of which have codes of professional conduct and ethical standards. Such associations as the American Statistical Association, the American Sociological Association, the Population Association of America, the American Economic Association, the American Association for Public Opinion Research, and their counterparts for other disciplines and fields can contribute significantly to the development of strong norms for fair and ethical practices in research and information gathering.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.