Suggested Citation: "1 Introduction." National Research Council. 2000. Improving Access to and Confidentiality of Research Data: Report of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/9958.

1

Introduction

In October 1999, the Committee on National Statistics (CNSTAT), in consultation with the Institute of Medicine, convened a 2-day workshop to identify ways of advancing the often conflicting goals of exploiting the research potential of microdata and preserving confidentiality. The emphasis of the workshop was on longitudinal data that are linked to administrative records; such data are essential to a broad range of research efforts, but can also be vulnerable to disclosure. Administrative data are collected to carry out agency missions and constitute the majority of agency data. An additional—much smaller—amount of data is collected specifically for research and other public purposes. It is sometimes feasible and useful to merge the latter data with the more extensive administrative records.

CNSTAT has an active history of work on data confidentiality and access, culminating in the panel study that produced the volume Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics (National Research Council and Social Science Research Council, 1993). That study resulted in a series of recommendations for advancing researchers' access to data without compromising the ability to protect the confidentiality of survey respondents. This workshop brought together several participants from that study and many others representing various communities: data producers from federal agencies and research organizations; data users, including academic researchers; and experts in statistical disclosure limitation techniques, confidentiality policies, and administrative and legal procedures.


KEY ISSUES

The development of longitudinal data sets linked to health, economic, contextual geographic, and employer information has created unique and growing research opportunities. However, the proliferation of linked data has simultaneously produced a complex set of challenges that must be met to preserve the confidentiality of information provided by survey respondents and by citizens whose administrative records are entrusted to the government. Unprecedented demand for household- and individual-level data, along with the continuing rapid development of information technology, has drawn increasing attention to these issues. Technological advances have rapidly expanded the range and depth of available data, and opportunities to access, analyze, and protect data have grown as well. At the same time, technology has created new methods for identifying individuals from available information, of which longitudinal research data are but one of many sources.

Longitudinal files that link survey, administrative, and contextual data provide exceptionally rich sources of information for researchers working in the areas of health care, education, and economic policy. To construct such files, substantial resources must be devoted to data acquisition and to the resolution of technical, legal, and ethical issues. In most cases, requirements designed to protect confidentiality rule out the type of universal, unrestricted data access that custodians (and certainly users) of such databases may prefer.

Several modes of dissemination are currently used to provide access to information contained in linked longitudinal databases. Dissemination is typically restricted at the source, at the access point, or both. Products such as aggregated cross-tabulation tables are published regularly and made available to all users but, of course, offer no record-level detail. Such data do not support research into complex individual behavior. Public-use microdata files, on the other hand, offer detail at the individual or household level and are available with minimal use restrictions. However, producers of microdata must suppress direct identifier fields and use data masking techniques to preserve confidentiality. Additional methods, such as licensing agreements, data centers, and remote and limited access, have been developed to limit the types of users allowed access to the data, the level of data detail accessible by a given user, or both. Restricted access arrangements are generally designed to provide users with more detail than they would get from a public-use file.

It is within this context that the workshop participants debated the key issues, which can loosely be organized at two levels. The first is the tradeoff between increasing data access on the one hand and improving data security and confidentiality on the other.[1] To examine this tradeoff, it is necessary to quantify, to the extent possible, disclosure risks and costs, as well as the benefits associated with longitudinal microdata and with linking to administrative records. Decisions about what types of data can be made available, to whom, and by what method hinge on the assessment of these relative costs and benefits. Researchers typically appeal for greater access to unaltered data, while stewards of the data are understandably often more focused on assessing and minimizing disclosure risk.

At the second level of discourse, participants discussed alternative approaches to limiting disclosure risk while facilitating data access. Given that all longitudinal microdata require some protections, the compelling question is which approach best serves data users while maintaining acceptable levels of security. The choice reduces essentially to two options: (1) restricting access—physically limiting who gets to see the data, or (2) altering the data sufficiently to allow for safe broader (public) access. Other elements, such as legal deterrents, also come into play. Workshop participants articulated in detail the merits and relative advantages of alternative approaches. Their arguments are summarized in this report.

WORKSHOP GOALS

As noted above, a central objective of the workshop was to review the benefits and risks associated with public-use research data files and to explore alternative procedures for restricting access to sensitive data, especially longitudinal survey data that have been linked to administrative records. Doing so requires considering the impact on each group involved—survey respondents, data producers, and data users—of measures designed to reduce disclosure risk. Presenters from the academic community reviewed the types of research that are enhanced, or only made possible, by the availability of linked longitudinal data. Participants also identified and suggested methods for improving current practices used by agencies and research organizations for releasing public-use data and for establishing restricted access to nonpublic files. The overarching theme was the importance of advancing methods that maximize the social return on investments in research data, while fully complying with legal and ethical requirements.

[1] Early on in the workshop, a participant clarified the distinction between "privacy" and "confidentiality." Privacy typically implies the right to be left alone personally, the right not to have property invaded or misused, freedom to act without outside interference, and freedom from intrusion and observation. In the context of research data, confidentiality is more relevant. The term refers to information that is sensitive and should not be released to unauthorized entities. It was suggested that confidentiality implies the need for technical methods of security.


The workshop, then, was designed with the following goals in mind:

  • To review the types of research that are enhanced, or only made possible, by linked longitudinal data.

  • To review current practices and concerns of federal agencies and other data-producing organizations.

  • To provide an overview of administrative arrangements used to preserve confidentiality.

  • To identify ways of fostering data accessibility in secondary analysis.

  • To assess the utility of statistical methods for limiting disclosure risk.

To date, efforts to address these themes have been hindered by inadequate interaction between researchers who use the data and agencies that produce them and regulate their dissemination. Researchers may not understand and may become frustrated by access-inhibiting rules and procedures; on the other hand, agencies and institutional review boards are not fully aware of how statistical disclosure limitation measures impact data users. The workshop brought the two groups together to help overcome these communication barriers.

REPORT ORGANIZATION

Workshop topics were organized into the following sessions: (I) linked longitudinal databases: achievements to date and research applications, (II) legal and ethical requirements for data dissemination, (III) procedures for releasing public-use microdata files, and (IV) procedures for restricted access to research data files. This report is structured slightly differently to focus on themes as they emerged during the workshop. Chapter 2 outlines the tradeoff between data access and confidentiality. Presentations on the research benefits of linked longitudinal data are summarized, along with discussions of disclosure risk assessment and quantification. Chapter 3 reviews presentations that addressed ethical and legal aspects of data dissemination, as well as discussions of the role of institutional review boards. Chapter 4 summarizes participants' assessments of competing approaches to limiting disclosure risk and facilitating user access; the focus is on the two primary competing approaches, data perturbation and access limitation. Agency and organization practices are the subject of Chapter 5. In addition, two appendices are provided: Appendix A is a list of the workshop participants; Appendix B is the workshop agenda.


Improving Access to and Confidentiality of Research Data summarizes a workshop convened by the Committee on National Statistics (CNSTAT) to promote discussion about methods for advancing the often conflicting goals of exploiting the research potential of microdata and maintaining acceptable levels of confidentiality. This report outlines essential themes of the access versus confidentiality debate that emerged during the workshop. Among these themes are the tradeoffs and tensions between the needs of researchers and other data users on the one hand and confidentiality requirements on the other; the relative advantages and costs of data perturbation techniques (applied to facilitate public release) versus restricted access as tools for improving security; and the need to quantify disclosure risks—both absolute and relative—created by researchers and research data, as well as by other data users and other types of data.