5

Informational Risk in the Social and Behavioral Sciences

Since the publication of the Common Rule in 1991, no aspect of human society has changed as dramatically as information and its rapid production, availability, and retention. The amount of stored information has grown at an annual rate of 25 percent, and the technological capacity to process information has grown even faster (Hilbert and Lopez, 2011). So much information about individuals is freely and openly available that informational risk is ubiquitous in society. In many respects, informational risk is an everyday aspect of life in the 21st century, and it has the potential to change the meaning of informed consent.

While the level of risk varies, some risk exists in all forms of information, whether or not the information is public, whether or not it is digitized and rapidly generated, whether or not it is collected for research purposes, whether or not it is readily identifiable, and whether it is mundane and routine or personal and sensitive. For most social and behavioral research, the primary risk is informational. This report therefore devotes special attention to informational risk and to the different forms of information used, harvested, or collected by investigators, as they pertain to the Federal Regulations for the Protection of Human Subjects.

In this chapter, the committee addresses informational risk and data protection as an extension of the Chapter 2 recommendations concerning the newly proposed category of excused research set forth in the Advance Notice of Proposed Rulemaking (ANPRM; 76 Fed. Reg. 44,512). Consistent with the ANPRM and as discussed in Chapter 2, the excused category is intended and particularly well suited for addressing informational risk involved in (a) surveys, questionnaires, or other methods of information



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




gathering from individuals or (b) the use of pre-existing research or non-research data that include private information. The new category would cover a large proportion of studies in the social and behavioral sciences in which the research procedures themselves involve informational risk, but where that risk is no more than minimal when appropriate data security and protection plans are in place.

Chapter 2 dealt specifically with the definition and characteristics of excused research and with issues related to its registration. This chapter focuses on the required data protection, which needs to be calibrated to the type and level of informational risk in order to avoid inadvertent disclosure or to reduce the level of any potential risk to participants to no more than minimal.

The issue of data protection spans the spectrum of methods and modes of inquiry in the social and behavioral sciences, whether qualitative or quantitative, longitudinal or experimental, observational or questionnaire-based, or micro-level, macro-level, or large-scale. With excused research, investigators need to address data protection appropriate to the research and calibrated to informational risk.

The consideration of data protection and informational risk draws on expertise within the social and behavioral sciences. These research fields, the federal statistical agencies, and data providers for the social and behavioral sciences have, over decades, pioneered procedures and mechanisms for vetting data as public-use data files and for providing access to restricted data under various data protection plans calibrated to the level of risk. For more than 30 years, the National Research Council (NRC) has issued reports and guidance that take into account changing information-risk circumstances. For example, awareness of the increased capacity to re-identify data has led to a greater emphasis on restricted-use data and the development of procedures for using and protecting such data. Similarly, awareness of the research potential of video observational data in classrooms or other group settings has led to access to such data under restricted-use conditions.

Helpful guidance on data protection plans and data use agreements comes from federal agencies, in particular the federal statistical agencies; from data providers such as the Inter-university Consortium for Political and Social Research (ICPSR); from large-scale multi-investigator data projects; and from NRC reports and the scholarly literature (e.g., National Research Council, 2003; O'Rourke et al., 2006). More than 10 years ago, Seastrom (2002) provided an overview of agency-specific features of data use agreements and licenses. Also in 2002, the National Human Research Protections Advisory Committee issued recommendations on confidentiality and research data protections that include a compilation of federal

research confidentiality statutes and codes useful to investigators and their institutions.1

The thrust of the guidance is to maximize use consonant with confidentiality protection of private information. Reports such as Expanding Access to Research Data (National Research Council, 2005) and Putting People on the Map (National Research Council, 2007) offer useful roadmaps on mechanisms to protect data and facilitate use. Plans to protect against and minimize inadvertent disclosure or intentional intrusions include institutional as well as technical and statistical approaches. Licensing agreements with strong penalties for infraction, data enclaves, and secure access mechanisms (in which data stewards execute the analyses) are typically used when there is a strong risk of disclosure. From a technical point of view, data limitation, alteration, and simulation can also be used, although they limit the data that are available for analysis (National Research Council, 2007, Chapter 3).

Building on this foundation, the chapter opens with a definition, description, and general discussion of informational risk in research. While agreeing wholeheartedly with the ANPRM desire to reduce the amount of time institutional review boards (IRBs) spend evaluating informational risk, the committee disagrees strongly with the ANPRM view that the Health Insurance Portability and Accountability Act of 1996 (HIPAA) provides an appropriate standard for specifying data protection plans, either generally or specifically with respect to social and behavioral research. The chapter specifically discusses HIPAA limitations in this context. Data protection issues and mechanisms are also described, and committee recommendations are offered for strengthening data protection.

Looking to the future, the committee proposes that the federal government (specifically, the U.S. Department of Health and Human Services, HHS) take steps to continue to promote institutional and methodological mechanisms that maximize researcher access to data while protecting the confidentiality of data and ensuring that informational risk is no more than minimal. As noted earlier in this chapter and in Chapter 2, the social and behavioral science community and related institutions and federal statistical agencies have played a leadership role in reconciling researcher access to private information with confidentiality protection and risk reduction (see also Levine and Sieber, 2007). However, given rapid developments in data production, dissemination, and use, it would be timely and wise for revisions to the Common Rule to be accompanied by investment in some form of organizational or institutional entity dedicated to addressing new types of informational risk and mechanisms of risk reduction. For heuristic purposes, the committee outlines one such approach in the form of a national

1 See http://www.hhs.gov/ohrp/archive/nhrpac/documents/nhrpac14.pdf [December 2013].

center with sufficient expertise in data protection to inform investigators, IRBs, and data providers about (a) how to carry out ethically responsible use of private information made possible through new technologies, (b) innovative use of institutional arrangements and technology for managing informational risk, (c) standard typologies of risk, and (d) standard solutions for managing risk that researchers could readily adopt.

The chapter also discusses the continued need to facilitate data sharing, a longstanding practice in social and behavioral research. This topic is considered here because of the ANPRM proposals on the use of pre-existing research and non-research data, the benefits to human subjects as well as to science and society of further analysis of existing information, and the importance of data sharing consonant with data protection and with minimizing informational risk. Finally, the committee notes that, in the rapidly changing environment of information and information technology, an ongoing research program is needed to ensure that regulation of informational risk continues to be adequate and appropriate.

INFORMATIONAL RISK IN RESEARCH

Informational risk is the potential for harm from disclosure of information about an identified individual. For much of social and behavioral research, informational risk is the only or the primary risk, so social and behavioral research is particularly concerned with its management. However, all research on human subjects contains some element of informational risk, as Lowrance (2012) noted. Data sharing, which is common in social and behavioral research and is becoming increasingly common in biomedical research, requires specific plans for managing informational risk. While changing circumstances can create new challenges for managing informational risk, the social and behavioral sciences bring decades of experience and built expertise for doing so effectively (Levine et al., 2011; National Research Council, 2003, 2007, 2010).

As with all other types of risk, the central criterion for determining whether the informational risk in research requires IRB review is the benchmark of minimal risk. Understanding this benchmark, and evaluating whether the risk in a particular study or data-sharing activity falls above or below it, necessitates careful consideration by investigators before they decide whether to classify and register their research as "excused" as set forth in Chapter 2. Minimal risk is conventionally defined as no greater than the risk encountered by the general population in everyday life.2

2 For the current interpretation of "minimal risk" under the Common Rule and the committee's suggested revised definition, see the section "Defining Minimal Risk" in Chapter 3.

As with any participant risk that occurs in the context of research, investigators have an ethical obligation to minimize the informational risk needed to achieve the goals of the research, but compromising research goals to reduce risk that is already below minimal is not in the best interests of science or of the human subjects of that research.

As discussed in the Chapter 3 section "Calculating the Probability and Magnitude of Harm," risk in the language of the Federal Regulations for the Protection of Human Subjects is the product of two considerations: the probability of an outcome occurring and the magnitude of harm from that outcome. The most relevant harms3 from information disclosure are potential economic harms (e.g., loss of job, insurance coverage, or economic assets), social harms (e.g., loss of or damage to social relationships such as marriage), and criminal or civil liability (e.g., arrest for illegal behavior). Information made known in some contexts can also increase the risk of physical harm (e.g., spouse abuse) or psychological harm (e.g., personal information that, if revealed, could trigger depression). The magnitude of harm depends on the type or content of information being collected about participants in a study. Highly sensitive information, such as illegal activity or HIV status, has greater potential for harm than less sensitive information such as participants' opinions or hours of work. Currently, IRBs have the task of assessing the sensitivity of information and the magnitude of harm. In that task, IRBs vary in their likelihood of overestimating the potential for harm from information (Green et al., 2006).

Much more difficult, for IRBs and researchers alike, is determining the probability of disclosure. Disclosure occurs when information about a human subject is available to unauthorized personnel and can be associated with that subject's identity. There are basically two ways this can happen: through negligence in protecting identified data, or through re-identification of a participant from information in a dataset that presumably has been de-identified (also called "secondary disclosure"). The de facto goal of current practice, maintaining the risk of secondary disclosure at near-zero levels, may be a worthwhile aim in some cases, but only as long as it does not produce hyper-regulation in scrutinizing minimal-risk research. As noted earlier, the proposed introduction of an excused category aims to insulate research from overestimation of disclosure risk when risk is no more than minimal or may already be at or near zero. From a cost-benefit perspective on optimal regulation, current IRB practice over-regulates informational risk.

3 See also the section in Chapter 3 titled "Potential Harms Resulting from Inadequate Confidentiality Protections for Social-Behavioral Research."
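The regulatory risk calculus described above, probability of an outcome times the magnitude of harm from that outcome, can be sketched as a small calculation. This is an illustrative sketch only: the function name, the probabilities, and the 0-10 harm scale are invented for exposition and do not come from the report or the regulations.

```python
# Hedged sketch of the Common Rule risk calculus: risk = probability x magnitude.
# All numeric values below are hypothetical and on an arbitrary 0-10 harm scale.

def informational_risk(p_disclosure: float, harm_magnitude: float) -> float:
    """Expected harm from a disclosure event (probability times magnitude)."""
    if not 0.0 <= p_disclosure <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return p_disclosure * harm_magnitude

# Same (invented) disclosure probability, very different expected harm:
hiv_status = informational_risk(p_disclosure=0.01, harm_magnitude=9.0)  # highly sensitive
work_hours = informational_risk(p_disclosure=0.01, harm_magnitude=1.0)  # mundane

assert hiv_status > work_hours  # sensitivity drives the magnitude term
```

The point of the sketch is the one the text makes: two studies with identical disclosure probabilities can sit on opposite sides of the minimal-risk benchmark purely because of the sensitivity of the information collected.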

BALANCING THE RISKS AND BENEFITS IN RESEARCH

The continuing challenge for investigators, IRBs, institutions, and data providers is twofold: (1) how to build adequate data protection plans in an environment where both the nature of private information and the technology to protect or disclose such information can change rapidly, and (2) how to do so while meeting the twin goals of minimizing individual risk of harm and maximizing research benefit. The former requires a deep analysis of the level of granularity of the data in any one dataset, the relationships between datasets and the potential for identity disclosure, the strength of the data protection plan, and how and under what conditions access will be provided to users.

Informational risk can be conceptualized as the probability of unintended release of data during storage, use, and reporting, multiplied by the magnitude of the harm from such a release. The measure of harm is not static: there is some evidence that norms associated with informational risk and informed consent are evolving. Nissenbaum (2011, p. 34) notes that it is increasingly difficult for many people to understand where the old norms end and new ones begin because "[d]efault constraints on streams of information from us and about us seem to respond not to social, ethical, and political logic but to the logic of technical possibility: that is, whatever the Net allows." And these views are changing rapidly. The sources of the norms, particularly with respect to consent, identifiability, public interest, safeguards, and indeed the very notion of "privacy," that have guided IRB decisions have also changed, not just in this country but in many others (Lowrance, 2012). Research data are less likely than in the past to be a carefully curated dataset produced by a statistical agency or research institute and resulting from careful experimental or longitudinal design. New norms that use different types of controls are evolving (Landwehr, in press; Pentland et al., in press). While federal statistical agencies, data providers, and others who allow use of restricted data have set standards for access and use, continuing attention is needed to trends in data protection and disclosure risk over time.

Technology has also changed the research risk-and-benefit calculus. In the past, the focus was often on de-identification to avoid risk, but such an approach is now less likely to preserve the research utility of the data. Norms on identifiers and outliers must be reconsidered if research benefit is to be maximized. Identifiers, or key data elements, now need to be retained so that data from one source can be linked to multiple other sources. Data are more likely now to be part of a communally developed data infrastructure or observatory. Identifiers are necessary in order to match with other population datasets and make appropriate statistical inferences. Data on atypical cases need to be preserved. While early social

and behavioral research focused on describing population characteristics, modern research in the social and behavioral sciences also studies the behavior of individuals or businesses at the tails of the population distribution (e.g., health care costs that are disproportionately driven by a small proportion of the population, or innovative business activities that result from the creative energies of a few unusual entrepreneurs). As a result, it is much more important to retain data on outliers: standard disclosure limitation techniques thus do not always apply. When direct identifiers (name, address, etc.) must be retained for future use, best practice is to maintain them on storage systems that are isolated from the storage systems holding information about the subjects. Protection of direct identifiers can be handled by good data management.

There have also been massive changes in the risk of re-identification, given the public datasets that exist to support re-identification and the tools available both to anonymize and to de-anonymize data. In addition, the baseline levels of both risk and harm have changed, given the vast amount of information already in the public domain. Determining the risk level of the data becomes harder in this environment, and experts are needed to understand the risk of harm from a given dataset. Re-identification in turn depends on the subject, the level of detail, the type of media, and the availability of possible match factors. None of these elements is static, and fundamental challenges will be faced in getting the calculus right. If IRBs are too cautious, they risk suppressing valuable social and behavioral research.4 If they are not cautious enough, they risk harming individuals. The benefit of understanding social and behavioral science trends over time must be balanced with the need to protect personal data. Informational risk will continue to increase. The volume and type of data used for social and behavioral research will introduce many new types of identifying elements; the potential for re-identification will increase with more and better matching tools and algorithms. Fortunately, the very same technological change that has led to increased potential for loss of confidentiality and other harms has also led to enormous advances in the tools available to protect confidentiality. For IRBs to meet the goal of enabling valuable social and behavioral research, a more flexible system must be developed that better measures and minimizes informational risk.

4 The social benefit from using the data must be a consideration. Since the tragic events of September 11, 2001, for example, the need for behavioral research to understand the human characteristics and dynamics of extremism has grown significantly (see, for example, Atran, 2003).
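One common way to reason about the re-identification risk discussed above is to check how many respondents share each combination of indirect ("quasi") identifiers: a respondent who is unique on those fields can potentially be linked to an outside dataset. The sketch below illustrates this idea under invented field names and records; real disclosure review uses far more sophisticated methods (see National Research Council, 2007, Chapter 3).

```python
# Hedged sketch: minimum equivalence-class size (k-anonymity) over invented
# quasi-identifiers. A result of k = 1 means at least one record is unique
# on those fields and is therefore a candidate for linkage attack.
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest count of records sharing a quasi-identifier combination."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(combos.values())

records = [  # hypothetical survey rows; no direct identifiers present
    {"zip3": "537", "birth_year": 1961, "sex": "F", "hours_worked": 40},
    {"zip3": "537", "birth_year": 1961, "sex": "F", "hours_worked": 22},
    {"zip3": "941", "birth_year": 1980, "sex": "M", "hours_worked": 50},
]

k = k_anonymity(records, ["zip3", "birth_year", "sex"])
assert k == 1  # the third record (an outlier) is unique on these fields alone
```

The example also shows why the text's point about outliers matters: the atypical record is exactly the one that is unique, so coarse suppression rules aimed at uniqueness would strip the very cases modern research needs to keep.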

WHY NOT HIPAA AS THE MANDATED DATA SECURITY AND INFORMATION PROTECTION STANDARD?

As stated above, the best way to protect human subjects while minimizing the regulatory burden on IRBs and researchers is through adequate protection against disclosure. Matching levels of risk to levels of protection simplifies regulation and allows for clearer communication to participants about the actual level of risk. The ANPRM proposes that elements of the HIPAA Privacy Rule be adopted as the mandated data security and information protection standard for all research data.5 As argued below, a single standard based on HIPAA is not a workable solution.

The ANPRM asks whether study subjects would be sufficiently protected from informational risks if investigators were required to adhere to a strict set of data security and information protection standards modeled on the Privacy Rule and Security Rule elements of HIPAA. The guidance offered by HIPAA is neither necessary nor sufficient, for several reasons: the disconnect between the two rules, the failure to quantify risk, the failure to take into account the research value of data elements, and the focus on individual rather than group risk. These reasons are explained in the next two sections.

Disconnect Between the Privacy Rule and Security Rule

The disconnect between the two HIPAA rules stems from the fact that the Security Rule does not provide guidance on how to protect information in a manner that is proportional to its risk of disclosure. It only identifies mechanisms that can either be enacted or not enacted. Although it might be anticipated that information security requirements from the Security Rule could be combined with confidentiality requirements from the Privacy Rule, this is problematic because the Privacy Rule was not designed as a flexible confidentiality protection framework. In addition, the Security Rule provides relevant guidance on how an information security framework can be constructed, but it has little focus on maintaining the confidentiality of the information beyond limiting access to authorized users. That is an important principle of data protection, but it is not sufficient for mitigating informational risk.

In particular, the HIPAA Security Rule focuses on administrative, physical, and technical mechanisms intended to prevent the misuse of information in transmission or inappropriate access to data residing on a computer's hard drive. Within these mechanisms, it enumerates specific controls (e.g., unique log-ins for users of data), which are either "required"

5 76 Fed. Reg. 44,525.

or "addressable." If a control is addressable, the organization (or researcher) managing the data must document why it chose not to implement the control in question.

Failure to Quantify Risk, Failure to Account for the Value of Research, and Failure to Consider Group versus Individual Risk

The failure of the HIPAA Privacy Rule to protect social and behavioral research data stems from its approach. It states that data derived from participants can be studied in one of three ways.

1. Information can be used in an identifiable form if it has already been collected (or is "on the shelf") and it is impracticable to obtain consent. In such a case, the requirement for consent can be waived, and data that contain explicit identifiers (e.g., personal name) can be used for research, provided appropriate protection mechanisms (such as those specified in the HIPAA Security Rule) are put in place.

2. Less oversight is necessary if data are disclosed as a "limited dataset." In this case, the data must be stripped of 16 enumerated features associated with the participant, such as Social Security numbers, telephone numbers, and specific residential addresses. In addition, the recipient of the limited dataset and the organization sharing the data must enter into a binding contract that prohibits the recipient from attempting re-identification of the records and from using the data outside the purposes specified in the contract. This approach to data protection is clearly less risky than using fully identified data under a waiver of consent, but the enumerated list is a heuristic that provides little quantification of the actual risk. Benitez and Malin (2011) have shown that application of such a policy leads to variable risk, depending on the region of the country from which the research participants come.

3. If a dataset is de-identified, then it is no longer covered by HIPAA. Under the regulation, health information that "does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information" (45 C.F.R. § 164.514). The Privacy Rule provides several ways in which de-identification can be achieved. The first extends the limited dataset's list from 16 to 18 identifiers and adds an attestation that the provider of the data "does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information" (45 C.F.R.

§ 164.514). This strategy does carry less risk than a limited dataset, but it, too, suffers from the fact that its guidelines are independent of the actual data and do not provide an actual quantification of risk. The 18 enumerated features are common to medical records, which HIPAA was designed to regulate, but they do not include other potentially identifying data elements that might be present in social and behavioral research data. Conversely, the presence of one or even several of the enumerated elements, in isolation from the others, may not lead to any significant risk of re-identification in, for example, large population-based samples.

Alternatively, the HIPAA Privacy Rule states that de-identification can be achieved when "[a] person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:

i. Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and

ii. Documents the methods and results of the analysis that justify such determination."

This mechanism is noteworthy in that it requires actual quantification of risk. There are various ways in which such risk can be measured; however, despite the specification of such an option, there are several concerns.

First, the de-identification standard is an either/or policy: either the dataset falls outside HIPAA's protections because it is deemed de-identified, or it is protected because it is identifiable. Thus, there is no quantification of risk beyond this binary level of protection.

Second, the HIPAA de-identification policy does not relate confidentiality to the utility of the data. In other words, the priority is placed on privacy rather than on the balance between the need to protect the data and the need to learn from the data through worthwhile scientific endeavors.

Third, the HIPAA de-identification model emphasizes individual identification and does not address issues associated with group-based risks or the publication of aggregated summary statistics associated with the data.

Based on these arguments, the committee concludes that HIPAA would not be a suitable standard for the protection of many types of research, including research in the social and behavioral sciences.
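The enumerated-list approach criticized above can be made concrete with a small sketch. To be clear about assumptions: the field names below are a small invented subset standing in for the regulation's enumerated identifiers, not HIPAA's actual list, and the function is illustrative of the mechanism, not an implementation of the rule.

```python
# Hedged sketch of limited-dataset-style redaction: strip enumerated direct
# identifiers by name. The set below is an invented placeholder, NOT the
# actual 16- or 18-item HIPAA enumeration.

DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "street_address", "email"}

def to_limited_record(record: dict) -> dict:
    """Return a copy of the record with enumerated direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

raw = {"name": "J. Doe", "ssn": "000-00-0000", "zip3": "537",
       "birth_year": 1961, "hiv_status": "negative"}
limited = to_limited_record(raw)

assert "name" not in limited and "ssn" not in limited
assert limited["zip3"] == "537"  # indirect identifiers survive: residual risk
```

The last assertion is the committee's objection in miniature: field-name redaction says nothing about the re-identification risk carried by the fields that remain, which is why a fixed enumerated list cannot quantify the actual risk of a given dataset.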

INFORMATIONAL RISK IN THE SOCIAL AND BEHAVIORAL SCIENCES 119 Recommendation 5.1: HHS should not mandate HIPAA as the stan- dard for data security and information protection. In recommending that HIPAA not be mandated as the data protection and security standard, the committee is not suggesting that another par- ticular set of standards be mandated for social and behavioral sciences but rather that there be an array of data protection approaches that best fit the data protection needs. These can include • planning data protection with the concept of a portfolio approach considering safe people, safe projects, safe data, safe settings, and safe outputs; • utilizing a wide range of statistical methods to reduce risk of disclosure; • consulting resources and data protection models to help research- ers and IRBs such as university research data management service groups, individual IT/protection experts, and specialized institu- tions such as the ICPSR and NORC at the University of Chicago; • existing standards for data protection promulgated by the National Institute of Standards and Technology (NIST); and • developing a future national center to define and certify the levels of information risk of different types of studies and corresponding data protection plans to ensure risks are minimized. These approaches will be discussed in more detail in the next sections. DATA PROTECTION PLANS— CURRENT AND FUTURE GUIDANCE Once the risk profile is determined, the next step is to define a data protection plan that can address the needed risk in the research. The chang- ing technological environment discussed above means that researchers and IRBs need to have a current and reliable source from which they can deter- mine what reasonable measures can be taken that protect confidentiality and that are less reliant on solely statistical approaches. 
Data protection plans should use a diversified approach to minimize disclosure risk: safe projects (valid research aims), safe people (trusted researchers), safe data (data treated to reduce disclosure risk), safe settings (physical and technical controls on access), and safe outputs (reviewing products for disclosure risk) (Ritchie, 2009). Yet the same changing technology that has made it much more difficult for individual investigators and IRBs to know how to ensure such safe use has also made it possible to identify new types of controls.
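The "safe data" element is often assessed with statistical disclosure-limitation measures. As a purely illustrative sketch (not a method endorsed in this report, and with hypothetical field names and data), one common measure is k-anonymity: the size of the smallest group of records that share the same quasi-identifier values.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest number of records sharing any one
    combination of quasi-identifier values (the dataset's k)."""
    groups = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return min(groups.values())

# Hypothetical survey extract: coarsened age, 3-digit ZIP, sensitive answer.
records = [
    {"age_band": "30-39", "zip3": "100", "response": "yes"},
    {"age_band": "30-39", "zip3": "100", "response": "no"},
    {"age_band": "40-49", "zip3": "212", "response": "yes"},
    {"age_band": "40-49", "zip3": "212", "response": "no"},
]

print(k_anonymity(records, ["age_band", "zip3"]))  # 2
```

A low k signals that coarsening (e.g., wider age bands) or suppression may be needed before release; a fuller plan would combine such measures with the other "safes" rather than rely on statistics alone.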

remotely."14 Rich frontier and practical knowledge has been developed at the Human Dynamics and Media Labs of the Massachusetts Institute of Technology,15 as well as at Microsoft Research.16 However, these specialized organizations are not mandated to provide guidance to IRBs, nor are they likely to have the support staff to do so given their current configuration of resources.

This rich capacity within the United States, as well as in other countries, suggests the value of a dedicated entity that could lead, coordinate, and build upon the depth of knowledge and experience that exists; keep pace with data and technological innovations; and foster research. One attractive option worthy of consideration is to establish a national center of expertise in research data protection technologies. This center could be charged with providing operational guidance to investigators, institutions, or IRBs, derived from interactions among commercial, academic, and government experts. Such a center could have the following features:

• Authority. The center could be authorized by HHS to carry out the activities identified in Recommendation 5.2. It could serve as a resource to support improvements in enhancing data protection and addressing informational risk under varying conditions. It could use its convening authority to bring together broad-based experts. Also, it could serve as a catalyst for research.
• Staffing. The center could employ a research staff to ensure that changes in technology are readily acknowledged and researched.
• Expertise. The center could be charged with identifying experts who could certify both established and frontier approaches used by research organizations to protect different types of research data and with providing guidance about the advantages and disadvantages of both.
• Products. The center could be responsible for producing three key products: (1) current guidance about the characteristics of datasets that could be used to create discrete informational risk profiles, conditional on different levels of research utility; (2) a menu of certified data protection plans that would be appropriate for each of the risk levels and that researchers and IRBs can use in their work; and (3) a set of recommendations for limiting disclosure when publishing results.
• Dissemination. The center could be responsible for maintaining a constantly updated website for IRBs and researchers to use that

14 See http://www.dataenclave.org/index.php/data-enclave [December 2013].
15 See http://hd.media.mit.edu/ [December 2013].
16 See http://research.microsoft.com/apps/pubs/default.aspx?id=80239 [December 2013].

characterizes the informational risk profiles of different types of datasets, matches data protection plans to those risk profiles, and provides guidance to IRBs in determining informational risk.

FACILITATING DATA SHARING AND USE

Data sharing has been referenced in some of the discussion above concerning data protection, but in this final section the committee discusses specific needs to foster and guide data sharing and responsible use, which is a longstanding practice in social and behavioral research (Levine and Sieber, 2007; National Research Council, 1985). Implicit in encouraging data sharing is encouraging agencies, organizations, and institutions to make administrative records accessible, consonant with confidentiality agreements (see, e.g., National Research Council, 2005, 2007). Data sharing is a highly desirable component of an open and democratic scientific community. It allows verification through replication of the original investigators' findings; it permits novel investigations by researchers with hypotheses different from those of the original investigators; and it creates research opportunities for students and junior investigators without resources for large original data collections. It is increasingly required by federal funding agencies as a condition of research awards.17

Many investigators have neither the expertise nor the continuity of funding to sustain the effort of making data available, particularly if restricted-access arrangements are needed. Data archiving organizations can play a valuable role in promoting data sharing. Their roles could be enhanced if there were credentialing procedures or other guidance to help investigators make appropriate choices among data-archiving organizations.
Guidance Recommended: OHRP should facilitate data sharing by issuing a list of participating and approved data archives that have been reviewed by an OHRP expert panel as having (a) the technical expertise to provide public-use data files and restricted-access data files and (b) the procedures in place for review of such data. Investigators obtaining data from participating archives must adhere to guidelines for public-use data files and to data use agreements in the case of restricted-use data. Adherence to these conditions is essential for classifying investigator use of public-use files as not human-subjects research and of restricted-use data as excused.

17 See the Memorandum for the Heads of Executive Departments and Agencies on Increasing Access to the Results of Federally Funded Scientific Research from the Executive Office of the President, Office of Science and Technology Policy, February 22, 2013, at http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf.

Researchers using secondary data are still bound by an ethical obligation to protect the privacy of human subjects, whether or not data providers make such conditions explicit. Attempts to identify human subjects in secondary data, or to describe to others methods for doing so, should be considered research misconduct and punished appropriately. The only exception is analysis of disclosure risk when it is authorized by the data provider.

Recommendation 5.3: As a condition of undertaking secondary research on public-use or restricted-access data, investigators have the responsibility to protect the confidentiality of the data and honor the data protection plan and other agreements with the data provider, whether the data provider is the primary researchers involved in the study, an agency or institution, or a data distribution organization. The revised regulations and OHRP guidance on data use should make clear that secondary users must honor confidentiality agreements but that no further consent from human subjects is needed to use such data. The revised regulations should also make clear that data providers may share data without consent of human subjects as long as users adhere to the original confidentiality agreements and other conditions of use.

Guidance Recommended: OHRP should clarify that the determination of whether research data collected from human subjects can be distributed to other researchers through public-use or restricted-access agreements should be made by (a) the investigators who collected the data or (b) a data distribution organization delegated by the original investigators and approved by the IRB as the distributing organization.

As set forth in Chapter 2, research on public-use data files is not human-subjects research and falls outside of the Federal Regulations for the Protection of Human Subjects.
Those preparing such data for public use need to ensure that the data have been de-identified and that the risk of re-identification is zero or close to zero. In certifying data for public use, IRBs make a judgment based on this defining characteristic of public-use data.

Research data are not appropriate for public use when they involve informational risk that is potentially more than minimal because they include (a) highly sensitive, private information that could lead to civil or criminal liability or to economic, social, or psychological harm or (b) information that could increase the likelihood of re-identification. High standards for de-identification and stringent data disclosure tests may reduce informational risk, though certain variables may need to be excluded from public-use data files. Alternatively, when such data have scientific value such that making them available for research purposes is desirable, there are a number of

possible mechanisms to reduce informational risk while allowing research access. As discussed in the context of data protection plans, these mechanisms include licensing agreements and the use of secure enclaves.

Restricted-use data are data about human subjects that retain or include potentially identifiable information and so require special data protection plans to protect against disclosure. In general, the option of combining restricted-use data with public-use data is an expected part of a data-sharing system and should be accounted for in the data protection plan for the restricted-use data. Addition of new public data in a research activity should be registered but does not require additional review. Combining multiple types of restricted-use data may significantly increase informational risk and so requires approval of the data provider and registration of a new data protection plan.

Guidance Recommended: OHRP guidance should clarify that investigators with access to restricted-use data or datasets must have the approval of the data provider to integrate additional restricted-use data. Under such circumstances, the guidance should cover the following situations: (1) Investigators must obtain approval and modify as necessary their data protection plan to account for additional use of restricted data. Such additional study remains excused but must be registered with an updated data protection plan. (2) Under circumstances where investigators have access to restricted-use data and are enhancing these data with publicly available information, they may do so without the approval of the data provider as long as a new data protection plan is registered that accounts for the use of additional public information.

Data linkage is a powerful tool for increasing the scientific value of data collected from human subjects.
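In its simplest deterministic form, linkage joins two sources on a shared subject key. The sketch below is illustrative only; the field names and values are hypothetical, and in practice the key would typically be a provider-assigned or hashed identifier governed by the data protection plan.

```python
def link_records(survey, admin, key="subject_id"):
    """Deterministically join survey responses to administrative
    records that share the same key value."""
    admin_by_key = {rec[key]: rec for rec in admin}
    return [
        {**s, **admin_by_key[s[key]]}  # merge fields from both sources
        for s in survey
        if s[key] in admin_by_key
    ]

# Hypothetical primary-collection data and administrative records.
survey = [
    {"subject_id": 1, "answer": "agree"},
    {"subject_id": 2, "answer": "disagree"},
]
admin = [{"subject_id": 1, "earnings": 52000}]

linked = link_records(survey, admin)
print(linked)  # [{'subject_id': 1, 'answer': 'agree', 'earnings': 52000}]
```

Because each linked record combines attributes from both sources, linkage can sharply increase informational risk even when either source alone seems innocuous, which is why the text treats linkage as a trigger for updating the data protection plan.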
Opportunities for linkage may arise after contact with human subjects has ceased. Many sources of linked data, such as government administrative records, can only be obtained with the consent of the individual whose records are sought. The Common Rule should not impose or encourage such a requirement where it does not exist. Rather, it should in all cases regulate the protection of data so that informational risk from data linkage is managed appropriately.

The specification of an appropriate arrangement is the responsibility of the data provider and the associated IRB. Researchers gaining access to restricted-use data through these arrangements, and their institutions, accept responsibility to protect the data. Conditions often include stiff penalties for violations; for the National Center for Education Statistics (NCES), violations are a class E felony subject to up to 5 years in prison and/or up to $250,000 in penalties. The terms of the agreements should not in general require review by the IRB of the

recipient. Secondary use of restricted-access data, however, should be registered as excused.

Guidance Recommended: OHRP should issue guidance that investigators with access to restricted-use data through site licenses, data enclaves, or other mechanisms operated by government agencies and other data providers are excused from IRB review. They are, however, responsible for registering their research at their own institution, including filing the approval for use of such data and the conditions under which they have obtained access.

Recommendation 5.4: If investigators collected data from human subjects (i.e., primary data collection), additional consent from those subjects is not necessary to subsequently link to other pre-existing data, except under circumstances where human subjects are being asked to participate further in the research or where their original consent prohibited future data linkage. The fact that additional consent is not required to link data does not reduce the responsibility of investigators to modify and register their data protection plans.

Recommendation 5.5: Investigators using non-research private information (e.g., student school or health records) need to adhere to the conditions for use set forth by the information provider and prepare a data protection plan consonant with these conditions, calibrated to the level of risk, and sufficient to reduce the risk of disclosure. Further consent is not required from such individuals as long as investigators pledge to adhere to confidentiality agreements.

Finally, the committee concludes that, in the rapidly changing environment of information and information technology, an ongoing research program is needed to ensure that regulation of informational risk is adequate and appropriate. The following research recommendation is consistent with those of several important NRC reports released over the past 10 years.
Research Needed: (1) Research is needed on innovations in the use of non-research information and records, new ways of collecting and linking data, and new methods for measuring and quantifying risk and risk reduction techniques. (2) Because it is increasingly difficult to know whether existing disclosure limitation mechanisms sufficiently balance disclosure risks against the utility inherent in social and behavioral research datasets, the committee recommends that (a) disclosure limitation mechanisms be tested against social and behavioral research datasets to identify methods that are appropriate to develop best practices, and

(b) information-disclosure risk assessment and risk mitigation strategies be developed that are consistent with the nature of social and behavioral research datasets.

DATA PROTECTION FOR PRIMARY AND SECONDARY QUALITATIVE DATA

While the earlier sections of this chapter often had as reference points quantitative, large-scale data surveys and administrative records, the recommendations apply appropriately to all forms of data. Qualitative studies, including ethnographic methods and in-depth observational projects, are also amenable to sharing with high standards for protection to ensure that the data are not identifiable. Therefore, a separate section is devoted here to protecting qualitative data because the nature of the interaction between researchers and participants, the data collection process, and the resulting data are substantially different in qualitative methods than in quantitative studies. Consider fieldwork as an example: sociocultural anthropologists, ethnographic sociologists, religion scholars, market researchers, and many others employ fieldwork, each in slightly different ways. Fieldwork most generally refers to data collection taking place outside of specialized, researcher-controlled settings or contexts (e.g., a laboratory or survey questionnaire). It can entail everything from observation of rural villagers with little social interaction between a researcher and research participants, through short-term, "participatory-action" research involving a collaboration between a researcher and an urban community in solving a social problem, to long-term, discovery-oriented "participant observation" during which the researcher becomes closely involved with a community or organization and research objectives shift in response to new information.
We discuss below issues related to protecting qualitative data and approaches for ensuring that private information acquired is secure.

Protection for Primary Data

As part of their professional ethics in protecting research participants, fieldworkers and other qualitative researchers are trained to keep their notes and recordings secure. They have an ethical obligation to keep confidences not just in note taking but also in their social interactions. When it comes to ethnographic field materials (e.g., field notebooks and other notes based on participant observation and interviewing; recordings and transcripts; personal materials collected from informants, such as letters, drawings, and so on; photographs, whether created for personal or research reasons; and similar materials created by the ethnographer or given to the ethnographer by persons with whom she or he has a field relationship), data

need to be protected through secure storage by the researcher. Examples of secure storage include locked office file boxes to which only the researcher has access, password-protected computers, and locked thumb drives. Over the past 40 years, the American Anthropological Association has developed a diverse set of case materials and references to an expanding published case literature on ethics and data protection.18 More recently, the American Sociological Association has made extensive case materials available on its website,19 and other professional associations are doing likewise.

Protection for Data Sharing

Qualitative research poses major challenges for privacy protection and data sharing, and it is important to recognize this, particularly in light of funders' relatively new data sharing advisories and requirements. Irwin (2013, p. 297) points out that making qualitative data available for secondary analyses is not feasible for many types of ethnographic and field studies because it is not possible to cleanse field notes and other research materials "of the contextual, conceptual, and interactional context in which they were produced and through which they could be understood." In these cases, research materials are securely curated by the researcher for personal use; upon the death of the researcher, in some cases these materials are archived in repositories with extensive experience curating context-rich documents (e.g., the Smithsonian Institution Archives20). However, qualitative data resulting from formal and some kinds of semi-structured interviews, or from research questions whose answers do not depend on context-rich information and extensive social interaction between the researcher and the respondent, could have more value for secondary analyses by third parties (Irwin, 2013).
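Where interview or transcript data are suitable for sharing, archive guidance typically calls for replacing direct identifiers (names, dates) with generalized placeholders before deposit. As a purely illustrative sketch, with hypothetical names, patterns, and placeholder conventions, such redaction might look like:

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

def redact(text, names):
    """Replace known participant names and calendar dates with
    generic placeholders before a transcript is shared."""
    # Replace each known name as a whole word.
    for name in names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    # Numeric dates like 3/5/2012, then written dates like March 5, 2012.
    text = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", text)
    text = re.sub(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b", "[DATE]", text)
    return text

note = "Interviewed Maria on March 5, 2012 at the clinic."
print(redact(note, ["Maria"]))  # Interviewed [NAME] on [DATE] at the clinic.
```

As the surrounding discussion stresses, mechanical substitution of this kind cannot remove contextual detail, so it is at best one step in preparing qualitative materials for deposit, not a guarantee of confidentiality.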
In view of new funder requirements to make qualitative data available for secondary analyses, Parry and Mauthner (2004) describe another set of issues that make archiving and reusing qualitative data more challenging than they are for quantitative data. In some cases, when copyright, or ownership, of data is transferred to archives, both respondents and researchers lose control of deposited data. This loss is particularly meaningful for qualitative data, which are inherently more personal, in-depth, and developmental. Even when respondent data appear to be anonymized, in some qualitative studies confidentiality may not be achievable because of very small numbers of participants and distinctive community circumstances inextricable from the central research questions. In such cases, removing or

18 See http://www.aaanet.org/cmtes/ethics/Ethics-Resources.cfm [December 2013].
19 The website is at http://www.asanet.org/ethics/ethics.cfm [December 2013].
20 See http://siarchives.si.edu/ [December 2013].

masking demographic variables and geographical information may change the meaning of the data or limit their utility. Given these and other related challenges, Parry and Mauthner (2004) urge special provisions for protecting qualitative data.

While ICPSR is known more for archiving quantitative data, it also archives qualitative datasets.21 In archiving qualitative data, ICPSR instructs researchers to follow guidelines on its webpages, which describe how to keep data confidential by replacing names with generalized text, replacing dates, and removing unique or publicized items.22 However, this advice reflects ICPSR's central interest and experience with quantitative datasets and, as suggested above, may not be appropriate for many qualitative materials. The ICPSR website also refers to an archive in the United Kingdom that is specifically dedicated to archiving qualitative data and works with social scientists in developing protection methods that fit these challenging data (Corti et al., 2000).

Researchers and regulators need to be aware that there are many other repositories with decades of experience handling qualitative research data, both specialized (e.g., the University of California's Melanesian Archives23) and general (e.g., the National Archives24), not to mention the special collections held by the libraries of research universities. These repositories contain collections serving humanities disciplines such as history and are appropriate for the long-term management of the research materials generated by qualitative social research using interpretive methods.

REFERENCES

Atran, S. (2003). Genesis of suicide terrorism. Science, 299(5612):1534-1539.
Benitez, K., and Malin, B. (2011). Evaluating re-identification risks with respect to the HIPAA privacy rule. Journal of the American Medical Informatics Association, 17(2):169-177.
Corti, L., Day, A., and Backhouse, G.
(2000). Confidentiality and informed consent: Issues for consideration in the preservation and provision of access to qualitative data archives. Forum: Qualitative Social Research, 1(3). Available: http://www.qualitative-research.net/index.php/fqs/article/view/1024/2207 [December 2013].
Ember, C.R., and Hanisch, R.J. (2013). Sustaining Domain Repositories for Digital Data. Available: http://datacommunity.icpsr.umich.edu/sites/default/files/WhitePaper_ICPSR_SDRDD_121113.pdf [December 2013].

21 Observational video data are archived at the ICPSR, along with quantitative measures, as part of the Measures of Effective Teaching (MET) Project Longitudinal Database. See: http://www.icpsr.umich.edu/icpsrweb/content/METLDB/about/index.html [December 2013].
22 See http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3qual.html [December 2013].
23 See http://libraries.ucsd.edu/locations/sshl/resources/featured-collections/melanesian-studies/ [December 2013].
24 See http://www.archives.gov/ [December 2013].

Green, L.A., Lowery, J.C., Kowalski, C.P., and Wyszewianski, L. (2006). Impact of institutional review board practice variation on observational health services research. Health Services Research, 41(1):214-230.
Hilbert, M., and Lopez, P. (2011). The world's technological capacity to store, communicate, and compute information. Science, 332(6025):60-65.
Irwin, S. (2013). Qualitative secondary data analysis: Ethics, epistemology, and context. Progress in Development Studies, 13(4):295-306.
Landwehr, C. (in press). The operational framework: Engineered controls. In J. Lane, V. Stodden, H. Nissenbaum, and S. Bender (Eds.), Privacy, Big Data and the Public Good. Cambridge, UK: Cambridge University Press.
Levine, F., and Sieber, J. (2007). Ethical issues related to linked social-spatial data. Appendix B. In National Research Council, M.P. Gutmann and P.C. Stern (Eds.), Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data. Washington, DC: The National Academies Press.
Levine, F.J., Lempert, R.O., and Skedsvold, P.R. (2011). Social and Behavioral Sciences White Paper on Advanced Notice of Proposed Rulemaking (ANPRM). Available: http://www.aera.net/Portals/38/docs/Education_Research_and_Research_Policy/SBS%20White%20Paper%20Report%20Final10-26-11.pdf [December 2013].
Lowrance, W.W. (2012). Privacy, Confidentiality, and Health Research. Cambridge, UK: Cambridge University Press.
National Institute of Standards and Technology. (2011). Electronic Authentication Guideline: Information Security. Gaithersburg, MD: U.S. Department of Commerce. Available: http://csrc.nist.gov/publications/nistpubs/800-63-1/SP-800-63-1.pdf [November 2013].
National Research Council. (1985). Sharing Research Data. Committee on National Statistics. S.E. Fienberg, M.E. Martin, and M.L. Straf (Eds.). Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2003). Protecting Participants and Facilitating Social and Behavioral Sciences Research. Panel on Institutional Review Boards, Surveys, and Social Science Research. C.F. Citro, D.R. Ilgen, and C.B. Marrett (Eds.). Committee on National Statistics and Board on Behavioral, Cognitive, and Sensory Sciences. Washington, DC: The National Academies Press.
National Research Council. (2005). Expanding Access to Research Data: Reconciling Risks and Opportunities. Panel on Data Access for Research Purposes. Washington, DC: The National Academies Press.
National Research Council. (2007). Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data. Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data. M.P. Gutmann and P.C. Stern (Eds.). Committee on the Human Dimensions of Global Change. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
National Research Council. (2010). Conducting Biosocial Surveys: Collecting, Storing, Accessing, and Protecting Biospecimens and Biodata. Panel on Collecting, Storing, Accessing, and Protecting Biological Specimens and Biodata in Social Surveys. R.M. Hauser, M. Weinstein, R. Pool, and B. Cohen (Eds.). Committee on National Statistics and Committee on Population, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
Nissenbaum, H. (2011). A contextual approach to privacy online. Daedalus, the Journal of the American Academy of Arts & Sciences, 140(4):32-48.
O'Rourke, J.M., Roehrig, S., Heeringa, S., Reed, B.G., Birdsall, W.C., Overcashier, M., and Zidar, K. (2006). Solving problems of disclosure risk while retaining key analytic uses of publicly released microdata. Journal of Empirical Research on Human Research Ethics, 1(3):63-84.

Parry, O., and Mauthner, N.S. (2004). Whose data are they anyway?: Practical, legal, and ethical issues in archiving qualitative research data. Sociology, 38(1):139-152.
Pentland, A., Greenwood, D., Sweatt, B., Stopczynski, A., and de Montjoye, Y.-A. (in press). The operational framework: Institutional controls. In J. Lane, V. Stodden, H. Nissenbaum, and S. Bender (Eds.), Privacy, Big Data and the Public Good. Cambridge, UK: Cambridge University Press.
Ritchie, F. (2009). Designing a National Model for Data Access. Paper presented at the Comparative Analysis of Enterprise (Micro) Data Conference, Tokyo, Japan. Available: http://gcoe.ier.hit-u.ac.jp/CAED/papers/id213_Ritchie.pdf [December 2013].
Seastrom, M.M. (2002). Licensing. Pp. 279-296 in P. Doyle, J. Lane, J.J.M. Theeuwes, and L. Zayatz (Eds.), Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies. Amsterdam: North-Holland.
