Suggested Citation:"5 Reconciling Access and Confidentiality in Federal Statistical and Health Data." National Research Council. 2009. Protecting Student Records and Facilitating Education Research: A Workshop Summary. Washington, DC: The National Academies Press. doi: 10.17226/12514.


5 Reconciling Access and Confidentiality in Federal Statistical and Health Data

This chapter focuses on reconciling research access to administrative data with privacy, confidentiality, and consent requirements in health care and other sectors outside education, as well as considering the implications for education research. The first section describes the Census Bureau's approach to statistical use of administrative data and outlines options for allowing researchers to access data sets while protecting confidentiality. The second section includes an overview of data access and confidentiality issues and further discussion of options for reconciling these issues. The third section discusses the impact of the Health Insurance Portability and Accountability Act (HIPAA) on health research using medical records, and the fourth section outlines concepts for a data stewardship entity that could potentially facilitate health research. Finally, the chapter summarizes an extended discussion of the implications of experiences in these other sectors for education research.

INTEGRATING ADMINISTRATIVE DATA INTO CENSUS BUREAU PROGRAMS

Gerald Gates explained that his years at the Census Bureau as a privacy officer and earlier as an administrative records program officer had made him aware that privacy is "the key issue" in obtaining access to administrative records and sharing them with researchers. On the basis of his understanding of laws, regulations, and current practices, he outlined three fundamental principles for statistical use of administrative records. First, individuals must be informed of the uses of their personal information and given the ability to control such uses. Second, he argued, administrative data can be shared for statistical purposes without consent, provided the data are protected from nonstatistical uses, echoing a point Straf had made earlier (see Chapter 1). Third, federal agencies must provide effective data stewardship, ensuring both appropriate protections and optimal use (Gates, 2008).

Moving to more practical issues, Gates observed that it is "hard to get these data," because reaching agreements on data sharing requires negotiations among lawyers, program managers, policy advisers, and institutional review boards. He noted that any violation of confidentiality protections negatively affects all parties, including an administrative or a statistical agency that shares data with a researcher. However, the parties are not equally liable for protecting confidentiality. Depending on the arrangement for sharing of data, the researcher may not be liable, but the federal agency is always liable, which may make an agency reluctant to provide access. In addition, news reports about breaches of security in federal data systems (e.g., Lee and Goldfarb, 2006) raise concerns among the public and in Congress and put pressure on agencies to protect, rather than share, their data.

Legal and Policy Support

Both law and policy support the use of administrative data for statistical purposes, Gates said. The law not only authorizes the Census Bureau to acquire administrative records, but also goes further to state that the bureau must use such records "to the maximum extent possible . . . instead of conducting direct inquiries" (U.S. Code, Title 13, Sections 6, 9, and 23). The law protects administrative information that is used for statistical purposes from being reused for administrative purposes. The Confidential Information Protection and Statistical Efficiency Act (CIPSEA) of 2002, another important law, requires uniform confidentiality protections among federal agencies that collect information for statistical purposes. Prior to this law, Gates said, agencies protected this information under a variety of laws and regulations, some more ironclad than others.

Many policy studies also support the use of administrative records for statistical purposes. In a key report, the congressionally mandated Privacy Protection Study Commission (1977) defined the concept of "functional separation" between use of individual information for statistical purposes and for administrative purposes. More recent reports (e.g., National Research Council, 1993) support the use of administrative records for statistical purposes in ways that protect individual privacy and data confidentiality.

Uses of Administrative Records in Statistical Programs

Gates said that administrative records, such as the tax records the Census Bureau obtains from the Internal Revenue Service, can be useful to statistical programs in several different ways:

• to assess population coverage in surveys;
• to assess the nature and impact of survey nonresponse;
• to aid survey methodologists in understanding the nature and extent of sampling error;
• to improve survey data editing and imputation;
• to improve questionnaire design;
• to make improvements in survey sampling frames;
• to improve simulation models for policy evaluation and review;
• as a source for economic survey sample frames;
• as measures of migration for producing population estimates between censuses;
• as a source of information about income, poverty, and health insurance at the substate level; and
• to investigate social, economic, demographic, and occupational differentials in mortality.

Recent Census Bureau Data Linkage Activities

Gates described several recent data linkage activities at the Census Bureau. Analysts began developing the Statistical Administrative Records System (StARS) before the 2000 census, collecting information from five agencies that replicates the answers to questions on the short form of the census. Originally developed as a low-cost alternative to improving within-household census coverage, the new records system has improved the bureau's demographic information, which will enable improvements in the demographic data collected in the next census. The new system also provides more up-to-date information, such as very current change-of-address data from the U.S. Postal Service, making it a very valuable resource.

The Census Bureau launched the Longitudinal Employer-Household Dynamics Program in 1999 to integrate census, survey, and administrative records data on workers and employers, including state unemployment insurance wage records. This data system provides a detailed, comprehensive picture of workers, employers, and their interaction in the national economy. While offering unprecedented detail on the local dynamics of labor markets, the data program maintains confidentiality through advanced confidentiality protection methods (Abowd et al., 2005).

Another recent Census Bureau program to link data sets is the Medicaid Undercount Project. This data system includes information from the Current Population Survey, the National Health Interview Survey, and the Medicaid program. Its goal is to examine "perplexing" discrepancies between estimates of Medicaid enrollment from population surveys and enrollment counts from the Medicaid program's own administrative data.

Protecting Privacy and Confidentiality of Administrative Data

Gates said that the key challenge in obtaining administrative data for statistical activities and research access is to protect privacy and confidentiality. He distinguished between privacy, which he defined as an individual's right to control the use and disclosure of information about himself or herself (Fanning, 2007), and confidentiality, echoing a point made earlier by Miron Straf (see Chapter 1). Collecting and using social security numbers, which play a key role in integrating administrative data sets, raises privacy concerns. To address such concerns, the Census Bureau and other federal agencies convert the social security numbers to protected identifying keys that cannot be decoded except by a handful of persons who know the code. After data sets are linked, social security numbers and names are removed and replaced with these keys.

Public awareness heightens the challenge of protecting privacy and confidentiality, Gates said. For example, in 1999, the privacy commissioner of Canada effectively shut down a major data linking project on the grounds that it had not been sufficiently publicized. In announcing this decision, the commissioner observed that, although the agency in charge of the project (Human Resources Development Canada) had not tried to hide its effort to collect and merge individual information, Canadian citizens remained unaware of how much information was being gathered about them and the extent to which it was being shared with others (Gates, 2008). Gates said it is important to acknowledge and address people's concerns about merging and reusing various sources of administrative data.

Maintaining the confidentiality of individually identifiable records poses a greater challenge today than it did in the past for two reasons, Gates maintained. First, policy makers and researchers in education, health, and other fields are demanding detailed, individually linked data sets. Second, when such data sets are made public, people have access through the Internet to many more sources of data that could be used to identify individuals in the data set.
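The replacement of social security numbers with protected identifying keys, described earlier in this section, can be illustrated with a short sketch. The summary does not say how the Census Bureau derives its keys, so the keyed hash (HMAC-SHA256) used here is an assumption chosen only for illustration, and the names `SECRET_KEY`, `protected_key`, and `link_and_strip`, as well as the sample records, are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in this sketch it stands in for "the code" that only
# a handful of custodians would know. Not an actual agency practice.
SECRET_KEY = b"example-secret-held-by-key-custodians"


def protected_key(ssn: str, secret: bytes = SECRET_KEY) -> str:
    """Map an SSN to a stable linkage key that is not reversible without the secret."""
    digits = "".join(ch for ch in ssn if ch.isdigit())  # normalize "123-45-6789"
    return hmac.new(secret, digits.encode("ascii"), hashlib.sha256).hexdigest()


def link_and_strip(records_a, records_b):
    """Join two record lists on the protected key, then remove SSN and name."""
    by_key = {protected_key(r["ssn"]): r for r in records_b}
    linked = []
    for rec in records_a:
        key = protected_key(rec["ssn"])
        match = by_key.get(key)
        if match is None:
            continue  # no counterpart in the second data set
        merged = {**rec, **match, "link_key": key}
        merged.pop("ssn", None)   # direct identifiers are dropped after linkage
        merged.pop("name", None)
        linked.append(merged)
    return linked


# Made-up records from two hypothetical sources
tax = [{"ssn": "123-45-6789", "name": "A. Example", "income": 41000}]
survey = [{"ssn": "123456789", "name": "A Example", "insured": True}]
linked = link_and_strip(tax, survey)
print(linked)
```

In this sketch the same person links across the two sources even though the identifier is formatted differently, and the linked record carries only the key, not the social security number or name. Anyone holding the secret could regenerate the keys, which is why access to it would need to stay restricted.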

Options for Providing Access to Administrative Data

Gates presented several options for providing safe, useful access to administrative data. He explained that it was important to consider the range of options, because no single option will meet the needs of every data producer and every data user.

Options for Public Use Without Restrictions

Traditionally, the Census Bureau and other federal statistical agencies have made survey data and some types of administrative data available in the form of "public-use microdata systems," and these data sets continue to meet the needs of most researchers today, Gates said. To protect these data against an intruder who might try to identify one or more individuals, the public-use microdata systems include only a sample of records. For example, for the decennial census, the largest available national public-use microdata file includes only 6 percent of the population. In addition, bureau analysts remove all direct and indirect identifiers, restrict the amount of geographic information shown, remove outliers, and take other steps to reduce the risk of disclosure. The advantages of these data systems include availability to the public with no limits on use and easy analysis using most software. At the same time, public-use microdata sets have several limitations, due to the restrictions on geographic information, the removal of outliers (which are often the most interesting data), and other confidentiality protections.

Gates explained that, for many administrative data sets, further protections are needed to protect the confidentiality of individual information. One option is to develop synthetic data. Synthetic data sets have several advantages. They are designed specifically to protect administrative data, they can be made accessible to the public or to researchers without restrictions, and they are easy to analyze using most software. However, these data also have limitations. Because synthetic data sets are customized to meet the needs of specific groups of users, they will not satisfy all researchers. In addition, research to date has not yet demonstrated the quality and usefulness of synthetic data for a wide range of different types of applications and analyses.

Options for Restricted Use

Restricting use of administrative data sets provides another layer of protection, Gates explained. The Census Bureau pioneered one option to restrict use: the Research Data Center. After establishing the first center in Boston in the 1990s, the bureau expanded the network and today operates nine such centers. At these centers, users may directly access administrative data sets that include everything but direct identifiers (name, social security number). The data include outliers, and the user may link the data to external data sets. However, these centers also have limitations. The researcher or other user must first go through a complex approval process and then relocate to the regional research data center, limiting possibilities for collaboration with other researchers.

Another option for restricting access to administrative system data is through a licensing system, such as the system operated by the National Center for Education Statistics (see Chapter 2). This option has the advantage of allowing the licensed researcher to directly access data sets at her or his own institution, using the researcher's own software, and facilitating collaboration with colleagues at the institution. The limitations of this option include the possibility of losing one's license if an onsite inspection by the licensing agency finds violations of the confidentiality protections or other elements of the licensing agreement. In addition, the license agreement typically does not allow the researcher to link the licensed data to external data sets.

Remote Access Options

Gates said that the option of providing researchers with remote access to statistical data is growing rapidly. In this option, the researcher submits programs to an intermediary, which applies the programs to restricted-use data and provides the results and tabulations to the researcher. For example, the National Opinion Research Center has created a "data enclave" (see http://www.norc.org/DataEnclave/), with data sets from the National Institute of Standards and Technology and other sources, and conducts analyses of these data at the request of approved researchers. The National Center for Health Statistics provides access to approved researchers by e-mail through the Analytic Data Research by E-mail (ANDRE) system and has also created a virtual research data center (see http://www.cdc.gov/nchs/r&d/rdc.htm). Remote access options have the advantage of providing easy access to administrative data sets by e-mail or the Internet. However, the types of analysis possible are limited by the specialized software housed on the servers of the data enclave. In addition, outliers in the data sets have been removed to protect against disclosure of individually identifiable information, and some data enclaves require users to pay a subscription.

A final option for providing access to administrative data is to obtain informed written consent from the individuals whose administrative records are sought. Consent has the advantage of putting the individual in control and potentially allows the most flexible access to, and use of, individual record data. The limitations include potential bias in the data set, because some individuals will not give consent for use of their records, and it may not be possible to locate other individuals. When using this option, Gates said, it is critically important to test different formats for informing individuals of the proposed uses for their record data and requesting their consent.

Gates concluded with several key points from his paper (Gates, 2008). First, administrative records serve many important research uses; these uses are supported by law and facilitated by the advanced technology and methods for linking records available today. Second, although all parties who have access to administrative records have a great incentive to protect confidentiality, they are not all equally liable for any possible breach of confidentiality. Third, access is affected by increased public scrutiny of privacy protections, which leads to development of new data stewardship principles and policies. Gates explained that agencies sometimes apply new policies and procedures in response to privacy violations "in order to survive." Finally, he observed that the variety of new options for access reflects the degree of control the agency holding the records is willing to relinquish to permit specified uses. Gates emphasized that agencies will provide access if they are comfortable that their requirements are going to be met, leading to a "staged approach" rather than an attempt to provide access to all data for all users. In this approach, the agency considers the needs of particular groups of data users, provides only the amounts and types of data needed, and imposes restrictions on use of these data.

Responding to a question, Gates said that language is very important when talking about the complex issues involved in maintaining confidentiality of individual information. He said that the key point to convey, when talking to the public about using their personal records for statistical purposes, is that a federal agency will not use personal records to make administrative decisions, such as to determine one's level of social security benefits, and that the information will be merged with information from other individuals. He said that the Census Bureau had done some research and found that, when people were asked to give consent for use of their information "for statistical purposes," they did not understand the term. People were more comfortable when asked to give consent "for statistics," because they think of statistics as numbers, Gates said. In response to another question, Gates agreed with Robert Boruch that studies of the informed consent process are important, stating that "the statistical community needs to acknowledge that consent is an important issue."

Felice Levine highlighted a key question related to informed consent.

She said that, if researchers ask for informed consent when an individual takes a test or gives blood, they must consider what the consent is for. This question becomes important, she said, when a researcher asks an institutional review board for a waiver of consent. The board will look at the proposed research and consider whether the original consent, for its intended purposes, would be compromised by the research uses of the individual information. Levine suggested that it might "trivialize" the importance of informed consent if an individual were asked to simply check a box on a consent form, agreeing that the information could be used for all other legitimate research purposes.

MODELS FOR ENSURING DATA ACCESS AND PRIVACY PROTECTIONS

Myron Gutmann began by observing that the general issues surrounding data confidentiality are well reported in Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data (National Research Council, 2007: Chapter 2). Data confidentiality concerns reflect principles for the protection of human subjects outlined in the Belmont report (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979), and they are supported in regulations by the Federal Policy for the Protection of Human Subjects (the Common Rule). There is an ethical consensus, Gutmann said, on the need both to protect human subjects and to share data.

In this general context, Gutmann said that protection of education data is special because Family Educational Rights and Privacy Act (FERPA) legislation and regulations are added on to the core protections of the Common Rule. The Common Rule was designed to protect human subjects in a research environment, assumes that prior informed consent is fundamental, and uses a "reasonable" standard for protection. In contrast, FERPA was designed to protect students and their families in an educational environment, does not assume prior informed consent, and has "a much more absolute standard for protection." Despite these differences in the law and regulations, he said, in reality, researchers do obtain informed consent for the use of school records, but they get this consent from school administrators, rather than from parents or students.

The Problem: With a Focus on Spatial Data

Gutmann illustrated the process of disclosure review (i.e., review of data sets to assess and prevent disclosure of individually identifiable information) using the analogy of trying to find Waldo in the children's book, Where's Waldo? (Handford, 1997). Gutmann said that, given five attributes about Waldo, he can easily find him in a small crowd. In reality, administrative and survey data sets are large; by analogy, locating Waldo based on five attributes is much more difficult when he is in a large crowd. However, spatial information or electronic monitors make it easier to find Waldo even in a large crowd, because all explicit geographic locations are identifiable (VanWey et al., 2005). For example, if one knows that Waldo always stands next to the ring toss at the carnival, or if Waldo always carries around a radio transponder that signals his location, he stands out in the crowd.

Gutmann explained that, when reviewing possible dissemination of data on a particular topic, it is important to recognize that these data may not be the only published source of information about that topic. For example, he might publish a map of Washtenaw County, Michigan, and list three attributes of an individual living in that county. This may pose little risk of revealing that individual's identity, because there are many individuals with those three attributes across the entire county. However, because there may be only one individual with these attributes who lives in one particular city block, publishing this geographic information would be likely to reveal the individual's identity. In the case of education records, if a few attributes of an individual student were published, along with that student's school, then the individual student would be easily identifiable.

Protecting Confidentiality: Goals and Options

Gutmann outlined two goals for protecting confidentiality when sharing or disseminating microdata: (1) to eliminate direct identifiers and (2) to eliminate unique individuals in small cells. Options for achieving these goals in tabular data with area identifiers but no precise spatial locations include the following:

• aggregating values (e.g., into five-year age groups instead of single years);
• top-coding (recoding values so that extreme values are combined with less extreme values);
• swapping data across spatial units; and
• paying careful attention to easily identified categories of data, especially geography, schools, and clustered samples.

Gutmann said that Putting People on the Map (National Research Council, 2007) differentiates between technical and institutional options for protecting confidentiality. Technical options include replacing real data with synthetic data for potentially identifying attributes and creating secure data analysis systems. Institutional options, which focus on individuals and organizations rather than on technology, include contracts and data enclaves. Institutional options vary, based on the perceived level of risk of disclosure; more complex options are more expensive and make research more difficult. For example, Gutmann said, the University of Michigan houses one of the nine research data centers operated by the Census Bureau. The center is expensive and difficult to use for investigators who live more than 30 miles away.

Gutmann presented a figure illustrating the gradient of levels of risk and levels of protection required for different types of data sets (see Figure 5-1). Simple data sets, with little risk of disclosure and little risk of harm if there were a disclosure, can be made publicly accessible on the Internet. If the data set is more complex and the risk of harm from disclosure is greater, then the data producer might require a formal data use agreement before providing access. Some data sets include information about illegal or undesirable behavior, which people voluntarily provide. Because disclosure of individual identities in such data sets would pose a great risk of harm, the data producer might require a strong data use agreement. This agreement would be likely to include restrictions on technology and technology access, similar to those included in National Center for Education Statistics licensing agreements (see Chapter 2). Finally, if there is a very high risk of disclosure and a high risk of harm, the data producer might "just lock things into an enclosed data center" and require the researcher to come and use the data onsite, Gutmann said.

FIGURE 5-1 The gradient of risk and restriction. Source: Gutmann (2008).

Inter-University Consortium for Political and Social Research

Gutmann presented an overview of the activities of the Inter-University Consortium for Political and Social Research, the data archive he directs (see http://www.icpsr.umich.edu/). Experts at the archive conduct disclosure risk analysis of all data sets that enter the archive. In most cases, they remove all direct and indirect identifiers in the data sets and make them available through the Internet to users who have signed contracts with the consortium. Because most of these data sets are based on political opinion polls and information on voter behavior, Gutmann described their information as "really not very harmful." Other data, which are not useful for research without some indirect identifiers, are subject to further restrictions, ranging from easy to more stringent contracts. However, the consortium lacks authority to impose large fines on individuals who violate these contracts, in contrast to the National Center for Education Statistics (see Chapter 2). Finally, the most sensitive data sets, posing the greatest risk of harm to an individual whose identity might be disclosed, are housed in an onsite data enclave for use by researchers who appear in person.

The consortium houses and maintains data sets and conducts research with support from many organizations, including the National Institute of Child Health and Human Development and other agencies in the U.S. Department of Health and Human Services, the National Institute of Justice, and the consortium's member institutions. For example, the consortium is currently taking over dissemination of the National Longitudinal Study of Adolescent Health (Add Health) from the University of North Carolina at Chapel Hill. This study includes several different types of data, including survey (self-report) data and biomarker data. Biomarker data (indicators of disease or health, such as blood pressure, heart rate, and the presence or absence of certain molecules) include both physical specimens and digital representations. The study also includes analysis of ancillary data, including participants' high school transcripts. In compliance with FERPA, the investigators received individual written consent from each student in order to obtain access to the transcripts from their schools. To access these national data sets, a researcher must receive approval from the institutional review board at her or his home institution and provide a data security plan.
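Two of the tabular protections Gutmann listed earlier in this section, aggregating values and top-coding, along with his second goal of eliminating unique individuals in small cells, can be sketched in a few lines. This is only an illustration under stated assumptions: the records are made up, the function names (`age_group`, `top_code`, `unique_cells`) and the income cap are hypothetical, and actual disclosure review involves far more than counting cells.

```python
from collections import Counter


def age_group(age: int, width: int = 5) -> str:
    """Aggregate a single-year age into a band such as '15-19'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_code(value, cap):
    """Recode extreme values so that they are combined at the cap."""
    return min(value, cap)


def unique_cells(records, attributes):
    """Return the attribute combinations that describe exactly one record."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return [cell for cell, n in counts.items() if n == 1]


# Made-up records for illustration only
students = [
    {"age": 15, "sex": "F", "school": "North", "county": "Washtenaw", "income": 52000},
    {"age": 16, "sex": "F", "school": "North", "county": "Washtenaw", "income": 48000},
    {"age": 17, "sex": "F", "school": "North", "county": "Washtenaw", "income": 910000},
    {"age": 18, "sex": "F", "school": "South", "county": "Washtenaw", "income": 61000},
]

# Apply aggregation and top-coding (the cap of 100000 is an arbitrary assumption)
protected = [
    {**r, "age": age_group(r["age"]), "income": top_code(r["income"], 100000)}
    for r in students
]

raw_unique = unique_cells(students, ["age", "sex", "county"])   # single-year ages
coarse = unique_cells(protected, ["age", "sex", "county"])      # county-level geography
fine = unique_cells(protected, ["age", "sex", "school"])        # school-level geography

print(len(raw_unique), coarse, fine)
```

With single-year ages every record in this toy data set is unique, while five-year age groups at the county level leave no unique cells. Adding the school as an attribute brings back a cell with one student, which mirrors the point that publishing a student's school alongside a few attributes can make that student easy to identify.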

62 PROTECTING STUDENT RECORDS Future Directions in Reconciling Research Access with Confidentiality In the future, Gutmann said, all administrative data sets should be online, with âreasonableâ restrictions on access. These future data systems will be very distributed and have the capability to combine data âon the flyâ to preserve confidentiality. They will have the capability to automati- cally recognize and solve confidentiality issues before the data reach the userâs computer. In addition, these advanced future systems will build user communities on the basis of dynamic patterns of data usage. Turning to future developments, Gutmann first contrasted FERPA with the Common Rule. He said that the Common Rule directs that requirements for reuse of data match the commitment contained in the original informed consent, although this does not always happen in real- ity. In contrast, because education records are administrative in origin, they are collected without informed consent, and FERPA rules absolutely forbid reuse of the data in ways that have any potential for identification of individuals, Gutmann said. In some cases, it is possible for researchers to use education records in compliance with FERPA by obtaining retro- spective consent. For example, at the time they wanted to access high school transcripts for inclusion in the adolescent health study, researchers were already planning to go into the field to interview study participants. Near the end of the interview, they requested access to the studentsâ tran- scripts, and over 80 percent gave consent. More often, however, research- ers do not want to ask participants in a long-term study for consent, because they may decline and drop out of the study altogether. Gutmann said that research organizations like his have found approaches that work well for archiving and using education data. He observed that researchers often think of school administrators as partners in the research process. 
Finally, Gutmann emphasized that current data protection schemes appear to work very well, with no known examples of individuals having been harmed by confidentiality breaches (see Chapter 1). This is important to keep in mind, he said, so that confidentiality does not become a higher priority than conducting good research. In response to a question, Gutmann said he did not know of any stud- ies about obtaining retrospective consent. Levine said that the process for obtaining a waiver of the informed consent requirement of the Common Rule from an institutional review board was well understood and used to facilitate research. Gutmann agreed, but cautioned that, in some cases, an institutional review board might grant a waiver under the Common Rule, but not under FERPA. Levine responded that FERPA was silent on this point. Because the law does not elaborate on the possibility of a waiver of informed consent, she said, lawyers and researchers can only assume that a waiver is not allowed. Gutmann responded that the Common Rule regulations have been revised several times over the years, on the basis of

RECONCILING FEDERAL STATISTICAL AND HEALTH DATA 63 the experiences of the large community of data producers and research- ers that the regulations govern. In contrast, he said that the Department of Education recently announced the first proposed revisions to FERPA regulations (see Chapter 2). Barbara Schneider said that universities are careful to comply with FERPA and that she knows of many planned studies in which the research- ers have decided to obtain written consent for access to education records rather than seek a waiver. Robert Boruch observed that research partner- ships represent the future of social science research and that investigators would not be able to obtain more data without cooperating with educa- tion agencies and other data producers. He urged the research commu- nity to share experiences and approaches to obtaining access to data, including memoranda of understanding with education agencies and legal arguments. IMPACT OF THE HIPAA PRIVACY RULE ON RESEARCH Roberta Ness (University of Pittsburgh) shared a survey she con- ducted to assess the impacts on health research of the HIPAA Privacy Rule. The survey was commissioned by an Institute of Medicine (2008) committee as part of a larger study of the impacts of this law on health research. Privacy Rule Protections Ness explained that Congress enacted HIPAA in 1996 partly because the Common Rule did not definitively protect the privacy of individu- ally identifiable health information. The law was designed, she said, to protect the privacy of medical records. The Privacy Rule implementing HIPAA (U.S. Department of Health and Human Services, 2000, 2002) permits health care provider organizations to disclose individually iden- tifiable health information for research purposes only if the researcher has obtained written consent from each patient or, if that is impractical, a waiver of this requirement from an institutional review board. 
Although the rule does permit health care providers to disclose limited data sets with all identifiers removed to researchers who sign a formal data use agreement, Ness said that these data sets cannot be linked to any other medical records and are not useful for research. (The rule specifies 18 identifiers that must be removed, including geographic information and dates related to the individual.) The HIPAA Privacy Rule also permits disclosures to public health authorities without written consent for the purpose of public health surveillance. Institutional review boards at universities, hospitals, and other organizations implement the HIPAA Privacy Rule.

Ness said that an institutional review board will generally provide a waiver of the informed consent requirement if it determines that the proposed research presents a low risk of disclosure of individually identifiable information and that the proposed research could not be conducted without the waiver. She observed that there is “a great deal of local interpretation” of these two conditions.

Survey Content

Ness presented several news reports describing epidemiological and clinical research studies that were halted or slowed after the enactment of HIPAA. Describing the reports as “worrisome,” she noted that they provide only anecdotal evidence about the possible effects of the HIPAA Privacy Rule on health research. The Institute of Medicine (IOM) commissioned a national survey in 2007 in order to make a more informed assessment of the effects of the rule, she said (Ness, 2007a).

The IOM survey was conducted in collaboration with 13 epidemiology professional societies. Each society contacted all of its active members, requesting that they respond to an anonymous, web-based survey about the HIPAA Privacy Rule. A total of 2,805 individuals accessed the website. Among this group, 1,527 indicated that they had submitted a research proposal to an institutional review board since the enactment of HIPAA, and the answers of this smaller group were analyzed. Ness explained that the survey was designed to ask about positive and negative influences of the HIPAA Privacy Rule.
First, the survey presented questions with quantitative responses, such as how frequently respondents collected various types of data, changes in the numbers of participants recruited before and after implementation of the Privacy Rule, and the level of difficulty encountered when seeking waivers or approval for release of deidentified data sets from the institutional review board. A second group of questions focused on respondents’ perceptions of the ease and difficulty of conducting research under the HIPAA Privacy Rule and its impact on privacy and confidentiality, using a 5-point Likert scale. Third, the web-based survey presented five hypothetical research proposals, asking respondents whether their institutional review board would approve them. Finally, respondents were asked open-ended qualitative questions, including a request for stories about HIPAA.

Analysis of Responses

The respondents were predominantly women (59 percent), mostly employed in academia (66 percent), and they varied widely in age. In describing their perceptions, a large majority (84 percent) rated the degree to which the HIPAA Privacy Rule made research easier as low, at 1 to 2 on a 5-point Likert scale anchored at 1 = none. Responding to another question, a somewhat smaller majority (67.8 percent) rated the degree to which the Privacy Rule made research harder as high, at 4 to 5 on a 5-point Likert scale anchored at 5 = a great deal. Few respondents (10.5 percent) perceived the rule as having strengthened public trust a great deal, and only about one-quarter believed that the rule had greatly increased participant confidentiality. With respect to cost and delay, about half of the respondents perceived the rule as adding costs and delaying time to completion by a great deal.

Ness was surprised that more respondents indicated that the rule had a negative effect on the protection of human subjects than the number of respondents indicating that the rule had a positive effect. Ness said this response was “almost bizarre on the face of it, because, of course, this is legislation that was purposely enacted to improve the protection of human subjects.” However, she went on to say that the respondents explained their responses in the qualitative section of the survey, indicating that they viewed the burden of paperwork resulting from adding HIPAA to the Common Rule as so great that medical patients no longer understood what they were giving informed consent for.

She said that about 15 percent of respondents indicated that, although their research proposals were approved by the institutional review board, the health care organization holding the medical records would not allow access because of Privacy Rule concerns.
An additional 11.5 percent of epidemiologists surveyed had conceived of a study but not submitted it to an institutional review board because they thought they would be unable to obtain approval due to the Privacy Rule. More than half of the respondents said that an application they had submitted to an institutional review board was strongly adversely impacted by HIPAA.

Presenting the responses to the case study section of the survey, in which respondents were asked whether their institutional review board would approve five different types of studies, Ness said that the key finding was the wide variability (see Table 5-1). Such wide variation indicates that institutional review boards are interpreting the Privacy Rule in very different ways, Ness said.

TABLE 5-1 Responses to Case Studies, Institute of Medicine Survey

Would Your IRB Approve This Study? Number (%)

                                                        Yes,            Yes, with       Yes, with       Yes, Other      Don’t
Study                                   No              Unconditional   Waiver          Approval        Conditions      Know

Participants from medical records       184 (12.56)     123 (8.5)       262 (18.0)      522 (35.9)      135 (9.3)       229 (15.7)
  contacted for interview/blood draw

Participants from cancer registry       196 (13.5)      157 (10.8)      261 (18.0)      468 (32.3)a     175 (12.1)      193 (13.3)
  contacted to consent for interview

Tissue bank to supply deidentified      222 (15.6)      199 (14.0)      159 (11.2)      262 (18.4)b     291 (20.4)c     290 (20.4)
  data for assay not in original consent

Medical record review from subjects     61 (4.7)        435 (33.8)      280 (21.8)      109 (8.5)d      121 (9.4)       281 (21.8)
  now dead

Limited data set from another hospital; 239 (20.2)      58 (4.9)        260 (21.2)      427 (36.0)e     123 (10.4)      317 (26.7)
  research cannot be done without some
  identifiers

NOTE: IRB = institutional review board.
a With physician approval.
b With authorization and reconsent from patients.
c Limited data set or other special circumstances.
d With approval from executor of estate.
e With limited data set agreement.
SOURCE: Ness (2007). Copyright 2007 American Medical Association. Reprinted with permission.

In response to the final section of the survey, inviting HIPAA stories, Ness said they received almost 500 written responses, reflecting the “angst that’s out there.” A total of 90 percent of the stories were negative, 5 percent were neutral, and 5 percent were positive. For example, one respondent wrote (Ness, 2007b):

    An already cumbersome patient consent form now has an additional page-and-a-half explaining HIPAA restrictions. This detracts from the informed consent process pertaining to the more critical issue: the actual medical risks and benefits of participating.

In general, the written responses indicate that the HIPAA Privacy Rule had not stopped health research, but it had slowed research progress and increased costs. Many respondents expressed the view that the Privacy Rule was hurting public health surveillance and causing confusion in the public health community. Ness reminded the group that the HIPAA Privacy Rule specifically permits disclosure of individual health information for public health surveillance.

Summarizing the IOM survey results, Ness reiterated that only about one-quarter of the respondents believed that the Privacy Rule had enhanced privacy. Among all respondents, the rule was seen as having a more negative than positive impact on the protection of human subjects. The analysis of the responses suggests that institutional review boards around the nation interpret the rule in quite different ways, making it unclear whether many of the problems described in the survey are a function of the Privacy Rule itself or local institutional review board interpretation of it. She said that the limitations of the survey include respondent bias; it may be that the respondents were those who feel most negatively about HIPAA. Another limitation is that it was not possible to calculate a response rate, because of the anonymous Internet process.

In discussion, Ness explained that the primary audience for the survey was the IOM committee, and that the survey reached a wider audience through publication in the Journal of the American Medical Association (Ness, 2007a).
She said that representatives of two agencies in the Department of Health and Human Services (the Office for Civil Rights, which leads implementation of the Privacy Rule, and the National Institutes of Health, which funds a large amount of health research) had attended the IOM committee’s meetings and were concerned about the wide variation in local interpretation of the rule. She observed that the IOM committee would decide how to respond to the problem in its final report.

Stephen Plank (Johns Hopkins University) asked if the word “surveillance” raised public fears. Ness responded that the IOM committee had commissioned a study of public attitudes toward privacy and health research (Westin, 2007), which found that language had a powerful influence on individuals’ willingness to allow access to their health information.

A NEW APPROACH TO HEALTH DATA STEWARDSHIP

P. Jon White (Agency for Healthcare Research and Quality) opened his remarks with the observation that health care in the United States has a quality problem. The Institute of Medicine (2000) found that between 50,000 and 100,000 deaths were caused each year by medical errors, and more recently, McGlynn (2003) found that health care recipients got the recommended level and type of care only about 55 percent of the time. At the same time, health care has a cost problem. The current annual expenditure of $2.2 trillion represents a significant fraction of the nation’s gross domestic product, and health care costs are rising steeply, at an annual average rate of 6.5 percent.

Current Efforts to Measure Health Care Quality

White said that one proposed solution to both of these problems is to pay for quality, rather than paying for individual visits to the doctor or for individual procedures or treatments. The Agency for Healthcare Research and Quality is one of several organizations working toward this solution; these organizations all face the key question of how to set the health care quality goals that would guide payments. In one effort to answer this question, his agency awarded grants to support health care information technology systems for “enabling quality measurements” (see http://grants.nih.gov/grants/guide/rfa-files/RFA-HS-07-002.html). In addition, the Department of Health and Human Services’ Centers for Medicare and Medicaid Services funded six regional pilot projects to provide better quality information for Medicare beneficiaries.

White provided an example to illustrate a key issue in using administrative (or in this case, electronic claims) data to assess health care quality. If one wanted to assess the quality of Dr. White’s treatment of diabetic patients, any single payer could provide claims data for only 10 to 15 percent of Dr. White’s diabetic patients. Doctors have successfully argued that measures based on their treatment of these small samples of patients are inaccurate. To address this problem, the Agency for Healthcare Research and Quality and other organizations are beginning efforts to assemble health claims data from multiple payers, including the information assembled by the technology systems and regional pilot projects described above.

The agency has also funded development of 14 “chartered value exchanges” around the country. These are coalitions of health care providers, payers, patients, and regulators who receive data from Medicare and from local payers and health care providers and try to use these data to measure quality. White explained that the Centers for Disease Control and Prevention’s BioSense Health Surveillance Program (see http://www.cdc.gov/BioSense/) taps into the existing streams of data in hospital information systems around the country and “sends it up to the mother ship in Atlanta.” Analysts there can monitor spikes in certain diseases, medical conditions, or symptoms. In addition to these federal efforts, health insurance companies and health maintenance organizations are working to develop measures of health care quality. The New York attorney general’s office has signed agreements with several major health care organizations to rank doctors based on quality of care, rather than on how much they cost the organization (New York State Attorney General’s Office, 2007). Other organizations that are trying to assemble and analyze health care data to develop measures of quality include the National Quality Forum (a public-private partnership), Google, and Microsoft.

All of these efforts face the question of who owns the health care data, White said. In the past, medical records were maintained in paper files, making it easier for any single doctor or hospital to own and keep them. With the change to digital records, it is possible for many individuals and organizations to own copies of health care records. White explained that he and his colleagues use the word “stewardship,” which he defined as “taking care of something that doesn’t belong to you.” He has been engaged in discussions of health data stewardship with many organizations over the past few years, including the Ambulatory Quality Alliance, the National Committee on Vital and Health Statistics, which advises the Secretary of Health and Human Services, and the American Medical Informatics Association.

These initiatives also face privacy and security issues, as individual health records are protected by HIPAA, the Common Rule, and state and local laws and policies. In 2005, the Agency for Healthcare Research and Quality helped to fund a collaborative effort among more than 30 states and territories to study their privacy laws and regulations governing medical records. As a result of these studies, the participants have begun working to harmonize these laws and regulations, both within and across states.

A Data Stewardship Entity

Returning to the concept of stewardship, White explained that the idea of assembling multiple sources of data in order to improve health care quality emerged several years ago in the Ambulatory Quality Alliance. The alliance includes representatives of White’s agency, two physicians’ organizations, and an association of health insurance companies. The alliance members recognize that doctors, laboratories, health insurance plans, and patients all have separate pieces of the health care information needed to measure quality. Through discussion, they developed principles for sharing and aggregation of these disparate sources of data, including (Ambulatory Quality Alliance, 2006):

• transparency with respect to framework, process, and rules;
• measurement of provider performance derived from standardized metrics and data collection protocols that can be compared with national, regional, or other suitable benchmarks and otherwise assists in the analysis of assessments of health care quality and cost of care;
• useful data for physicians to improve the quality and cost of care they provide to their patients and other appropriate purposes (e.g., maintenance of certification);
• public reporting to consumers of user-friendly, meaningful, and actionable information about physician quality and cost of care; and
• the collection of both public and private data so that physician performance can be assessed as comprehensively as possible.

White explained that, as the Ambulatory Quality Alliance members discussed these principles, they reached agreement on the need for a new health care data stewardship entity. When developing the mission and scope of the entity (Ambulatory Quality Alliance, 2006), they were unclear about whether the entity would simply set guidelines for assembling and managing data or would actually serve as a data archive. To solicit answers to this and other questions about the entity, the Agency for Healthcare Research and Quality published a request for information. Over 100 public and private health care organizations and individuals responded to the request, and the agency published a qualitative summary of their comments (Agency for Healthcare Research and Quality, 2007).
The varied responses included significant support for both possible roles of the entity: setting guidelines for data stewardship and acting as the data steward. At the same time, some respondents expressed significant concerns, and some were completely opposed to the idea of sharing their personal medical records. White said he found it very valuable to hear and understand these views from the public.

Near the end of his presentation, White posed several questions that his agency and others are discussing as they consider the possibility of creating a health care stewardship entity:

• What is your problem? What are you trying to address? What do you need to do this for?
• Do you need a referee to address your problem by helping to set and enforce the rules of the game?
• Do you need someone to hold the information you need to address your problem?
• How do you avoid unintended consequences, including breaches of privacy and confidentiality?

White closed by warning that websites exist today at which, for a small fee, an individual can enter a medical condition and receive a list of people who have that condition. The website managers gather the information from sources that are not governed by HIPAA. Finally, he said there are many questions and no conclusions.

In discussion, White observed that lobbyists on Capitol Hill are telling Congress now that the patient ought to control her or his health records, although the doctor and the insurance company should also be allowed to access the records. White said that such proposals miss the possibility of using medical records for research that could improve the health care system for the public good.

DISCUSSION: IMPLICATIONS FOR RESEARCH USING EDUCATION RECORDS

Reflecting on Lizanne DeStefano’s earlier presentation about development of partnerships between researchers and schools (see Chapter 4), Miron Straf said that statistical agencies should work with administrative agencies, helping them to develop their data systems for statistical use. He asked whether there was a federal role in providing this type of assistance to education agencies.

Supporting Research Partnerships Through Trust and Technical Assistance

Marilyn Seastrom replied that 27 states currently have grants from the Department of Education’s Institute of Education Sciences to develop longitudinal databases of education records and that more funding will be provided to the states in fiscal year 2009 (see Chapter 2). One requirement of these grants, she said, is that the states make the databases user-friendly and accessible to researchers. She said that, even if FERPA were changed to make data-sharing easier, states and school districts might still refuse researchers’ requests for data access if they lack the resources and technical capacity to provide it.

DeStefano responded that, in addition to trust, the motivation of research partners is also important. When a school or state education agency does not have a strong motivation to participate in a research project, she argued, the agency’s leaders are more likely to say that they cannot share data because of FERPA. Seastrom agreed that trust and the development of relationships would continue to be very important for researchers to gain access to education data.

Paula Skedsvold (American Educational Research Association) asked whether legislative changes were needed in FERPA to clarify the meaning of research “for, or on behalf of” an education agency. She observed that, in response to the American Educational Research Association’s survey about FERPA, some respondents indicated that they had simply abandoned research projects, because they could not obtain access to the education records they needed.

Seastrom asked workshop participants to describe how much state or local education agencies and researchers themselves alter education record data to protect confidentiality. Schneider replied that, in one case, her team had helped to deidentify a file of teacher information. She observed that researchers outside her team who wish to use the file are required to apply to their own institution’s institutional review board and to the state of Michigan’s institutional review board for the use of data, providing a data protection plan along with other information about the proposed research.

DeStefano and others agreed that researchers should provide technical assistance to state and local education agencies to increase their capacity in techniques of deidentification.

Weighing Risks and Benefits of Disclosure and Research

Martin Orland proposed that Gutmann’s matrix of risk and restriction (see Figure 5-1) should include another dimension: the likelihood of harm.
Gutmann responded that the matrix included risk of harm and disclosure, and Orland replied that risk of harm and risk of disclosure should be two different dimensions. For example, he said, the risk of a nuclear power plant accident is minuscule, but this unlikely event could cause “enormous” harm. Orland expressed concern that the harm caused by even one disclosure of individually identifiable information could adversely affect the entire research environment, especially in light of the public and congressional concerns about privacy that had generated the HIPAA legislation.

Gutmann agreed with Orland that it is important to differentiate between the risk of disclosure and the risk of harm. For example, he said, there is almost nothing on the short-form census questionnaire that should cause an individual to be concerned if it were publicly revealed. In contrast, he said, other data that individuals provide in surveys or that are included about them in administrative records would pose great risk of harm if they were revealed, as shown in his graphic (see Figure 5-1). He went on to explain that he is much more worried about the potential harm disclosure could cause to groups of individuals than he is about the harm to researchers of limited access to data. He said that the people who fund his data consortium do so because they do not want to see a front page story in the news about any revelation of individual identities based on data they collect.

Gerald Gates added that privacy laws, unlike Gutmann, do not distinguish between more sensitive and less sensitive individual information; these laws simply state that individual information cannot be disclosed. Although institutional review boards in federal agencies consider the sensitivity of different data sets when determining how to protect them, they focus primarily on complying with the letter of the law by protecting against any disclosure of any individually identifiable information.

Levine responded that one criticism of institutional review boards is that, when reviewing a research proposal, they fail to distinguish between the risk of disclosure and the magnitude of harm that a disclosure would cause. Gutmann responded that institutional review boards do not always take advantage of the flexibility they have to allocate their time and resources. For example, the University of Michigan institutional review board has explicit rules stating that research proposals to use data from a list of specific deidentified public data sources (including the Census Bureau, his institute, and other sources) do not require institutional review board approval (University of Michigan, 2008).
Therefore, the board does not need to devote resources to reviewing these research proposals and can focus on other proposals in which protecting human subjects is more important. Gutmann suggested that the research community continue to work with institutional review boards to make sure that they are devoting their resources where they are most needed, especially because he sees most institutional review boards as “overwhelmed.” Seastrom responded that, while she agreed with Gutmann, the exact opposite would be the case for a disclosure review board. This type of board would be very concerned about what type of public information a researcher would add to a data set, she said, and Gutmann agreed. (The University of Michigan (2008) policy states that a researcher who plans to merge more than one public data set and recognizes that this may increase the risk of identification of individual research participants should consult the institutional review board.)

Straf said that it was important not only to distinguish between the risk of a potential disclosure and the harm that could be caused, but also between the risk of disclosure and the benefits of research. Gutmann responded that earlier workshop sessions had illustrated the benefits of using education records for research (see Chapter 3). DeStefano said that, in the partnership model, education agencies and researchers discuss the specific benefits of particular research projects, rather than considering the general benefits of research to society; she observed that the University of Illinois institutional review board had made note of these specific benefits when reviewing research partnership proposals.

Ness said that the Institute of Medicine committee extensively discussed the risks and benefits of research using individual health information. The committee commissioned surveys showing that the public is “hungry” for health information, and thinks that the United States should remain the world leader in generating new medical knowledge (Westin, 2007). She suggested placing the new knowledge resulting from research in “a very central position” when weighing research benefits and privacy risks. Levine agreed that the public increasingly recognizes the importance of health as a public good, saying that the public should view education in the same way.

Boruch said that an early report on privacy and confidentiality by the Committee on National Statistics (National Research Council, 1979) included an analysis of how people react to a request for personal information presented in different ways. He suggested that the survey of public attitudes commissioned by the Institute of Medicine committee might be a valuable resource for understanding how to frame such requests, which is a challenging task across fields of social science research; Ness said the survey is publicly available (Westin, 2007).

Schneider urged the American Educational Research Association to continue providing professional development on keeping data confidential.
While acknowledging her fear that a breach of individual identity was inevitable, she said it was critical to educate the research community about confidentiality and how best to safeguard it.


