5
Reconciling Access and Confidentiality in Federal Statistical and Health Data

This chapter focuses on reconciling research access to administrative data with privacy, confidentiality, and consent requirements in health care and other sectors outside education, as well as considering the implications for education research. The first section describes the Census Bureau’s approach to statistical use of administrative data and outlines options for allowing researchers to access data sets while protecting confidentiality. The second section includes an overview of data access and confidentiality issues and further discussion of options for reconciling these issues. The third section discusses the impact of the Health Insurance Portability and Accountability Act (HIPAA) on health research using medical records, and the fourth section outlines concepts for a data stewardship entity that could potentially facilitate health research. Finally, the chapter summarizes an extended discussion of the implications of experiences in these other sectors for education research.

INTEGRATING ADMINISTRATIVE DATA INTO CENSUS BUREAU PROGRAMS

Gerald Gates explained that his years at the Census Bureau as a privacy officer and earlier as an administrative records program officer had made him aware that privacy is “the key issue” in obtaining access to administrative records and sharing them with researchers. On the basis of his understanding of laws, regulations, and current practices, he outlined three fundamental principles for statistical use of administrative








records. First, individuals must be informed of the uses of their personal information and given the ability to control such uses. Second, he argued, administrative data can be shared for statistical purposes without consent, provided the data are protected from nonstatistical uses, echoing a point Straf had made earlier (see Chapter 1). Third, federal agencies must provide effective data stewardship, ensuring both appropriate protections and optimal use (Gates, 2008).

Moving to more practical issues, Gates observed that it is “hard to get these data,” requiring negotiations among lawyers, program managers, policy advisers, and institutional review boards in order to reach agreements on data sharing. He noted that any violation of confidentiality protections negatively affects all parties, including an administrative or a statistical agency that shares data with a researcher. However, the parties are not equally liable for protecting confidentiality. Depending on the arrangement for sharing of data, the researcher may not be liable, but the federal agency is always liable, which may make an agency reluctant to provide access. In addition, news reports about breaches of security in federal data systems (e.g., Lee and Goldfarb, 2006) raise concerns among the public and in Congress and put pressure on agencies to protect, rather than share, their data.

Legal and Policy Support

Both law and policy support the use of administrative data for statistical purposes, Gates said. The law not only authorizes the Census Bureau to acquire administrative records, but also goes further to state that the bureau must use such records “to the maximum extent possible . . . instead of conducting direct inquiries” (U.S. Code, Title 13, Sections 6, 9, and 23). The law protects administrative information that is used for statistical purposes from being reused for administrative purposes.
The Confidential Information Protection and Statistical Efficiency Act (CIPSEA) of 2002, another important law, requires uniform confidentiality protections among federal agencies that collect information for statistical purposes. Prior to this law, Gates said, agencies protected this information under a variety of laws and regulations, some more ironclad than others.

Many policy studies also support the use of administrative records for statistical purposes. In a key report, the congressionally mandated Privacy Protection Study Commission (1977) defined the concept of “functional separation” between use of individual information for statistical purposes and for administrative purposes. More recent reports (e.g., National Research Council, 1993) support the use of administrative records for statistical purposes in ways that protect individual privacy and data confidentiality.

Uses of Administrative Records in Statistical Programs

Gates said that administrative records, such as the tax records the Census Bureau obtains from the Internal Revenue Service, can be useful to statistical programs in several different ways:

• to assess population coverage in surveys;
• to assess the nature and impact of survey nonresponse;
• to aid survey methodologists in understanding the nature and extent of sampling error;
• to improve survey data editing and imputation;
• to improve questionnaire design;
• to make improvements in survey sampling frames;
• to improve simulation models for policy evaluation and review;
• as a source for economic survey sample frames;
• as measures of migration for producing population estimates between censuses;
• as a source of information about income, poverty, and health insurance at the substate level; and
• to investigate social, economic, demographic, and occupational differentials in mortality.

Recent Census Bureau Data Linkage Activities

Gates described several recent data linkage activities at the Census Bureau. Analysts began developing the Statistical Administrative Records System (StARS) before the 2000 census, collecting information from five agencies that replicates the answers to questions on the short form of the census. Originally developed as a low-cost alternative to improving within-household census coverage, the new records system has improved the bureau’s demographic information, which will enable improvements in the demographic data collected in the next census. The new system also provides more up-to-date information, such as very current change-of-address data from the U.S. Postal Service, making it a very valuable resource.

The Census Bureau launched the Longitudinal Employer Household Dynamics Program in 1999 to integrate census, survey, and administrative records data on workers and employers, including state unemployment insurance wage records.
This data system provides a detailed, comprehensive picture of workers, employers, and their interaction in the national economy. While offering unprecedented detail on the local dynamics of labor markets, the data program maintains confidentiality through advanced confidentiality protection methods (Abowd et al., 2005).

Another recent Census Bureau program to link data sets is the Medicaid Undercount Project. This data system includes information from the Current Population Survey, the National Health Interview Survey, and the Medicaid program. Its goal is to examine “perplexing” discrepancies between estimates of Medicaid enrollment from population surveys and enrollment counts from the Medicaid program’s own administrative data.

Protecting Privacy and Confidentiality of Administrative Data

Gates said that the key challenge in obtaining administrative data for statistical activities and research access is to protect privacy and confidentiality. He distinguished between privacy, which he defined as an individual’s right to control the use and disclosure of information about himself or herself (Fanning, 2007), and confidentiality, echoing a point made earlier by Miron Straf (see Chapter 1).

Collecting and using social security numbers, which play a key role in integrating administrative data sets, raise privacy concerns. To address such concerns, the Census Bureau and other federal agencies convert the social security numbers to protected identifying keys that cannot be decoded except by a handful of persons who know the code. After data sets are linked, social security numbers and names are removed and replaced with these keys.

Public awareness heightens the challenge of protecting privacy and confidentiality, Gates said. For example, in 1999, the privacy commissioner of Canada effectively shut down a major data linking project on the grounds that it had not been sufficiently publicized. In announcing this decision, the commissioner observed that, although the agency in charge of the project (Human Resources Development Canada) had not tried to hide its effort to collect and merge individual information, Canadian citizens remained unaware of how much information was being gathered about them and the extent to which it was being shared with others (Gates, 2008).
Gates said it is important to acknowledge and address people’s concerns about merging and reusing various sources of administrative data.

Maintaining the confidentiality of individually identifiable records poses a greater challenge today than it did in the past for two reasons, Gates maintained. First, policy makers and researchers in education, health, and other fields are demanding detailed, individually linked data sets. Second, when such data sets are made public, people have access through the Internet to many more sources of data that could be used to identify individuals in the data set.

Options for Providing Access to Administrative Data

Gates presented several options for providing safe, useful access to administrative data. He explained that it was important to consider the range of options, because no single option will meet the needs of every data producer and every data user.

Options for Public Use Without Restrictions

Traditionally, the Census Bureau and other federal statistical agencies have made survey data and some types of administrative data available in the form of “public-use microdata systems,” and these data sets continue to meet the needs of most researchers today, Gates said. To protect these data against an intruder who might try to identify one or more individuals, the public-use microdata systems include only a sample of records. For example, for the decennial census, the largest available national public-use microdata file includes only 6 percent of the population. In addition, bureau analysts remove all direct and indirect identifiers, restrict the amount of geographic information shown, remove outliers, and take other steps to reduce the risk of disclosure. The advantages of these data systems include availability to the public with no limits on use and easy analysis using most software. At the same time, public-use microdata sets have several limitations, due to the restrictions on geographic information, the removal of outliers (which are often the most interesting data), and other confidentiality protections.

Gates explained that, for many administrative data sets, further protections are needed to protect the confidentiality of individual information. One option is to develop synthetic data. Synthetic data sets have several advantages. They are designed specifically to protect administrative data, they can be made accessible to the public or to researchers without restrictions, and they are easy to analyze using most software.
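To make the idea of synthetic data concrete, the toy sketch below draws each attribute independently from its observed marginal distribution, so no synthetic record corresponds to any real individual. This is a deliberate simplification: the records, field names, and function are invented for illustration, and real statistical agencies fit joint statistical models rather than sampling attributes independently.

```python
import random

# Invented microdata standing in for a confidential administrative file.
real_records = [
    {"age": 34, "income": 52000},
    {"age": 41, "income": 61000},
    {"age": 34, "income": 47000},
    {"age": 58, "income": 90000},
]

def synthesize(records, n, seed=0):
    """Draw n synthetic records by resampling each attribute's marginal."""
    rng = random.Random(seed)
    columns = {key: [r[key] for r in records] for key in records[0]}
    return [
        {key: rng.choice(values) for key, values in columns.items()}
        for _ in range(n)
    ]

synthetic = synthesize(real_records, n=100)
```

Note that this naive approach preserves each attribute’s distribution but destroys the relationships between attributes (here, between age and income), which illustrates why Gates cautions that synthetic data sets will not satisfy all researchers.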
However, these data also have limitations. Because synthetic data sets are customized to meet the needs of specific groups of users, they will not satisfy all researchers. In addition, research to date has not yet demonstrated the quality and usefulness of synthetic data for a wide range of different types of applications and analyses.

Options for Restricted Use

Restricting use of administrative data sets provides another layer of protection, Gates explained. The Census Bureau pioneered one option to restrict use: the Research Data Center. After establishing the first center in Boston in the 1990s, the bureau expanded the network and today operates nine such centers. At these centers, users may directly access administrative data sets that include everything but direct identifiers (name, social security number). The data include outliers, and the user may link the data to external data sets. However, these centers also have limitations. The researcher or other user must first go through a complex approval process and then relocate to the regional research data center, limiting possibilities for collaboration with other researchers.

Another option for restricting access to administrative system data is through a licensing system, such as the system operated by the National Center for Education Statistics (see Chapter 2). This option has the advantage of allowing the licensed researcher to directly access data sets at her or his own institution, using the researcher’s own software, and facilitating collaboration with colleagues at the institution. The limitations of this option include the possibility of losing one’s license if an onsite inspection by the licensing agency finds violations of the confidentiality protections or other elements of the licensing agreement. In addition, the license agreement typically does not allow the researcher to link the licensed data to external data sets.

Remote Access Options

Gates said that the option of providing researchers with remote access to statistical data is growing rapidly. In this option, the researcher submits programs to an intermediary, which applies the programs to restricted-use data and provides the results and tabulations to the researcher.
For example, the National Opinion Research Center has created a “data enclave,” with data sets from the National Institute of Standards and Technology and other sources, and conducts analyses of these data at the request of approved researchers.1 The National Center for Health Statistics provides access to approved researchers by e-mail through the Analytic Data Research by E-mail (ANDRE) system and has also created a virtual research data center.2 Remote access options have the advantage of providing easy access to administrative data sets by e-mail or the Internet. However, the types of analysis possible are limited by the specialized software housed on the servers of the data enclave. In addition, outliers in the data sets have been removed to protect against disclosure of individually identifiable information, and some data enclaves require users to pay a subscription.

A final option for providing access to administrative data is to obtain informed written consent from the individuals whose administrative records are sought. Consent has the advantage of putting the individual in control and potentially allows the most flexible access to, and use of, individual record data. The limitations include potential bias in the data set, because some individuals will not give consent for use of their records, and it may not be possible to locate other individuals. When using this option, Gates said, it is critically important to test different formats for informing individuals of the proposed uses for their record data and requesting their consent.

Gates concluded with several key points from his paper (Gates, 2008). First, administrative records serve many important research uses; these uses are supported by law and facilitated by the advanced technology and methods for linking records available today. Second, although all parties who have access to administrative records have a great incentive to protect confidentiality, they are not all equally liable for any possible breach of confidentiality. Third, access is affected by increased public scrutiny of privacy protections, which leads to development of new data stewardship principles and policies. Gates explained that agencies sometimes apply new policies and procedures in response to privacy violations “in order to survive.”

Finally, he observed that the variety of new options for access reflects the degree of control the agency holding the records is willing to relinquish to permit specified uses. Gates emphasized that agencies will provide access if they are comfortable that their requirements are going to be met, leading to a “staged approach,” rather than trying to provide access to all data for all users. Instead, the agency considers the needs of particular groups of data users, provides only the amounts and types of data needed, and imposes restrictions on use of these data.

1 See http://www.norc.org/DataEnclave/.
2 See http://www.cdc.gov/nchs/r&d/rdc.htm.
Responding to a question, Gates said that language is very important when talking about the complex issues involved in maintaining confidentiality of individual information. He said that the key point to convey, when talking to the public about using their personal records for statistical purposes, is that a federal agency will not use personal records to make administrative decisions, such as to determine one’s level of social security benefits, and that the information will be merged with information from other individuals. He said that the Census Bureau had done some research to find that, when people were asked to give consent for use of their information “for statistical purposes,” they did not understand the term. People were more comfortable when asked to give consent “for statistics,” because they think of statistics as numbers, Gates said. In response to another question, Gates agreed with Robert Boruch that studies of the informed consent process are important, stating that “the statistical community needs to acknowledge that consent is an important issue.”

Felice Levine highlighted a key question related to informed consent.

She said that, if researchers ask for informed consent when an individual takes a test or gives blood, they must consider what the consent is for. This question becomes important, she said, when a researcher asks an institutional review board for a waiver of consent. The board will look at the proposed research and consider whether the original consent, for its intended purposes, would be compromised by the research uses of the individual information. Levine suggested that it might “trivialize” the importance of informed consent if an individual were asked to simply check a box on a consent form, agreeing that the information could be used for all other legitimate research purposes.

MODELS FOR ENSURING DATA ACCESS AND PRIVACY PROTECTIONS

Myron Gutmann began by observing that the general issues surrounding data confidentiality are well reported in Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data (National Research Council, 2007: Chapter 2). Data confidentiality concerns reflect principles for the protection of human subjects outlined in the Belmont report (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979), and they are supported in regulations by the Federal Policy for the Protection of Human Subjects (the Common Rule). There is an ethical consensus, Gutmann said, on the need both to protect human subjects and to share data.

In this general context, Gutmann said that protection of education data is special because Family Educational Rights and Privacy Act (FERPA) legislation and regulations are added on to the core protections of the Common Rule. The Common Rule was designed to protect human subjects in a research environment, assumes that prior informed consent is fundamental, and uses a “reasonable” standard for protection.
In contrast, FERPA was designed to protect students and their families in an educational environment, does not assume prior informed consent, and has “a much more absolute standard for protection.” Despite these differences in the law and regulations, he said, in reality, researchers do obtain informed consent for the use of school records, but they get this consent from school administrators, rather than from parents or students.

The Problem: With a Focus on Spatial Data

Gutmann illustrated the process of disclosure review (i.e., review of data sets to assess and prevent disclosure of individually identifiable information) using the analogy of trying to find Waldo in the children’s book, Where’s Waldo? (Handford, 1997). Gutmann said that, given five attributes about Waldo, he can easily find him in a small crowd. In reality, administrative and survey data sets are large; by analogy, locating Waldo based on five attributes is much more difficult when he is in a large crowd. However, spatial information or electronic monitors make it easier to find Waldo even in a large crowd, because all explicit geographic locations are identifiable (VanWey et al., 2005). For example, if one knows that Waldo always stands next to the ring toss at the carnival, or if Waldo always carries around a radio transponder that signals his location, he stands out in the crowd.

Gutmann explained that, when reviewing possible dissemination of data on a particular topic, it is important to recognize that these data may not be the only published source of information about that topic. For example, he might publish a map of Washtenaw County, Michigan, and list three attributes of an individual living in that county. This may pose little risk of revealing that individual’s identity, because there are many individuals with those three attributes across the entire county. However, because there may be only one individual with these attributes who lives in one particular city block, publishing this geographic information would be likely to reveal the individual’s identity. In the case of education records, if a few attributes of an individual student were published, along with that student’s school, then the individual student would be easily identifiable.

Protecting Confidentiality: Goals and Options

Gutmann outlined two goals for protecting confidentiality when sharing or disseminating microdata: (1) to eliminate direct identifiers and (2) to eliminate unique individuals in small cells.
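The second goal, eliminating unique individuals in small cells, can be checked mechanically. The sketch below is illustrative only, with invented field names and values: it counts records that are unique on a set of quasi-identifiers, then shows how aggregating ages into ten-year bands and top-coding extreme incomes removes that uniqueness.

```python
from collections import Counter

# Hypothetical microdata; the fields and values are invented.
records = [
    {"age": 34, "zip": "48104", "income": 52000},
    {"age": 36, "zip": "48104", "income": 52000},
    {"age": 71, "zip": "48104", "income": 250000},  # extreme outlier
    {"age": 72, "zip": "48104", "income": 180000},
]

def unique_count(rows, keys):
    """Count rows whose combination of quasi-identifiers appears only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in rows)
    return sum(1 for r in rows if combos[tuple(r[k] for k in keys)] == 1)

def coarsen(row, age_width=10, income_cap=100000):
    """Aggregate ages into bands and top-code extreme incomes."""
    return {
        "age": (row["age"] // age_width) * age_width,  # 34 and 36 both -> 30
        "zip": row["zip"],
        "income": min(row["income"], income_cap),      # top-coding
    }

keys = ("age", "zip", "income")
before = unique_count(records, keys)                       # 4: every row unique
after = unique_count([coarsen(r) for r in records], keys)  # 0: no row unique
```

Data swapping, which exchanges values between records in different spatial units while preserving aggregate totals, is a further option along the same lines and is omitted here for brevity.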
Options for achieving these goals in tabular data with area identifiers but no precise spatial locations include the following:

• aggregating values (e.g., into five-year age groups instead of single years);
• top-coding (recoding values so that extreme values are combined with less extreme values);
• swapping data across spatial units; and
• paying careful attention to easily identified categories of data, especially geography, schools, and clustered samples.

Gutmann said that Putting People on the Map (National Research Council, 2007) differentiates between technical and institutional options for protecting confidentiality. Technical options include replacing real data with synthetic data for potentially identifying attributes and creating secure data analysis systems. Institutional options, which focus on individuals and organizations rather than on technology, include contracts and data enclaves. Institutional options vary, based on the perceived level of risk of disclosure; more complex options are more expensive and make research more difficult. For example, Gutmann said, the University of Michigan houses one of the nine research data centers operated by the Census Bureau. The center is expensive and difficult to use for investigators who live more than 30 miles away.

Gutmann presented a figure illustrating the gradient of levels of risk and levels of protection required for different types of data sets (see Figure 5-1). Simple data sets, with little risk of disclosure and little risk of harm if there were a disclosure, can be made publicly accessible on the Internet. If the data set is more complex and the risk of harm from disclosure is greater, then the data producer might require a formal data use agreement before providing access. Some data sets include information about illegal or undesirable behavior, which people voluntarily provide. Because disclosure of individual identities in such data sets would pose a great risk of harm, the data producer might require a strong data use agreement. This agreement would be likely to include restrictions on technology and technology access, similar to those included in National Center for Education Statistics licensing agreements (see Chapter 2). Finally, if there is a very high risk of disclosure and a high risk of harm, the data producer might “just lock things into an enclosed data center” and require the researcher to come and use the data onsite, Gutmann said.

FIGURE 5-1 The gradient of risk and restriction. SOURCE: Gutmann (2008).

Inter-university Consortium for Political and Social Research

Gutmann presented an overview of the activities of the Inter-university Consortium for Political and Social Research, the data archive he directs (see http://www.icpsr.umich.edu/). Experts at the archive conduct disclosure risk analysis of all data sets that enter the archive. In most cases, they remove all direct and indirect identifiers in the data sets and make them available through the Internet to users who have signed contracts with the consortium. Because most of these data sets are based on political opinion polls and information on voter behavior, Gutmann described their information as “really not very harmful.” Other data, which are not useful for research without some indirect identifiers, are subject to further restrictions, ranging from easy to more stringent contracts. However, the consortium lacks authority to impose large fines on individuals who violate these contracts, in contrast to the National Center for Education Statistics (see Chapter 2). Finally, the most sensitive data sets, posing the greatest risk of harm to an individual whose identity might be disclosed, are housed in an onsite data enclave for use by researchers who appear in person.

The consortium houses and maintains data sets and conducts research with support from many organizations, including the National Institute of Child Health and Human Development and other agencies in the U.S. Department of Health and Human Services, the National Institute of Justice, and the consortium’s member institutions.
For example, the consortium is currently taking over dissemination of the National Longitudinal Study of Adolescent Health (Add Health) from the University of North Carolina at Chapel Hill. This study includes several different types of data, including survey (self-report) data and biomarker data. Biomarker data (indicators of disease or health, such as blood pressure, heart rate, and the presence or absence of certain molecules) include both physical specimens and digital representations. The study also includes analysis of ancillary data, including participants’ high school transcripts. In compliance with FERPA, the investigators received individual written consent from each student in order to obtain access to the transcripts from their schools. To access these national data sets, a researcher must receive approval from the institutional review board at her or his home institution and provide a data security plan.

boards at universities, hospitals, and other organizations implement the HIPAA Privacy Rule.

Ness said that an institutional review board will generally provide a waiver of the informed consent requirement if it determines that the proposed research presents a low risk of disclosure of individually identifiable information and that the proposed research could not be conducted without the waiver. She observed that there is “a great deal of local interpretation” of these two conditions.

Survey Content

Ness presented several news reports describing epidemiological and clinical research studies that were halted or slowed after the enactment of HIPAA. Describing the reports as “worrisome,” she noted that they provide only anecdotal evidence about the possible effects of the HIPAA Privacy Rule on health research. The Institute of Medicine (IOM) commissioned a national survey in 2007 in order to make a more informed assessment of the effects of the rule, she said (Ness, 2007a).

The IOM survey was conducted in collaboration with 13 epidemiology professional societies. Each society contacted all of its active members, requesting that they respond to an anonymous, web-based survey about the HIPAA Privacy Rule. A total of 2,805 individuals accessed the website. Among this group, 1,527 indicated that they had submitted a research proposal to an institutional review board since the enactment of HIPAA, and the answers of this smaller group were analyzed. Ness explained that the survey was designed to ask about positive and negative influences of the HIPAA Privacy Rule.
First, the survey presented questions with quantitative responses, such as how frequently respondents collected various types of data, changes in the numbers of participants recruited before and after implementation of the Privacy Rule, and the level of difficulty encountered when seeking waivers or approval for release of deidentified data sets from the institutional review board. A second group of questions focused on respondents’ perceptions of the ease and difficulty of conducting research under the HIPAA Privacy Rule and its impact on privacy and confidentiality, using a 5-point Likert scale. Third, the web-based survey presented five hypothetical research proposals, asking respondents whether their institutional review board would approve them. Finally, respondents were asked open-ended qualitative questions, including a request for stories about HIPAA.

Analysis of Responses

The respondents were predominantly women (59 percent), mostly employed in academia (66 percent), and they varied widely in age. In describing their perceptions, a large majority (84 percent) rated the degree to which the HIPAA Privacy Rule made research easier as low, at 1 to 2 on a 5-point Likert scale anchored at 1 = none. Responding to another question, a somewhat smaller majority (67.8 percent) rated the degree to which the Privacy Rule made research harder as high, at 4 to 5 on a 5-point Likert scale anchored at 5 = a great deal. Few respondents (10.5 percent) perceived the rule as having strengthened public trust a great deal, and only about one-quarter believed that the rule had greatly increased participant confidentiality. With respect to cost and delay, about half of the respondents perceived the rule as adding costs and delaying time to completion by a great deal.

Ness was surprised that more respondents indicated that the rule had a negative effect on the protection of human subjects than indicated that it had a positive effect. Ness said this response was "almost bizarre on the face of it, because, of course, this is legislation that was purposely enacted to improve the protection of human subjects." However, she went on to say that the respondents explained their responses in the qualitative section of the survey, indicating that they viewed the burden of paperwork resulting from adding HIPAA to the Common Rule as so great that medical patients no longer understood what they were giving informed consent for. She said that about 15 percent of respondents indicated that, although their research proposals were approved by the institutional review board, the health care organization holding the medical records would not allow access because of Privacy Rule concerns.
An additional 11.5 percent of epidemiologists surveyed had conceived of a study but not submitted it to an institutional review board because they thought they would be unable to obtain approval due to the Privacy Rule. More than half of the respondents said that an application they had submitted to an institutional review board was strongly adversely affected by HIPAA.

Presenting the responses to the case study section of the survey, in which respondents were asked whether their institutional review board would approve five different types of studies, Ness said that the key finding was the wide variability (see Table 5-1). Such wide variation indicates that institutional review boards are interpreting the Privacy Rule in very different ways, Ness said.

In response to the final section of the survey, inviting HIPAA stories, Ness said they received almost 500 written responses, reflecting the "angst that's out there." A total of 90 percent of the stories were negative,

TABLE 5-1 Responses to Case Studies, Institute of Medicine Survey

Would Your IRB Approve This Study?, Number (%)

Study | No | Yes, Unconditional | Yes, with Waiver | Yes, with Conditions | Yes, Other Approval | Don't Know
Participants from medical records contacted for interview/blood draw | 184 (12.56) | 123 (8.5) | 262 (18.0) | 522 (35.9) | 135 (9.3) | 229 (15.7)
Participants from cancer registry contacted to consent for interview | 196 (13.5) | 157 (10.8) | 261 (18.0) | 468 (32.3)a | 175 (12.1) | 193 (13.3)
Tissue bank to supply deidentified data for assay not in original consent | 222 (15.6) | 199 (14.0) | 159 (11.2) | 262 (18.4)b | 291 (20.4)c | 290 (20.4)
Medical record review from subjects now dead | 61 (4.7) | 435 (33.8) | 280 (21.8) | 109 (8.5)d | 121 (9.4) | 281 (21.8)
Limited data set from another hospital; research cannot be done without some identifiers | 239 (20.2) | 58 (4.9) | 260 (21.2) | 427 (36.0)e | 123 (10.4) | 317 (26.7)

NOTE: IRB = institutional review board.
a With physician approval.
b With authorization and reconsent from patients.
c Limited data set or other special circumstances.
d With approval from executor of estate.
e With limited data set agreement.
SOURCE: Ness (2007). Copyright 2007 American Medical Association. Reprinted with permission.

5 percent were neutral, and 5 percent were positive. For example, one respondent wrote (Ness, 2007b):

    An already cumbersome patient consent form now has an additional page-and-a-half explaining HIPAA restrictions. This detracts from the informed consent process pertaining to the more critical issue: the actual medical risks and benefits of participating.

In general, the written responses indicate that the HIPAA Privacy Rule had not stopped health research, but it had slowed research progress and increased costs. Many respondents expressed the view that the Privacy Rule was hurting public health surveillance and causing confusion in the public health community. Ness reminded the group that the HIPAA Privacy Rule specifically permits disclosure of individual health information for public health surveillance.

Summarizing the IOM survey results, Ness reiterated that only about one-quarter of the respondents believed that the Privacy Rule had enhanced privacy. Among all respondents, the rule was seen as having a more negative than positive impact on the protection of human subjects. The analysis of the responses suggests that institutional review boards around the nation interpret the rule in quite different ways, making it unclear whether many of the problems described in the survey are a function of the Privacy Rule itself or of local institutional review board interpretation of it. She said that the limitations of the survey include respondent bias; it may be that the respondents were those who felt most negatively about HIPAA. Another limitation is that it was not possible to calculate a response rate, because of the anonymous Internet process.

In discussion, Ness explained that the primary audience for the survey was the IOM committee, and that the survey reached a wider audience through publication in the Journal of the American Medical Association (Ness, 2007a).
She said that representatives of two agencies in the Department of Health and Human Services—the Office for Civil Rights, which leads implementation of the Privacy Rule, and the National Institutes of Health, which funds a large amount of health research—had attended the IOM committee's meetings and were concerned about the wide variation in local interpretation of the rule. She observed that the IOM committee would decide how to respond to the problem in its final report.

Stephen Plank (Johns Hopkins University) asked if the word "surveillance" raised public fears. Ness responded that the IOM committee had commissioned a study of public attitudes toward privacy and health research (Westin, 2007), which found that language had a powerful influence on individuals' willingness to allow access to their health information.

A NEW APPROACH TO HEALTH DATA STEWARDSHIP

P. Jon White (Agency for Healthcare Research and Quality) opened his remarks with the observation that health care in the United States has a quality problem. The Institute of Medicine (2000) found that between 50,000 and 100,000 deaths were caused each year by medical errors, and more recently, McGlynn (2003) found that health care recipients got the recommended level and type of care only about 55 percent of the time. At the same time, health care has a cost problem. The current annual expenditure of $2.2 trillion represents a significant fraction of the nation's gross domestic product, and health care costs are rising steeply, at an annual average rate of 6.5 percent.

Current Efforts to Measure Health Care Quality

White said that one proposed solution to both of these problems is to pay for quality, rather than paying for individual visits to the doctor or for individual procedures or treatments. The Agency for Healthcare Research and Quality is one of several organizations working toward this solution; these organizations all face the key question of how to set the health care quality goals that would guide payments. In one effort to answer this question, his agency awarded grants to support health care information technology systems for "enabling quality measurements."4 In addition, the Department of Health and Human Services' Centers for Medicare and Medicaid Services funded six regional pilot projects to provide better quality information for Medicare beneficiaries.

White provided an example to illustrate a key issue in using administrative—or in this case, electronic claims—data to assess health care quality. If one wanted to assess the quality of Dr. White's treatment of diabetic patients, any single payer could provide claims data for only 10 to 15 percent of Dr. White's diabetic patients.
Doctors have successfully argued that measures based on their treatment of these small samples of patients are inaccurate. To address this problem, the Agency for Healthcare Research and Quality and other organizations are beginning efforts to assemble health claims data from multiple payers, including the information assembled by the technology systems and regional pilot projects described above.

The agency has also funded development of 14 "chartered value exchanges" around the country. These are coalitions of health care providers, payers, patients, and regulators who receive data from Medicare and from local payers and health care providers and try to use these data to

4 See http://grants.nih.gov/grants/guide/rfa-files/RFA-HS-07-002.html.

measure quality. White explained that the Centers for Disease Control and Prevention's BioSense Health Surveillance Program5 taps into the existing streams of data in hospital information systems around the country and "sends it up to the mother ship in Atlanta." Analysts there can monitor spikes in certain diseases, medical conditions, or symptoms. In addition to these federal efforts, health insurance companies and health maintenance organizations are working to develop measures of health care quality. The New York attorney general's office has signed agreements with several major health care organizations to rank doctors based on quality of care, rather than on how much they cost the organization (New York State Attorney General's Office, 2007). Other organizations that are trying to assemble and analyze health care data to develop measures of quality include the National Quality Forum, a public–private partnership, Google, and Microsoft.

All of these efforts face the question of who owns the health care data, White said. In the past, medical records were maintained in paper files, making it easier for any single doctor or hospital to own and keep them. With the change to digital records, it is possible for many individuals and organizations to own copies of health care records. White explained that he and his colleagues use the word "stewardship," which he defined as "taking care of something that doesn't belong to you." He has been engaged in discussions of health data stewardship with many organizations over the past few years, including the Ambulatory Quality Alliance, the National Committee on Vital and Health Statistics, which advises the Secretary of Health and Human Services, and the American Medical Informatics Association. These initiatives also face privacy and security issues, as individual health records are protected by HIPAA, the Common Rule, and state and local laws and policies.
In 2005, the Agency for Healthcare Research and Quality helped to fund a collaborative effort among more than 30 states and territories to study their privacy laws and regulations governing medical records. As a result of these studies, the participants have begun working to harmonize these laws and regulations, both within and across states.

A Data Stewardship Entity

Returning to the concept of stewardship, White explained that the idea of assembling multiple sources of data in order to improve health care quality emerged several years ago in the Ambulatory Quality Alliance. The alliance includes representatives of White's agency, two physicians' organizations, and an association of health insurance companies. The alliance members recognize that doctors, laboratories, health insurance plans, and patients all have separate pieces of the health care information needed to measure quality. Through discussion, they developed principles for sharing and aggregation of these disparate sources of data, including (Ambulatory Quality Alliance, 2006):

• transparency with respect to framework, process, and rules;
• measurement of provider performance derived from standardized metrics and data collection protocols that can be compared with national, regional, or other suitable benchmarks and otherwise assists in the analysis of assessments of health care quality and cost of care;
• useful data for physicians to improve the quality and cost of care they provide to their patients and other appropriate purposes (e.g., maintenance of certification);
• public reporting to consumers of user-friendly, meaningful, and actionable information about physician quality and cost of care; and
• the collection of both public and private data so that physician performance can be assessed as comprehensively as possible.

White explained that, as the Ambulatory Quality Alliance members discussed these principles, they reached agreement on the need for a new health care data stewardship entity. When developing the mission and scope of the entity (Ambulatory Quality Alliance, 2006), they were unclear about whether the entity would simply set guidelines for assembling and managing data or would actually serve as a data archive. To solicit answers to this and other questions about the entity, the Agency for Healthcare Research and Quality published a request for information. Over 100 public and private health care organizations and individuals responded to the request, and the agency published a qualitative summary of their comments (Agency for Healthcare Research and Quality, 2007).

5 See http://www.cdc.gov/BioSense/.
The varied responses included significant support for both possible roles of the entity: setting guidelines for data stewardship and acting as the data steward. At the same time, some respondents expressed significant concerns, and some were completely opposed to the idea of sharing their personal medical records. White said he found it very valuable to hear and understand these views from the public.

Near the end of his presentation, White posed several questions that his agency and others are discussing as they consider the possibility of creating a health care data stewardship entity:

• What is your problem? What are you trying to address? What do you need to do this for?
• Do you need a referee to address your problem by helping to set and enforce the rules of the game?
• Do you need someone to hold the information you need to address your problem?
• How do you avoid unintended consequences, including breaches of privacy and confidentiality?

White closed by warning that websites exist today at which, for a small fee, an individual can enter a medical condition and receive a list of people who have that condition. The website managers gather the information from sources that are not governed by HIPAA. Finally, he said there are many questions and no conclusions.

In discussion, White observed that lobbyists on Capitol Hill are telling Congress now that the patient ought to control her or his health records, although the doctor and the insurance company should also be allowed to access the records. White said that such proposals miss the possibility of using medical records for research that could improve the health care system for the public good.

DISCUSSION: IMPLICATIONS FOR RESEARCH USING EDUCATION RECORDS

Reflecting on Lizanne DeStefano's earlier presentation about development of partnerships between researchers and schools (see Chapter 4), Miron Straf said that statistical agencies should work with administrative agencies, helping them to develop their data systems for statistical use. He asked whether there was a federal role in providing this type of assistance to education agencies.

Supporting Research Partnerships Through Trust and Technical Assistance

Marilyn Seastrom replied that 27 states currently have grants from the Department of Education's Institute of Education Sciences to develop longitudinal databases of education records and that more funding will be provided to the states in fiscal year 2009 (see Chapter 2). One requirement of these grants, she said, is that the states make the databases user-friendly and accessible to researchers. She said that, even if FERPA were changed to make data-sharing easier, states and school districts might still refuse researchers' requests for data access if they lack the resources and technical capacity to share data.

 PROTECTING STUDENT RECORDS DeStefano responded that, in addition to trust, the motivation of research partners is also important. When a school or state education agency does not have a strong motivation to participate in a research project, she argued, the agency’s leaders are more likely to say that they cannot share data because of FERPA. Seastrom agreed that trust and the development of relationships would continue to be very important for researchers to gain access to education data. Paula Skedsvold (American Educational Research Association) asked whether legislative changes were needed in FERPA to clarify the mean- ing of research “for, or on behalf of” an education agency. She observed that, in response to the American Educational Research Association’s survey about FERPA, some respondents indicated that they had simply abandoned research projects, because they could not obtain access to the education records they needed. Seastrom asked workshop participants to describe how much state or local education agencies and researchers themselves alter education record data to protect confidentiality. Schneider replied that, in one case, her team had helped to deiden- tify a file of teacher information. She observed that researchers outside her team who wish to use the file are required to apply to their own institution’s institutional review board and to the state of Michigan’s institutional review board for the use of data, providing a data protection plan along with other information about the proposed research. DeStefano and others agreed that researchers should provide techni- cal assistance to state and local education agencies to increase their capac- ity in techniques of deidentification. Weighing Risks and Benefits of Disclosure and Research Martin Orland proposed that Gutmann’s matrix of risk and restric- tion (see Figure 5-1) should include another dimension—the likelihood of harm. 
Gutmann responded that the matrix included risk of harm and disclosure, and Orland replied that risk of harm and risk of disclosure should be two different dimensions. For example, he said, the risk of a nuclear power plant accident is minuscule, but this unlikely event could cause “enormous” harm. Orland expressed concern that the harm caused by even one disclosure of individually identifiable information could adversely affect the entire research environment, especially in light of the public and congressional concerns about privacy that had generated the HIPAA legislation. Gutmann agreed with Orland that it is important to differentiate between the risk of disclosure and the risk of harm. For example, he said, there is almost nothing on the short-form census questionnaire that

should cause an individual to be concerned if it were publicly revealed. In contrast, he said, other data that individuals provide in surveys or that are included about them in administrative records would pose great risk of harm if they were revealed, as shown in his graphic (see Figure 5-1). He went on to explain that he is much more worried about the potential harm disclosure could cause to groups of individuals than he is about the harm to researchers of limited access to data. He said that the people who fund his data consortium do so because they do not want to see a front-page story in the news about any revelation of individual identities based on data they collect.

Gerald Gates added that, unlike Gutmann, privacy laws do not distinguish between more sensitive and less sensitive individual information; these laws simply state that individual information cannot be disclosed. Although institutional review boards in federal agencies consider the sensitivity of different data sets when determining how to protect them, they focus primarily on complying with the letter of the law by protecting against any disclosure of any individually identifiable information. Levine responded that one criticism of institutional review boards is that, when reviewing a research proposal, they fail to distinguish between the risk of disclosure and the magnitude of harm that a disclosure would cause.

Gutmann responded that institutional review boards do not always take advantage of the flexibility they have to allocate their time and resources. For example, the University of Michigan institutional review board has explicit rules stating that research proposals to use data from a list of specific deidentified public data sources (including the Census Bureau, his institute, and other sources) do not require institutional review board approval (University of Michigan, 2008).
Therefore, the board does not need to devote resources to reviewing these research proposals and can focus on other proposals in which protecting human subjects is more important. Gutmann suggested that the research community continue to work with institutional review boards to make sure that they are devoting their resources where they are most needed, especially because he sees most institutional review boards as "overwhelmed."

Seastrom responded that, while she agreed with Gutmann, the exact opposite would be the case for a disclosure review board. This type of board would be very concerned about what type of public information a researcher would add to a data set, she said, and Gutmann agreed.6

6 The University of Michigan (2008) policy states that a researcher who plans to merge more than one public data set and recognizes that this may increase the risk of identification of individual research participants should consult the institutional review board.

Straf said that it was important not only to distinguish between the risk of a potential disclosure and the harm that could be caused, but also

between the risk of disclosure and the benefits of research. Gutmann responded that earlier workshop sessions had illustrated the benefits of using education records for research (see Chapter 3). DeStefano said that, in the partnership model, education agencies and researchers discuss the specific benefits of particular research projects, rather than considering the general benefits of research to society; she observed that the University of Illinois institutional review board had made note of these specific benefits when reviewing research partnership proposals.

Ness said that the Institute of Medicine committee extensively discussed the risks and benefits of research using individual health information. The committee commissioned surveys showing that the public is "hungry" for health information and thinks that the United States should remain the world leader in generating new medical knowledge (Westin, 2007). She suggested placing the new knowledge resulting from research in "a very central position" when weighing research benefits and privacy risks. Levine agreed that the public increasingly recognizes the importance of health as a public good, saying that the public should view education in the same way.

Boruch said that an early report on privacy and confidentiality by the Committee on National Statistics (National Research Council, 1979) included an analysis of how people react to a request for personal information presented in different ways. He suggested that the survey of public attitudes commissioned by the Institute of Medicine committee might be a valuable resource for understanding how to frame such requests, which is a challenging task across fields of social science research; Ness said the survey is publicly available (Westin, 2007).

Schneider urged the American Educational Research Association to continue providing professional development on keeping data confidential.
While acknowledging her fear that a breach of individual identity was inevitable, she said it was critical to educate the research community about confidentiality and how best to safeguard it.