4 Issues and Challenges Associated with Data Sharing
Pages 39-68

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 39...
... "There is an inherent tension between collaborative scientific data sharing and what often can be adversarial science," he said. "The challenge here is that investigators invested time.
From page 40...
... "The original investigators do not want to talk to them." With the sharing of data encumbered in this way and the usual open scientific dialogue closed down somewhat, Greenbaum explained that it becomes much harder to advance the science in these areas -- not just replicating the initial work but also extending it and carrying out new analyses on the data sets. All of these valuable outcomes of data sharing become the victims of distrust and suspicions.
From page 41...
... I do not think I knew it was a $250,000 fine [for failing to protect any personal information] when I signed the data use agreement for the Harvard Six Cities Study and the American Cancer Study, but I knew it was significant penalties, and we pay attention to it.
From page 42...
... Business Considerations Related to Data Sharing Businesses must take into account various issues when deciding whether to share their data. Two of the main ones are concerns about opening themselves up to liability and other costs and worries about losing the value of confidential business information.
From page 43...
... He pointed out that a document from the Oak Ridge National Laboratory on best practices for preparing environmental data sets to be shared and archived has excellent detail on what investigators can be doing to prepare to publicly release the data as they develop them. The Business Value of Data During Session 3 of the workshop, Greg Bond from the Dow Masters Fellowship Program at the University of Michigan spoke about some of the other business considerations related to data sharing. He noted that any raw data that industry generates to support a product registration are accessible to the relevant government authorities under a variety of statutes in the United States and also abroad.
From page 44...
... As a result, companies often treat environmental data as confidential business information and try to protect them from their competitors. The preservation of such confidential business information can come into conflict with the imperative to share research data, and finding the proper balance between the need to keep business information confidential and the desire to share scientific data can be difficult.
From page 45...
... Beyond that, there may be a time limit beyond which the company can keep the data, particularly toxicology data, confidential, and beyond that time limit the data are available. However, he added, he could not think of a single instance when someone asked for access to the data underlying a particular toxicology study.
From page 46...
... He noted that multiple people talked about the need for definitions during the workshop, but from the point of view of someone who is in charge of the government asset of data, the question really is "where are the data?" Then decision makers may ask, how does one make the data publicly available, are the data readable by today's machines, how should archived data be handled, and how should one plan for future data requests 20 or 30 years from now.
From page 47...
... For example, she indicated, the institute would like to share its data set through EPA's ExpoCast, an online data resource, but there are questions around whether the data may be reidentifiable, especially when linking air pollution data and personal information, such as household characteristics and consumer product purchases. These questions led Brody and her colleagues to establish a partnership, with funding from the National Institute of Environmental Health Sciences (NIEHS)
From page 48...
... And three, what can and should we promise to our study participants in the informed consent?" A key question is how likely is reidentification of subjects in existing data sets that have had the obvious personal identifying information removed.
From page 49...
... It helps drive forward innumerable scientific and health research advances. It greatly benefits our society as a whole and yet still provides strong privacy protections for individuals." "As we move towards expanded health information technology and electronic medical records," he concluded, "it will yield even more deidentified clinical data, which I believe will support important advances in health science." "The inconvenient truth is that we are stuck with a trade-off," he said.
From page 50...
... The primary goal of the law is to make it easier for people to keep health insurance, protect the confidentiality and security of health care information and help the health care industry control administrative costs. The HIPAA Privacy Rule provides federal protections for individually identifiable health information held by covered entities and their business associates and gives patients an array of rights with respect to that information.
From page 51...
... There are various approaches to deidentifying data, each with its own advantages and disadvantages, Barth-Jones noted. For example, the "expert determination" method is a little bit more flexible than the "safe harbor" approach. "It helps us balance the competing goals of privacy protection and preserving the utility and statistical accuracy of deidentified data," he said.
From page 52...
... "If they are unique in the sample, we call them 'sample unique.' If they are unique in the larger population, we call them 'population unique.'" In general, in any given sample there may be individuals with unique identifying information; for example, there may be only one person in the sample with a particular age, sex, location, and degree of education, but in the larger population there may be several people with that particular set of identifiers. "It is really only those records that are unique
From page 53...
... "If we are going to do this to our statistical analyses," he said, "we might as well all give up and go home." Barth-Jones noted that it is important to obtain a better understanding of the risks that deidentified patient data can be reidentified to improve patient confidentiality. "How do we move beyond anecdotes to a
From page 54...
... Without real-world demonstrations of data reidentification that indicate actual vulnerabilities and risks, there will be bad policy and little scientific progress in privacy-enhancing technologies. To zero in on the appropriate types of privacy protections -- whether they are stronger than today's protections, weaker, or, more likely, totally different from what is now known -- it is necessary to go through cycles of data being reidentified, protections being changed in response, and so on.
From page 55...
... "It is cited in the HIPAA privacy regulation preamble," she said. "It is cited in preambles in other countries' privacy regulations." But although she has made more than 20 attempts to publish a paper describing the details, it has never been published.
From page 56...
... Your nicknames are in it. Your cell phones are in it." From that publicly available information she was able to get zip codes for individuals, which she linked with the hospital data to identify people who had been discharged from the hospital.
From page 57...
... There is so much information floating around that is readily available to anyone who knows where to look that it is very difficult to get a handle on the exact risks of reidentification. But the first step in understanding those risks is to get a clear picture of exactly what information is available and how the different sources of information can be combined to identify people who appear in data sets that are supposed to be deidentified.
From page 58...
... The data for the Six Cities Study were of three types: some data consisted of basic information concerning the individuals and their health histories, some data described the subjects' respiratory systems when they were admitted to hospitals and where they were living at the time, and the third type of data consisted of death records. The location data were important, Casey noted.
From page 59...
... The data from the study are "foundational information" used in setting clean air rules, Casey said. "And the Clean Air Act states that it should be revised periodically, and, when it is, it should draw upon the published research that is available, and when they do, it has directed attention to the Harvard School of Public Health.
From page 60...
... If there were a way that they could thread that needle and provide the amount of information to satisfy reasonable policy makers in the pledge that they made to their subjects, they would do it tomorrow." COMMENTS AND DISCUSSION This section summarizes the discussions on the challenges associated with data sharing that took place throughout the workshop. Issues Associated with Reanalysis of Data A workshop participant watching the workshop over the Web noted during the discussion after Session 2 of the workshop that one of the concerns that researchers have about sharing their data is that whoever does a reanalysis of the data may not do it to particularly high standards and come up with results that are incorrect.
From page 61...
... Informed Consent During the discussion after Session 2 of the workshop, Alan Morrison of the George Washington University Law School highlighted two basic issues with informed consent as it concerns environmental health data. "The real problem today," he said, "is that when you ask somebody for consent, particularly a broad consent, neither you, the requestor, nor the person being requested has any notion at all as to what that means in terms of how it is going to be used because we do not even know what it is going to be used for down the road.
From page 62...
... In the opening remarks for Session 3 of the workshop, Glenn Paulson, science adviser in the Office of the Administrator at EPA, said that there is a new working group on modernizing the Common Rule, the federal rule governing the treatment of research subjects in many federal departments. Changes in the Common Rule could affect such issues as informed consent and the sharing of data from surveys of trials involving human subjects.
From page 63...
... This might come up, for example, if a plaintiff in a lawsuit claiming harms from environmental exposures of concern had taken part in an environmental health study. The opposing lawyer might search through the data set from the study, looking for additional information on the plaintiff, which would be accessible if it was possible to identify the plaintiff from among the people who took part in the study.
From page 64...
... study but potentially downstream in other kinds of clinical trials and other areas where things could become controversial." This exploitation should have its limits, though, warned Casey. For example, the data in the Harvard Six Cities Study have been, if anything, overanalyzed in the two decades since the study appeared.
From page 65...
... The consent forms generally said that information about the subjects would be kept private "unless it is requested by a court of law or something to that effect," he said, and this language did not seem to be a reason for people not being willing to join research studies. Lynn Goldman, dean of the Milken Institute School of Public Health at George Washington University, spoke of the repercussions of a ruling to publicize data that a researcher had promised to keep private.
From page 66...
... "This may not be as much of a problem in some of the very large, statistically based, almost ecological kinds of epidemiological studies," she said, "but it may be extremely important when we are dealing with the smaller studies where we have small cohorts that we recruit and we want to follow." Explaining Risks Better to Subjects Howard commented during the discussion after Session 3 of the workshop that it is no longer possible to believe -- as it was several decades ago -- that the confidentiality of research subjects can be absolutely assured and that researchers thus have an obligation to talk about confidentiality risks differently than they did many years ago. "It would seem to me," he said, "that we are at a stage of developing obligation on the part of scientists [where]
From page 67...
... 2011. A systematic review of re-identification attacks on health data.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.