The ability to perform such statistical identification has a significant impact on medical research that mines historical data. Researchers in this area are generally unable to obtain informed consent from those whose records are being used because of the large sample sizes that are mined in such studies. Often many of the subjects are unavailable to provide such consent, either because they are deceased or because the contact information in the record is out of date. Without such consent, both the ethics of the profession and current federal privacy regulations mandate that the information be rendered anonymous.
Technologies for anonymization have been developed for statistical disclosure limitation. As noted in Chapter 3, the core concept behind such technology is to randomly scramble the information in complex records so that it becomes impractical to link an individual record to a particular person, while maintaining the statistical relationships among those parts of the record being analyzed. Such techniques are often valuable when the information to be correlated is known before the anonymization occurs. Often, however, these studies attempt to discover correlations that are not known before the data are examined, and in such cases anonymization can mask the very relationships that the research is trying to discover.
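The trade-off described above can be illustrated with a toy sketch of one scrambling approach, block swapping: the fields chosen for analysis are shuffled together across records, so their mutual relationship survives while their link to everything else in the record is broken. This is an illustration under stated assumptions, not the method of any particular disclosure-limitation system; the field names (`patient_id`, `age`, `exercise_hours`, `systolic_bp`), the synthetic data, and the helper `swap_block` are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def swap_block(records, analyzed, rng):
    """Shuffle the analyzed fields, as one block, across all records.

    Values of the analyzed fields stay together, so relationships among
    them are preserved; their association with every other field in the
    record (including anything identifying) is randomized.
    """
    blocks = [tuple(r[f] for f in analyzed) for r in records]
    rng.shuffle(blocks)
    swapped = []
    for r, block in zip(records, blocks):
        new = dict(r)
        new.update(zip(analyzed, block))
        swapped.append(new)
    return swapped


def corr(rows, a, b):
    return pearson([r[a] for r in rows], [r[b] for r in rows])


# Synthetic data (an assumption for illustration): blood pressure depends
# on age AND on exercise, but only the age relationship is known in advance.
rng = random.Random(0)
records = []
for i in range(2000):
    age = rng.uniform(20, 80)
    exercise = rng.uniform(0, 10)
    bp = 100 + 0.6 * age - 2.0 * exercise + rng.gauss(0, 5)
    records.append({"patient_id": i, "age": age,
                    "exercise_hours": exercise, "systolic_bp": bp})

# Scramble only the fields the study planned to analyze.
masked = swap_block(records, ["age", "systolic_bp"], rng)

planned_before = corr(records, "age", "systolic_bp")
planned_after = corr(masked, "age", "systolic_bp")
unplanned_before = corr(records, "exercise_hours", "systolic_bp")
unplanned_after = corr(masked, "exercise_hours", "systolic_bp")

print(f"age vs. bp:      {planned_before:.3f} -> {planned_after:.3f}")
print(f"exercise vs. bp: {unplanned_before:.3f} -> {unplanned_after:.3f}")
```

In this sketch the planned correlation (age against blood pressure) is unchanged by the scrambling, while the unanticipated one (exercise against blood pressure) collapses toward zero, which is the masking effect the text describes.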
Part of the problem with the notion of anonymization of records is that the regulations regarding the use of anonymized information treat the notion as binary: either the record has been anonymized, or it is individually identifiable information. However, because much of the information lends itself to statistical correlation, anonymization is more accurately represented as a probability that the collection of information can be used to identify an individual out of a target population at an affordable cost. If that probability must be zero, much of the wealth of medical information that is available for long-term statistical study will be far more difficult to obtain or use in such research.
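One simple way to make the probabilistic view concrete is to estimate, for each released record, the chance that an observer who knows certain attributes of the target population could pick out the right person. The sketch below assumes that every released record belongs to someone in the population list and that the observer guesses uniformly among the people who share the record's attribute values; it does not model the cost of re-identification. The attribute names, the sample data, and the functions `reidentification_risk` and `acceptably_anonymous` are hypothetical.

```python
from collections import Counter


def reidentification_risk(released, population, quasi_identifiers):
    """Per-record probability of a correct match on the given attributes.

    A record whose attribute combination is shared by k people in the
    population is assigned risk 1/k. Assumes each released record
    describes someone who appears in `population`.
    """
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    class_sizes = Counter(key(p) for p in population)
    return [1.0 / class_sizes[key(r)] for r in released]


def acceptably_anonymous(risks, threshold):
    """Binary verdict derived from the worst-case risk.

    threshold is a policy choice; a threshold of 0 can never be met,
    because every released record carries some nonzero risk.
    """
    return max(risks) <= threshold


population = [
    {"zip": "02139", "birth_year": 1950, "sex": "F"},
    {"zip": "02139", "birth_year": 1950, "sex": "F"},
    {"zip": "02139", "birth_year": 1950, "sex": "M"},
    {"zip": "02140", "birth_year": 1972, "sex": "F"},
    {"zip": "02140", "birth_year": 1972, "sex": "F"},
    {"zip": "02140", "birth_year": 1972, "sex": "F"},
    {"zip": "02140", "birth_year": 1972, "sex": "F"},
]
released = [population[0], population[2], population[3]]

# Releasing all three attributes: one record is unique in the population.
full_risks = reidentification_risk(
    released, population, ["zip", "birth_year", "sex"])

# Releasing only the ZIP code: every record hides among several people.
coarse_risks = reidentification_risk(released, population, ["zip"])

print(full_risks)    # [0.5, 1.0, 0.25]
print(coarse_risks)
print(acceptably_anonymous(full_risks, 0.5),
      acceptably_anonymous(coarse_risks, 0.5))
```

Under these assumptions the same records move from unacceptable to acceptable simply by releasing fewer attributes, and no release at all satisfies a threshold of zero, which mirrors the difficulty with a strictly binary rule.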
A further confusion is that guidelines and regulations often speak of “de-identified” information even though a close reading suggests that they mean anonymized (i.e., information for which re-identification is for practical purposes impossible).