The ability to perform such statistical identification has a significant impact on medical research that mines historical data. Researchers in this area are generally unable to obtain informed consent from those whose records are being used because of the large sample sizes that are mined in such studies. Often many of the subjects are unavailable to provide such consent, either because they are deceased or because the contact information in the record is out of date. Without such consent, both the ethics of the profession and current federal privacy regulations mandate that the information be rendered anonymous.

Anonymization technologies have been developed for statistical disclosure limitation. As noted in Chapter 3, the core concept behind such technology is to randomly scramble the information in complex records so that it becomes impractical to correlate an individual record with a particular person, while maintaining the statistical relationships between those parts of the record being analyzed. When the information to be correlated is known before the anonymization occurs, such techniques are often valuable. Often, however, medical studies attempt to discover correlations that are not known before the data are examined, and in such cases the scrambling can mask the very correlations that are the goal of the study.
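The trade-off can be illustrated with a minimal sketch. It is not any particular disclosure-limitation product, and the field names (`dose`, `outcome`, `marker`) and the helper functions are hypothetical. The sketch permutes one chosen group of fields as a unit and every other field independently. The relationship inside the chosen group survives exactly, each field keeps its overall distribution, and any relationship that was not anticipated is lost.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def scramble(records, keep_together, rng):
    """Return scrambled copies of the records.

    Fields named in keep_together are moved as a unit, so relationships
    among them are preserved. Every other field is permuted on its own,
    which keeps its distribution but breaks its link to all other fields.
    """
    n = len(records)
    fields = list(records[0].keys())
    groups = [tuple(keep_together)]
    groups += [(f,) for f in fields if f not in keep_together]
    out = [dict() for _ in range(n)]
    for group in groups:
        order = list(range(n))
        rng.shuffle(order)  # one random permutation per group
        for dst, src in enumerate(order):
            for f in group:
                out[dst][f] = records[src][f]
    return out


def make_records(n, rng):
    """Synthetic data (hypothetical): outcome depends on dose, and an
    unanticipated marker happens to track the outcome."""
    records = []
    for _ in range(n):
        dose = rng.uniform(0, 10)
        outcome = 2 * dose + rng.gauss(0, 1)
        marker = outcome + rng.gauss(0, 2)
        records.append({"dose": dose, "outcome": outcome, "marker": marker})
    return records


if __name__ == "__main__":
    rng = random.Random(0)
    original = make_records(2000, rng)
    scrambled = scramble(original, ("dose", "outcome"), rng)

    def column(rs, f):
        return [r[f] for r in rs]

    for name, rs in (("original", original), ("scrambled", scrambled)):
        print(name,
              "dose~outcome:", round(pearson(column(rs, "dose"), column(rs, "outcome")), 3),
              "marker~outcome:", round(pearson(column(rs, "marker"), column(rs, "outcome")), 3))
```

In this sketch the dose-outcome relationship was named in advance and is retained, whereas the marker-outcome relationship, which a researcher might have hoped to discover, is no longer visible in the scrambled data.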

Part of the problem with anonymization of records is that the regulations governing the use of anonymized information treat the notion as binary: either a record has been anonymized, or it is individually identifiable information. However, because much of the information lends itself to statistical correlation, anonymization is more accurately represented as a probability that the collection of information can be used to identify an individual out of a target population at an affordable cost. If that probability must be zero, much of the wealth of medical information that is available for long-term statistical study will be far more difficult to obtain or use in such research.
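One simple way to make the probabilistic view concrete is sketched below. It rests on strong assumptions that do not come from the regulations: the person seeking to identify someone knows that the target is in the data set and knows the target's values for a chosen set of fields, and a record's risk is taken to be one divided by the number of records sharing those values. The field names and the function names are hypothetical, and the cost of an identification attempt is not modeled.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Per-record risk under the stated assumptions: 1 / (number of
    records that share the same quasi-identifier values)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def share_above(risks, threshold):
    """Fraction of records whose risk exceeds the given threshold."""
    return sum(r > threshold for r in risks) / len(risks)


if __name__ == "__main__":
    # Hypothetical records with direct identifiers already removed.
    records = [
        {"zip3": "200", "birth_year": 1950, "diagnosis": "A"},
        {"zip3": "200", "birth_year": 1950, "diagnosis": "B"},
        {"zip3": "200", "birth_year": 1962, "diagnosis": "A"},
        {"zip3": "021", "birth_year": 1950, "diagnosis": "C"},
    ]
    risks = reidentification_risk(records, ("zip3", "birth_year"))
    print(risks)
    print(share_above(risks, 0.5))
```

Under this measure a data set is not simply anonymized or identifiable. Each record carries a risk between zero and one, and a policy could set an acceptable threshold rather than requiring a risk of zero.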

A further confusion is that guidelines and regulations often speak of “de-identified” information even though a close reading suggests that they mean anonymized (i.e., information for which re-identification is for practical purposes impossible).

stored and transmitted by those entities covered by the HIPAA law. These regulations became final in 2002, and their phased introduction began in April 2003.

Like the policy set forward by the AMA Ethical Force program, the privacy regulation that is part of HIPAA is based on the principle of informed consent. With certain statutory exceptions (such as use of information for the purposes of treatment, payment, or health care operations, or for law enforcement or research purposes), consent of the individual must be obtained for all uses and disclosures of personally identifiable health information. In addition, the HIPAA privacy regulations require that all covered entities (a category that includes all government health plans, private sector health plans and managed care organizations, health care providers who submit claims for reimbursement and payment clear-


