Legal limitations sometimes restrict the use of data linkages. For example, employers are legally forbidden to link some claims data from their employees’ health insurance records to employer-based records without protecting the employees’ privacy. The Health Insurance Portability and Accountability Act (HIPAA) regulations require removal of identifiers from publicly released data, although exceptions can be authorized (with appropriate safeguards) when required for research. The use of Social Security earnings data by researchers outside the Social Security Administration is severely limited.
A further barrier to data linkages is the need for negotiation among the agencies and entities that maintain the data, which may have differing confidentiality provisions. Linkages across agencies may therefore require complex interagency negotiations.
Several methods are used to guard against harmful uses of linked data and to protect confidentiality. Masking and deidentification are two procedures that maintain the integrity of an individual’s data but strip any personally identifying information from the linked record. The National Center for Health Statistics (NCHS), the Agency for Healthcare Research and Quality (AHRQ), and the Census Bureau all maintain restricted-access data centers that are housed in secure settings but make the data available to researchers with proper credentialing and assurances of nondisclosure. These techniques facilitate the use of linked data while limiting the risk of disclosure. (See NRC, 2000, and NRC, 1993, for more extensive discussions of these methods.)
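The general idea behind masking and deidentification can be illustrated with a minimal sketch in Python. It is not the procedure used by NCHS, AHRQ, or the Census Bureau; the field names, the list of direct identifiers, and the functions `pseudonym` and `deidentify` are all hypothetical and serve only as an illustration. The sketch drops direct identifiers, masks the date of birth to a birth year (assuming dates are stored as YYYY-MM-DD strings), and replaces the linkage key with a keyed hash so that records remain linkable without exposing the original identifier.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers; a real project would follow its
# own disclosure-review rules rather than this illustrative set.
DIRECT_IDENTIFIERS = {"name", "ssn", "street_address", "date_of_birth"}


def pseudonym(linkage_key: str, secret: bytes) -> str:
    """Return a keyed SHA-256 hash (HMAC) of the linkage key."""
    return hmac.new(secret, linkage_key.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key_field: str, secret: bytes) -> dict:
    """Strip direct identifiers, mask date of birth, and pseudonymize the key."""
    # Keep only fields that are neither direct identifiers nor the raw key.
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != key_field
    }
    # Masking: retain only the birth year (assumes a YYYY-MM-DD string).
    if "date_of_birth" in record:
        cleaned["birth_year"] = int(str(record["date_of_birth"])[:4])
    # The same key and secret always give the same pseudonym, so records
    # from different files can still be matched on "pseudo_id".
    cleaned["pseudo_id"] = pseudonym(str(record[key_field]), secret)
    return cleaned


if __name__ == "__main__":
    linked_record = {
        "ssn": "000-00-0000",          # fictitious value
        "name": "Jane Doe",            # fictitious value
        "date_of_birth": "1960-05-17",
        "annual_earnings": 42000,
        "diagnosis_code": "J45",
    }
    print(deidentify(linked_record, key_field="ssn", secret=b"example-secret"))
```

Removing direct identifiers in this way does not by itself guarantee nondisclosure, because combinations of the remaining fields may still identify an individual; that is one reason restricted-access data centers are used alongside such procedures.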
Sometimes, when it is impossible to link data on individuals from two or more data sets, individual data from one set are linked to geocoded area-based measures from another set, which serve as proxies for individual measures. As mentioned previously in this chapter, geocoded area-based measures are not perfect proxies for individual-level variables. Area-based measures at both the Zip Code and census tract levels are less precise than individual-level data (Geronimus and Bound, 1998). Some area-based measures, however, have been found to perform better than others in health outcomes models: for example, aggregate income, education, and occupation were better predictors of health outcomes than socioeconomic index measures (Geronimus and Bound, 1998). The same study found that census tract-level measures are not significantly better than Zip Code-level measures. Krieger (1992) found that block group measures of SEP performed better than census tract measures of SEP for some health outcomes, but that the opposite was true for others. In a more recent study that examined many different health outcomes (e.g., birth and death outcomes, incidence of cancer and other diseases, and homicide), Krieger et al. (2003) found that census tract- and census block-level measures of SEP gave consistent parameter estimates of the effects of these SEP measures on outcomes across different racial, ethnic, and gender groups, while Zip Code-level measures were less consistent. This study also found that the percent