precise combination of values of the sociodemographic variables might identify the subject’s geographical area and thus pose a risk of disclosure of confidential information about individual plan members. Methods have been developed for masking such data by rounding and/or adding random noise. Such masked data sets can be analyzed with appropriate corrections for the effects of masking. But developing the specific procedures and parameters needed to implement data masking requires statistical expertise that is not likely to be found within health insurers, and considerable resources would be needed to do so. Furthermore, a uniform procedure should be followed so that data will be comparable across the private-sector units generating the data.
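As a rough illustration of masking by rounding and adding random noise, the following Python sketch perturbs an area-level variable and then rounds the result. The function name, the noise standard deviation, the rounding unit, and the sample values are all assumptions chosen for illustration; they are not parameters proposed in this report, and a real procedure would need to be designed with the statistical expertise discussed above.

```python
import random


def mask_value(value, round_to=1000.0, noise_sd=500.0, rng=None):
    """Mask a numeric variable by adding Gaussian noise, then rounding.

    round_to and noise_sd are illustrative assumptions, not recommended values.
    """
    rng = rng or random.Random()
    noisy = value + rng.gauss(0.0, noise_sd)
    # Round to the nearest multiple of round_to so exact values are not released.
    return round(noisy / round_to) * round_to


# Hypothetical block-group median incomes (made-up numbers).
median_incomes = [41250.0, 58730.0, 36400.0]

# A fixed seed makes the sketch reproducible; a real service would not publish it.
rng = random.Random(0)
masked = [mask_value(v, rng=rng) for v in median_incomes]
print(masked)
```

Because the noise distribution and rounding unit are fixed in advance, every submitting organization would receive variables masked in the same way, which is one way to obtain the comparability that a uniform procedure is meant to provide.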
DHHS could greatly facilitate the routine generation of high-quality, uniform, and nondisclosing geographically linked data sets by providing a linking service that could be used by private- and public-sector health care organizations. Such a service could be administered, for example, through a Web site. An organization would anonymously submit a file containing member addresses and would receive in return a file of masked geographical variables at several levels. Although geocoding is an imperfect process, typically 85 percent of addresses in a health care file can be geocoded down to the block group level; cases that cannot be geocoded to that level might be either imputed or analyzed using variables aggregated to higher levels of geography.
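The fallback to higher levels of geography can be sketched as follows. This is a minimal illustration only: the level names, the record layout, and the sample values are hypothetical and do not describe any existing DHHS or Census Bureau file format.

```python
from typing import Optional, Tuple

# Geographic levels ordered from finest to coarsest (assumed for illustration).
LEVELS = ["block_group", "tract", "county"]


def finest_available(record: dict) -> Optional[Tuple[str, float]]:
    """Return (level, value) for the finest level at which the record was geocoded."""
    for level in LEVELS:
        value = record.get(level)
        if value is not None:
            return level, value
    return None  # not geocoded at any level; a candidate for imputation


# Hypothetical returned records: a masked variable at each level, None where
# the address could not be geocoded to that level.
records = [
    {"block_group": 41000.0, "tract": 43000.0, "county": 47000.0},
    {"block_group": None, "tract": 52000.0, "county": 50000.0},
    {"block_group": None, "tract": None, "county": None},
]

for rec in records:
    print(finest_available(rec))
```

An analysis that mixes levels in this way would need to account for the level at which each value was measured, which is why the function returns the level name along with the value.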
The greatest expertise in the federal government for solving the problems involved in establishing such a service resides in the Bureau of the Census. Within DHHS, the NCHS has been a leader in dealing with confidentiality issues. Alternatively, a private-sector vendor with the necessary geocoding expertise could be recruited, although such vendors do not typically deal with the related confidentiality issues.
RECOMMENDATION 6-3: DHHS should establish a service that would geocode and link addresses of patients or health plan members to census data, with suitable protections of privacy, and make this service available to facilitate development of geographically linked analytic data sets.