to match persons in the Current Population Survey (sample size, about 60,000 households) with IRS returns. The Census Bureau and the IRS provide the data to a group that links the records to produce a set of files that contain information from both sources. The merged files are redacted, and noise is added until neither the Census Bureau nor the IRS can rematch the linked files with their original files.31 The data are then released as a form of PUMS file. Those who prepared the PUMS file have done sufficient testing to offer specific guarantees regarding the protection of individuals whose data went into it. This example illustrates not only the complexity of data protection associated with record linkage but also the likely lack of utility of statistical-agency data for terrorism prevention, because the linked files cannot be matched to individuals.
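The idea of adding noise until records can no longer be rematched can be illustrated with a minimal sketch. It is not the masking procedure used for the linked files: it assumes independent Gaussian noise on two hypothetical numeric fields and a simple nearest-neighbor rematching attempt, and the records, field meanings, and function names (`mask`, `rematch_rate`) are invented for illustration.

```python
import random


def mask(records, noise_sd, rng):
    """Return a copy of the records with independent Gaussian noise added to every field."""
    return [[x + rng.gauss(0.0, noise_sd) for x in rec] for rec in records]


def rematch_rate(original, masked):
    """Fraction of masked records whose nearest original record is their true source."""
    hits = 0
    for i, m in enumerate(masked):
        # Index of the original record closest to this masked record
        # (squared Euclidean distance over all fields).
        nearest = min(
            range(len(original)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(original[j], m)),
        )
        if nearest == i:
            hits += 1
    return hits / len(masked)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical linked records: two numeric fields per person
# (for example, two income amounts in thousands of dollars).
original = [[rng.uniform(20, 120), rng.uniform(0, 10)] for _ in range(200)]

# Rematch rate for increasing amounts of noise (0.0 means no masking).
rates = {}
for sd in (0.0, 1.0, 5.0, 20.0):
    rates[sd] = rematch_rate(original, mask(original, sd, rng))
    print(f"noise sd = {sd:5.1f}  rematch rate = {rates[sd]:.2f}")
```

With no noise every record rematches to its source. As the noise grows, the rematch rate falls, which is the kind of test a releasing group could run before offering a guarantee. The price is that each released value is less accurate, so the noise level has to be balanced against the analytic usefulness of the file.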

R. Mooney, W.W. Cohen, P. Ravikumar, and S.E. Fienberg, “Adaptive name-matching in information integration,” IEEE Intelligent Systems 18(5):16-23, 2003.


For more details, see J.J. Kim and W.E. Winkler, “Masking microdata files,” pp. 114-119 in Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, Va., 1995; J.J. Kim and W.E. Winkler, Masking Microdata Files, Statistical Research Report Series, No. RR97-3, U.S. Bureau of the Census, Washington, D.C., 1997.
