approach that fits with technical statistical frameworks.25 Such trade-offs may be considered informally, but various formal tools exist for quantifying them.26

Duncan and Stokes apply such an approach to the choice of “topcoding” for income, that is, truncating the income scale at some maximum value.27 They illustrate the trade-offs among different topcoding thresholds in terms of risk (of reidentification through a specific form of record linkage) and utility (the inverse mean square error of estimation for a mean or a regression coefficient).
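The risk-utility comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by Duncan and Stokes: the income data are simulated from a lognormal distribution, the risk measure is a simple stand-in (the share of records whose released value is unique in the file) rather than their record-linkage risk, and all function names are hypothetical. Utility follows the definition in the text, the inverse mean square error of the estimated mean.

```python
import random
import statistics


def topcode(values, cap):
    """Replace every value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]


def uniqueness_risk(values, cap):
    """Assumed risk proxy: share of records whose released value is unique."""
    released = topcode(values, cap)
    counts = {}
    for v in released:
        counts[v] = counts.get(v, 0) + 1
    return sum(1 for v in released if counts[v] == 1) / len(released)


def mean_utility(population, cap, sample_size=200, reps=200, seed=0):
    """Utility = 1 / MSE of the topcoded sample mean around the true mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    sq_errors = []
    for _ in range(reps):
        sample = rng.sample(population, sample_size)
        estimate = statistics.fmean(topcode(sample, cap))
        sq_errors.append((estimate - true_mean) ** 2)
    return 1.0 / statistics.fmean(sq_errors)


def ru_map(population, caps):
    """Return (cap, risk, utility) for each candidate topcoding threshold."""
    return [
        (cap, uniqueness_risk(population, cap), mean_utility(population, cap))
        for cap in caps
    ]


def make_population(n=2000, seed=1):
    """Simulated incomes (an assumption), rounded to the nearest 1,000."""
    rng = random.Random(seed)
    return [round(rng.lognormvariate(10.5, 0.8), -3) for _ in range(n)]


if __name__ == "__main__":
    incomes = make_population()
    caps = [50_000, 100_000, 200_000, float("inf")]
    for cap, risk, utility in ru_map(incomes, caps):
        print(f"cap={cap:>10,.0f}  risk={risk:.4f}  utility={utility:.3e}")
```

Plotting risk against utility for each threshold gives a curve of the kind an agency could use to pick a topcode: lowering the threshold never increases this risk proxy, while a sufficiently low threshold biases the mean and so reduces utility.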

For some other approaches to agency confidentiality and data release in the European context, see Willenborg and de Waal.28

L.4.2 Record Linkage and Public Use Files

One activity that is highly developed in the context of statistical-agency data is record linkage. The original method, still used in most approaches, goes back to pioneering work by Fellegi and Sunter, who used formal probabilistic and statistical tools to decide on matches and nonmatches.29 Inherent in the method is the need to assess the accuracy of matching and the error rates associated with the decision rules.30
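A minimal sketch of a probabilistic decision rule in the spirit of Fellegi and Sunter may help make the idea concrete. It assumes conditional independence of field agreements, and every number in it is hypothetical: the per-field probabilities of agreement among true matches (m) and among true nonmatches (u), the field names, and the two thresholds. Each record pair receives a summed log-likelihood-ratio weight and is then classified as a link, a non-link, or a possible link; the last function computes the error rates implied by the chosen thresholds.

```python
import itertools
import math

# Hypothetical (m, u) pairs: P(field agrees | match), P(field agrees | nonmatch).
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "zip_code": (0.85, 0.10),
}


def agreement_pattern(rec_a, rec_b, params=FIELD_PARAMS):
    """Map each comparison field to True if the two records agree on it."""
    return {f: rec_a.get(f) == rec_b.get(f) for f in params}


def match_weight(pattern, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over fields (independence assumed)."""
    total = 0.0
    for field, agrees in pattern.items():
        m, u = params[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision rule; both thresholds are illustrative assumptions."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


def error_rates(params=FIELD_PARAMS, lower=0.0, upper=8.0):
    """Return (P(link | nonmatch), P(non-link | match)) for the given rule."""
    fields = list(params)
    false_match = 0.0
    false_nonmatch = 0.0
    for combo in itertools.product([True, False], repeat=len(fields)):
        pattern = dict(zip(fields, combo))
        p_m = p_u = 1.0
        for field, agrees in pattern.items():
            m, u = params[field]
            p_m *= m if agrees else 1 - m
            p_u *= u if agrees else 1 - u
        decision = classify(match_weight(pattern, params), lower, upper)
        if decision == "link":
            false_match += p_u
        elif decision == "non-link":
            false_nonmatch += p_m
    return false_match, false_nonmatch


if __name__ == "__main__":
    a = {"surname": "Lee", "birth_year": 1970, "zip_code": "20001"}
    b = {"surname": "Lee", "birth_year": 1971, "zip_code": "20002"}
    w = match_weight(agreement_pattern(a, b))
    print(round(w, 2), classify(w), error_rates())
```

Moving the two thresholds apart lowers both error rates at the cost of sending more pairs to the "possible link" category, which in practice means more clerical review.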

The same ideas are used, with refinements, by the Census Bureau

25 For a discussion of the approaches to trade-offs, see the various chapters in Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J. Lane, J. Theeuwes, and L. Zayatz, eds., North-Holland Publishing Company, Amsterdam, 2001.

26 A framework is suggested in G.T. Duncan and D. Lambert, “Disclosure-limited data dissemination (with discussion),” Journal of the American Statistical Association 81:10-28, 1986. See additional discussion of the risk-utility trade-off by G.T. Duncan, S.E. Fienberg, R. Krishnan, R. Padman, and S.F. Roehrig, “Disclosure limitation methods and information loss for tabular data,” pp. 135-166 in Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle, J. Lane, J. Theeuwes, and L. Zayatz, eds., North-Holland Publishing Company, Amsterdam, 2001. A full decision-theoretic framework is developed in M. Trottini and S.E. Fienberg, “Modelling user uncertainty for disclosure risk and data utility,” International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems 10(5):511-528, 2002; and M. Trottini, “A decision-theoretic approach to data disclosure problems,” Research in Official Statistics 4(1):7-22, 2001.

27 G.T. Duncan and S.L. Stokes, “Disclosure risk vs. data utility: The R-U confidentiality map as applied to topcoding,” Chance 17(3):16-20, 2004.

28 L. Willenborg and T. de Waal, Elements of Statistical Disclosure Control, Springer-Verlag Inc., New York, N.Y., 2001.

29 I. Fellegi and A. Sunter, “A theory for record linkage,” Journal of the American Statistical Association 64:1183-1210, 1969.

30 See, for example, W. Winkler, The State of Record Linkage and Current Research Problems, Statistical Research Report Series, No. RR99/04, U.S. Census Bureau, Washington, D.C., 1999; W.E. Winkler, “Re-identification methods for masked microdata,” pp. 216-230 in Privacy in Statistical Databases, J. Domingo-Ferrer, ed., Springer, New York, N.Y., 2004; M. Bilenko,


