Currently Skimming:

Summary
Pages 1-14

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 1...
... At the same time, SIPP collects so much data about multiple people in a household that there is a heightened possibility, relative to most surveys, that those data can be combined in a way that will uniquely identify an individual. For example, the SIPP 2020 data include a household in Florida with a Black male born in 1946 married to, and sharing the household with, an Asian female born in 1941, also with a child born in 1968 in the household; this combination of characteristics is sufficient to make the household unique.
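The uniqueness concern described above can be illustrated with a minimal sketch that counts how often each combination of characteristics occurs in a set of records. The records, field names, and the helper `unique_combinations` are hypothetical and chosen only for illustration; they are not SIPP data and this is not the method used in the report.

```python
from collections import Counter

# Hypothetical toy records (NOT SIPP data); field names are assumptions.
records = [
    {"state": "FL", "birth_year": 1946, "sex": "M"},
    {"state": "FL", "birth_year": 1946, "sex": "M"},
    {"state": "FL", "birth_year": 1941, "sex": "F"},
    {"state": "TX", "birth_year": 1968, "sex": "F"},
    {"state": "TX", "birth_year": 1968, "sex": "F"},
]


def unique_combinations(rows, keys):
    """Return the combinations of `keys` that occur exactly once in `rows`."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# No record is unique on state alone, but one becomes unique
# once state, birth year, and sex are combined.
print(unique_combinations(records, ["state"]))
print(unique_combinations(records, ["state", "birth_year", "sex"]))
```

The more variables that are combined, the more likely a record is to be the only one with its combination, which is why a file with many variables per household carries a higher risk.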
From page 2...
... As part of its fact gathering, the panel will consider:
• the evolving privacy risks to releasing survey data;
• developments in methods for protecting privacy and reducing risks of disclosure, including formal privacy methods being implemented at the Census Bureau;
• the dimensionality and longitudinal nature of SIPP data;
• the linking of SIPP data with administrative data;
• existing SIPP data products and the utility of detailed public-use microdata that enable scientific discovery;
• selected other SIPP data products, such as a small area estimates program for key SIPP measures; and
• the need for protecting the confidentiality of SIPP data, potentially across multiple data releases, while providing timely access for the many research uses of SIPP.
The panel will produce a report with conclusions and recommendations for disclosure protection and data provision from the SIPP program.
From page 3...
... ? To address these four questions the panel conducted literature reviews; invited outside experts to provide information on SIPP, disclosure avoidance approaches, and data ethics; examined the 2020 SIPP public-use file and its documentation; and asked SIPP data users to complete a short online questionnaire about their experiences in working with SIPP data.
From page 4...
... The actual risk of disclosure in SIPP data products has long been relatively unknown. The panel commends the Census Bureau on recently conducting a re-identification study to measure the likelihood that a data intruder could identify survey respondents by matching data in the SIPP public-use file with income tax data and Social Security records.
From page 5...
... variables or to produce a table generator that restricts what tables can be produced and that adds noise to produce differentially private output, but the large number of variables and the presence of multiple observations over time both are complicating factors that could not be addressed at a full-file level in a reasonable amount of time. A different approach is to change access to the data.
From page 6...
... , providing access to only a small subset of SIPP data plus selected administrative data, all synthesized. Other types of access used by federal agencies include a restricted-use file (provided on CD or by download, with restrictions on how the data may be stored, analyzed, and reported)
From page 7...
... Contingent on the findings from future re-identification studies, it seems likely that some variables (including demographic and geography variables) may need to be given statistical disclosure treatment, such as by collapsing to fewer categories, synthesizing, or developing alternative means of access to the variables or their analytical use (Recommendation 4-1)
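As a rough illustration of one treatment named above, collapsing to fewer categories, the sketch below groups birth years into ten-year bands and compares the size of the rarest group before and after. The band width, the sample values, and the helper names `collapse_birth_year` and `smallest_group` are assumptions made for this example; the report does not specify how any variable would be collapsed.

```python
from collections import Counter


def collapse_birth_year(year, width=10):
    """Map a birth year to a band label such as '1940-1949'."""
    start = year - year % width
    return f"{start}-{start + width - 1}"


def smallest_group(values):
    """Size of the rarest value; 1 means at least one record is unique."""
    return min(Counter(values).values())


# Hypothetical birth years, for illustration only.
birth_years = [1941, 1946, 1946, 1968, 1962]

print(smallest_group(birth_years))                                    # 1
print(smallest_group([collapse_birth_year(y) for y in birth_years]))  # 2
```

Coarser categories make unique records less common, at the cost of analytical detail, which is the trade-off between confidentiality and usability that the text describes.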
From page 8...
... provide a quicker and simpler process for disclosure review of findings for public release. A SODA would function by creating a controlled environment for data access and especially data release, and by requiring signed user agreements to gain access.
From page 9...
... Table generators can be designed to infuse noise and to limit the production of data involving very few respondents in a cell, so per-user disclosure review may be unnecessary.
Steps That Will Help in Balancing Usability and Confidentiality
Particularly to the extent that new disclosure avoidance procedures are adopted, greatly enhanced communication would be beneficial.
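The two table-generator safeguards mentioned on this page, noise infusion and limits on cells with very few respondents, can be sketched as follows. This is a minimal sketch under stated assumptions: the function `noisy_table`, the `epsilon` and `min_cell` parameters, and the toy rows are all hypothetical, and Laplace noise is used only as a common textbook choice. As written it is not a formally differentially private mechanism, because suppressing cells based on their true counts can itself reveal information.

```python
import random
from collections import Counter


def noisy_table(rows, key, epsilon=1.0, min_cell=3, seed=None):
    """Tabulate counts of `key`, suppress small cells, add Laplace noise.

    All parameters are illustrative assumptions, not Census Bureau settings.
    """
    rng = random.Random(seed)
    counts = Counter(row[key] for row in rows)
    table = {}
    for cell, n in counts.items():
        if n < min_cell:
            # Too few respondents: do not publish this cell at all.
            table[cell] = None
        else:
            # The difference of two exponential draws with rate epsilon is a
            # Laplace(0, 1/epsilon) sample, the usual scale for a count.
            noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
            table[cell] = max(0, round(n + noise))
    return table


# Hypothetical rows, for illustration only.
rows = [{"state": "FL"}] * 5 + [{"state": "TX"}]
print(noisy_table(rows, "state", seed=0))
```

Because the safeguards are applied automatically to every requested table, a design along these lines is what would let a generator avoid case-by-case disclosure review.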
From page 10...
... This may be a time to involve outside organizations for tasks such as maintaining SODA and assisting with the disclosure review process for publishing data. The Census Bureau could also benefit by including external researchers when performing re-identification studies and developing disclosure avoidance strategies, bringing in additional skills and perspectives to supplement the work of Census Bureau staff.
From page 11...
... After the disclosure risks are known, a variety of approaches are possible, depending on the level of risk and the planned tier of access. These range from traditional statistical disclosure limitations, such as those currently used by SIPP, to synthetic data (also used by SIPP for a small extract of the file combined with administrative data)
From page 12...
... FIGURE S-1a  Stages of disclosure avoidance.
From page 13...
... While the various data disclosure approaches conceptually could be applied at any of the tiers of access, methods of changing or suppressing the data are most applicable to a table generator or public-use file, while the additional controls placed by SODA are intended to allow more complete access to the original data. However, one might still use synthetic data at an FSRDC, not necessarily to protect SIPP data but rather to allow the merging of other data, possibly to protect the other data or possibly to allow the blending of data that are not directly matchable.

