6 Operationalizing Transparency
Pages 49-72

This Chapter Skim presents the single chunk of text algorithmically identified as most significant on each page of the chapter.


From page 49...
... His colleague at the Bureau, Ruth Ann Killion, would follow with a parallel focus on engagement and transparency in working with the agency's external partners. Eltinge said his questions would relate to three different types of transparency: (1)
From page 50...
... Another participant said that he was very impressed with what everyone heard about the UK's Office for National Statistics and Statistics Canada. At BEA, he said, the agency is getting closer to what it calls a unified system that is interconnected and internally transparent.
From page 51...
... He noted that this issue is separate from the various data access issues. Sarah Henry said that when acquiring the data, it is important to think about what conversations one will be having with the agencies that are providing the data.
From page 52...
... Killion said that one has to plan for these conversations to take place. Eric Rancourt said that at Statistics Canada, there is a mechanism for the acquisition of administrative data.
From page 53...
... In a 2013 review, the Bureau discovered that the knowledge of someone's participation in a survey increased the probability that the person could be found in one of the microdata files. Killion added that another method used at the Bureau is to allow researchers to obtain special sworn status as a Bureau employee, which provides access to data dependent on the requirement that the user protect confidentiality and privacy.
From page 54...
... These situations limit transparency, she said. Killion noted again that disclosure avoidance techniques intentionally perturb the data.
From page 55...
... Identifying and dealing with outliers is absolutely necessary to get good estimates, she acknowledged, but she suggested that the Bureau does not necessarily have to do all of the microdata clean-up that those ad hoc editors do. More timely, relevant data would be essential, she said, so it would be useful if the Bureau provided repeatable editing practices that can be automated and are therefore much faster.
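
The following is a minimal sketch, in Python, of what such a repeatable, automatable edit might look like: a deterministic median-absolute-deviation rule that flags the same outliers on every run. The function, the k = 3 cutoff, and the sample values are illustrative assumptions, not the Bureau's actual editing rules.

    import statistics

    def flag_outliers(values, k=3.0):
        """Flag values more than k median absolute deviations from the median.

        A deterministic, rule-based edit (illustrative assumption): the same
        inputs always yield the same flags, so the step can be automated and
        rerun bit-for-bit.
        """
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values)
        if mad == 0:
            return [False] * len(values)
        return [abs(v - med) / mad > k for v in values]

    # Example: the flagged positions are identical on every run.
    reported = [12.0, 11.5, 13.1, 12.4, 250.0, 11.9]
    print(flag_outliers(reported))  # [False, False, False, False, True, False]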
From page 56...
... Second, it is very common to use random-number generators in editing and imputation routines, but this approach reduces the ability to reproduce outputs because the outputs depend on the random draws. Finally, she said, disclosure avoidance techniques perturb the data and therefore also reduce the possibility of reproducibility.
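
One common way to mitigate the random-number problem is to seed a dedicated generator and record the seed alongside the output, so the draws can be regenerated exactly. The hot-deck imputation below is a hypothetical sketch under that assumption; the seed, donor pool, and values are invented for illustration and do not reflect any agency's actual routines.

    import random

    def impute_missing(values, donor_pool, seed=20240601):
        """Hot-deck imputation with a fixed, recorded seed (illustrative).

        Each missing value (None) is replaced by a random donor; seeding a
        dedicated generator makes the draws, and hence the imputed file,
        exactly reproducible.
        """
        rng = random.Random(seed)  # local generator; global state untouched
        return [v if v is not None else rng.choice(donor_pool) for v in values]

    incomes = [54000, None, 61000, None, 48000]
    donors = [47000, 52000, 58000, 63000]
    print(impute_missing(incomes, donors))
    print(impute_missing(incomes, donors))  # identical: same seed, same draws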
From page 57...
... Abowd said that the paper also shows how to recover some of those parameters without using any private information from datasets that had a similar frame but used different confidentiality protection mechanisms. Abowd added that he and Schmutte are certainly not the first or the last people to document the failures of ad hoc statistical disclosure limitation.
From page 58...
... Bill Eddy said that there is just no getting around the issue that the executives at statistical agencies have to be educated about how to manage a privacy loss budget. If that is not done, he said, some computer science class someplace is going to take one of the publications and reproduce the microdata.
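
To make the budget idea concrete, the sketch below tracks cumulative privacy loss under basic sequential composition, in which the epsilon values of successive releases add up against a fixed total. The class, the epsilon values, and the release names are invented for illustration; production accounting is typically more sophisticated.

    class PrivacyLossBudget:
        """Track cumulative privacy loss under basic sequential composition:
        under pure differential privacy, the epsilons of successive data
        releases add up against a fixed total budget (illustrative sketch)."""

        def __init__(self, total_epsilon):
            self.total = total_epsilon
            self.spent = 0.0

        def authorize(self, epsilon, description):
            """Approve a release only if it fits in the remaining budget."""
            if self.spent + epsilon > self.total:
                raise RuntimeError(f"cannot release {description!r}: budget exhausted")
            self.spent += epsilon
            print(f"released {description!r}; spent {self.spent:.2f} of {self.total:.2f}")

    budget = PrivacyLossBudget(total_epsilon=1.0)
    budget.authorize(0.5, "county population counts")
    budget.authorize(0.4, "tract age histograms")
    try:
        budget.authorize(0.4, "block-level table")  # would push spending to 1.3
    except RuntimeError as err:
        print(err)  # the release is refused; the budget is enforced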
From page 59...
... She said that BEA has a small slice of those issues because the agency has its own survey program to collect data on services, trade, and foreign investment. She complimented Killion for her presentation but quibbled with her about ad hoc editing versus automatic editing.
From page 60...
... A participant noted that the former head of the Australian Bureau of Statistics launched a program based on a similar idea. It is now known as the High-Level Group for the Modernisation of Official Statistics, and its goal is to ensure that not every organization has to build its own systems for all of the necessary data treatment steps.
From page 61...
... Mockus offered a more detailed explanation of version control, which is essentially keeping track of changes to documents, data, or products. That is, it is not enough just to keep the current version of the methods; instead, one documents each change.
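
As a toy illustration of that idea, the sketch below appends every change to a history, with a message documenting why the change was made, so no version is ever lost. The class and the example commits are hypothetical; in practice one would use a tool such as Git.

    import datetime
    import hashlib

    class VersionLog:
        """A toy append-only version log: no change overwrites an earlier one."""

        def __init__(self):
            self.versions = []  # full history, oldest first

        def commit(self, content, message, author):
            """Record a new version along with who changed it and why."""
            self.versions.append({
                "id": hashlib.sha256(content.encode()).hexdigest()[:12],
                "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                "author": author,
                "message": message,  # documents why the change was made
                "content": content,
            })

        def history(self):
            return [(v["id"], v["author"], v["message"]) for v in self.versions]

    log = VersionLog()
    log.commit("outlier_cutoff = 3.0", "initial editing rule", "analyst_a")
    log.commit("outlier_cutoff = 2.5", "tightened cutoff after review", "analyst_b")
    for entry in log.history():
        print(entry)  # every version retrievable, every change documented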
From page 62...
... Mockus said that another benefit of version control is that it is administrative data. If one has information about the code, one has all versions of the code and all versions of the data.
From page 63...
... started by asserting that in the social and statistical sciences, replicability of research is an increasingly required part of research output, but confidential data, such as those curated by statistical agencies, present problems. The replicability of research using proprietary data is perceived as problematic at best, impossible at worst.
From page 64...
... Vilhuber acknowledged that reproducibility with confidential data is hard. However, he argued that data that are held by federal or national statistical offices can alleviate some of the concerns.
From page 65...
... all programs are stored in well-defined locations. As it turns out, in many national statistical offices, this information is already being collected in a disclosure review request (DRR)
From page 66...
... The data about the application and release process are, in fact, administrative data and are currently being collected. The national statistical agencies collect and curate all of the elements of a DRR of a research data center (RDC)
From page 67...
... For example, one has a result dataset and also has the computational workflow that resulted in this dataset. At the dataset-item level, one might have provenance regarding particular data in the result dataset, which indicates how that individual item was derived and on what source data items it depends.
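
A minimal sketch of the two levels might look like the following, where the file names, workflow steps, and record identifiers are invented for illustration; standards such as W3C PROV formalize records of this kind.

    # Dataset-level provenance: the workflow that produced the result dataset.
    dataset_provenance = {
        "result": "state_means_v2.csv",
        "workflow": ["load(raw_survey_2019)", "edit(outlier_rule)",
                     "aggregate(mean income by state)"],
        "sources": ["raw_survey_2019.csv"],
    }

    # Item-level provenance: how one value in the result was derived and
    # which individual source records it depends on.
    item_provenance = {
        "item": ("state_means_v2.csv", "row='WY'", "col='mean_income'"),
        "derivation": "mean of edited incomes where state == 'WY'",
        "depends_on": [("raw_survey_2019.csv", f"record_{i}") for i in (101, 118, 240)],
    }

    print(item_provenance["depends_on"])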
From page 68...
... Regarding national statistics, consumers often cannot be given access to source data due to privacy concerns.
From page 69...
... The same participant continued that some of this reproducibility testing takes place because they have, or can bring, the data in-house. This raises the general question in each country as to where this activity can be outsourced.
From page 70...
... The field is clearly at a point at which there is a crisis around social statistics and analysis using social statistics. Being transparent will not solve that problem; transparency experts acknowledge that it gives statistical agencies some credibility and legitimacy that they would not have if it seemed like they were hiding things.
From page 71...
... Finally, the discussion returned to the last couple of points that Vilhuber presented about the idea that the restricted data systems provide a useful model for how to make research results reproducible. It is possible to share the code and the metadata and the actual data, but one of the problems with the reproducibility of research results in academia is that users are often taking data out of context and attaching them to a research article.

