Suggested Citation:"1 Introduction." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.

1

Introduction

The principal U.S. statistical agencies1 are tasked with informing the public on various aspects of the state of the country and its residents, including such characteristics as the population count of states, counties, and cities, energy consumption, farm production, the state of the economy, educational attainment, and employment. Often, these official estimates are used to support decisions about how governmental policies should be implemented or modified to improve various dimensions of the nation’s welfare. Further, official statistics are used to allocate political power and to distribute federal funds. To provide information on these important matters, the statistical agencies produce official statistics on the status of these various aspects of the United States.

These estimates must be seen as trustworthy. One step toward achieving trust is to maintain an “open book” policy. While all the federal statistical agencies already agree in principle with having such a policy, it is often unclear to them how it should be put into operation, in particular how much detail should be provided. Nevertheless, at the least, sufficient information must be made available to show that official statistics are meeting high standards of scientific quality and integrity.

___________________

1 There are 13 principal statistical agencies: the Bureau of the Census and the Bureau of Economic Analysis in the Department of Commerce; the Bureau of Justice Statistics in the Department of Justice; the Bureau of Labor Statistics in the Department of Labor; the Bureau of Transportation Statistics in the Department of Transportation; the Economic Research Service and the National Agricultural Statistics Service in the Department of Agriculture; the Energy Information Administration in the Department of Energy; the National Center for Education Statistics in the Department of Education; the National Center for Science and Engineering Statistics in the National Science Foundation; the National Center for Health Statistics and the Office of Research, Evaluation, and Statistics of the Social Security Administration in the Department of Health and Human Services; and the Statistics of Income Division of the Internal Revenue Service in the Department of the Treasury. In addition to these 13 agencies, a number of other agencies have smaller units that have major statistical responsibilities.

Two concepts that have been applied to scientific arenas where openness is warranted are those of transparency and reproducibility. In this context, transparency and reproducibility are to be achieved through release of documentation on the plans, processes, datasets, computations, and estimation methodologies used, both through release of the official estimates and through release of the evaluation of the input data and the official estimates. The increased accountability that would result from such openness can assure the public that the data were collected without bias and that the methods used are consistent with the current state of the art of statistical science. In a sense, there is an implied contract. The data are collected with funding from taxpayers, and statistical agencies ask for cooperation in their collection of that data, and in response the agencies return high-quality, objective official estimates that enable the public to make informed decisions. Transparency is necessary for the public to determine whether the statistical agencies are keeping their side of the bargain.

As indicated, these questions about transparency and reproducibility in official statistics are part of a much broader set of questions about transparency and reproducibility in science. These questions resulted in the National Academies of Sciences, Engineering, and Medicine undertaking a series of forums on open science, whose goal was to elicit from participants a variety of ideas on ways to support greater openness in a wide range of scientific enterprises. The last activity in that series was a workshop with a narrower focus: openness concerning the generation and publication of official statistics. As a result of the discussions at that workshop (see Methods to Foster Transparency and Reproducibility of Federal Statistics, NASEM, 2019a), one attendee, John Gawalt, then director of the National Center for Science and Engineering Statistics (NCSES), thought it would be valuable to have a study examining the extent to which NCSES practices, and those of federal statistical agencies more broadly, are currently transparent, what benefits greater transparency might offer, what tools might facilitate greater degrees of transparency, what are the legal, administrative, and resource-based constraints on being more transparent, and, if considered desirable, what are the appropriate steps to increase the degree of transparency both in the near term and over time. His successor, Emilda Rivers, was instrumental in bringing this to fruition.

This interest in transparency and reproducibility in official statistics is not new; it is fundamental to the scientific method. While there is renewed interest in reproducibility as a result of recent research in particular scientific disciplines arguing that a large fraction of research is not reproducible, the need to provide support for scientific activities by being open about the methods used and the data that were collected is longstanding. We encourage the professional staff of the federal statistical agencies to view their work as involving the application of scientific methods to produce official statistics and therefore to recognize that, consistent with scientific work, the actions taken throughout the process of developing these statistics need to be made transparent. If staff do not have this view of their work, steps to instill it would be helpful. In addition, there are other reasons to support greater transparency. As with any complicated manufacturing process, for the production of statistics it is important to have a complete workflow history documenting how data are collected, how they are treated, how estimations are carried out, and how the quality of official estimates is assessed. These complete workflow histories allow federal statistical agencies to better manage and innovate in the production of official statistics.

In response to the NCSES request, the National Academies established the Panel on Transparency and Reproducibility of Federal Statistics for the National Center for Science and Engineering Statistics; the complete statement of task is in Box 1-1.

As is clear from the statement of task, the panel undertook its work with a dual focus: the degree of transparency of practices at NCSES and the degree of transparency in all of the principal agencies of the federal statistical system. The two goals dovetail to a considerable degree, because NCSES’s policies and processes are typical of those of the other agencies. Further, as a relatively small statistical agency, NCSES is positioned to be nimble and innovative and to share what it learns with the broader statistical community. In addition, one of the benefits of greater documentation of methods and archiving of data is that it allows methods and data to be shared and reused either within an agency or across agencies. So, while as an individual agency NCSES can benefit from sharing and reusing data and methods internally, such benefits are likely to be even broader when considered across the entire federal statistical system and internationally. This is particularly relevant to NCSES, since it regularly interacts with international statistical agencies and the Census Bureau (as a major data collection agency for NCSES surveys). Examining the entire federal statistical system for its policies regarding transparency of data and methods, and assessing how using the same tools and standards in packaging the data and methods can facilitate sharing them across agencies, should yield important benefits.

DEFINITIONS OF TRANSPARENCY AND REPRODUCIBILITY

The panel could find no formal definition of the term transparency when used in conjunction with official statistics, though its desirability is often cited. Two cases in which the transparency of official statistics is touched upon are the following: (1) the Quality Assurance Framework of the European Statistical System Version 2.0, which contains this language:

Transparency of processes. The statistical authorities document their production processes and documentation of these processes is available to staff. A condensed/summary version is made available to users through user-oriented quality reports based on ESS standards, i.e. Single Integrated Metadata Structure (SIMS).2

and (2) the UN Statistics Quality Assurance Framework, which defines transparency in conjunction with objectivity, impartiality, and professionalism as follows:

publicising the methods used [and] … ; ensuring that statistics are determined by statistical considerations and not by pressure from providers or users and explaining major changes in methodology to users.3

For our purposes, transparency is the provision of sufficiently detailed documentation of all the processes of producing official estimates. The goal of transparency is to enable consumers of federal statistics to accurately understand and evaluate how estimates are generated. There are different levels of understanding. Since consumers vary in their interests and needs, transparent documentation includes basic information for the merely curious observer as well as technical information for experts. Similarly, there are different levels of evaluation, ranging from individual impressions about the usefulness of official estimates for idiosyncratic purposes to the most rigorous form of assessment, which is the attempt to reproduce the estimates in an independent investigation. Transparency makes it possible to understand how official estimates came to be as they are, and whether they are reliable.

While investigations of the reliability of federal statistics by outside organizations are not common, it is essential for agencies to have available the information necessary for them to be undertaken. The credibility of official estimates is undermined if questions about their genesis cannot be answered or if it is impossible to check them. The recent National Academies report, Reproducibility and Replicability in Science (2019b), observes that there are some general questions about the reliability of research results that have been raised across all scientific disciplines:

  1. Are the data and analysis laid out with sufficient transparency and clarity that the results can be checked?
  2. If checked, do the data and analysis offered in support of the result in fact support that result?
  3. If the data and analysis are shown to support the original result, can the result reported be found again in the specific study context investigated?
  4. Finally, can the result reported or the inference drawn be found again in a broader set of study contexts? (p. 44).

___________________

2European Statistical System (2019, p. 28).

3United Nations Statistics Division (2016, p. 39).

In this report, we are concerned with transparency in relation to the first three questions. Answering questions 1 and 2 involves scrutinizing and employing information from the statistical agency to check results that the agency has published. This sort of investigation—notably involving data analysis (including “cleaning,” editing, and weighting) and associated computer code—seeks to determine if the published results based on these stages of the research process can be reproduced. Answering question 3 involves conducting a much broader independent investigation, within the specific study context that produced the original official estimates. By “context” here we mean the full set of study components, from conceptualization to design to data collection to data analysis and publication. Can conducting a parallel investigation, using the same procedures as those followed by the agency to construct the official estimates, reproduce those estimates, within a reasonable margin of error?

Further, one should have a prespecified margin of error that one anticipates from reproducing estimates from independent studies. This idea is specified in reproducibility exercises done by people connected with the Center for Open Science.4 Recognizing that such studies would be very complex and expensive, our recommendations for transparent documentation are aimed at urging agencies to have available the information needed to make them possible.

In a study of the reproducibility of a set of official statistics that was derived from a sample survey, one would use the same definition of the target population (e.g., “adults 18-75 in the United States, living in non-institutional quarters [not prisons, nursing homes, etc.]”) and the same sample design (e.g., multistage area probability design using primary sampling units [metropolitan areas], census tracts within those, census blocks within those, blocks within those, clusters of dwelling units within those, and individuals sampled randomly within those). The idea is to use the identical sample design but with a new sample. One would have to keep to the same time period, but some details are unclear, such as whether one would use the same respondents and the same interviewers for each respondent, and whether efforts would be made so that the indicator of nonresponse would be the same across replications. Reproducibility is then assessed against an acceptable margin of error, which is primarily sampling error. But as suggested in the argument above, it might also include other errors, depending on how closely the reproducing study’s procedures can mimic those of the official study.
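The comparison against a prespecified margin of error can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of this report: the official and reproduced estimates come from independent samples, sampling error is the only source of error considered, and the margin is set in advance as a multiple (z) of the standard error of the difference. The function name and the numeric values are hypothetical.

```python
import math

def within_margin(official, reproduced, se_official, se_reproduced, z=1.96):
    """Compare two estimates against a margin fixed before the comparison.

    Assumes independent samples and sampling error only (illustrative
    assumptions, not a procedure prescribed by any agency).
    Returns (agrees, margin).
    """
    # Standard error of the difference between two independent estimates.
    se_diff = math.sqrt(se_official ** 2 + se_reproduced ** 2)
    margin = z * se_diff
    return abs(official - reproduced) <= margin, margin

# Hypothetical values: an official proportion and a reproduced one.
agrees, margin = within_margin(0.412, 0.425, 0.008, 0.009)
print(agrees, round(margin, 4))
```

If nonsampling errors (for example, differences in interviewers or nonresponse) are expected to matter, the margin would need to be widened accordingly; the sketch does not attempt to model them.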

___________________

4 For example, see Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015, published by the Center for Open Science at https://osf.io/pfdyw/.

PRACTICAL BENEFITS OF TRANSPARENCY

We believe that agencies engaged in the production of official statistics have an obligation to strive to be as transparent as possible, because official statistics are the product of a scientific activity which, by its nature, implies a responsibility to indicate what work was done. Further, federal statistics are used for important purposes, and therefore there is a need to be clear with users about how they were produced. In addition to supporting this norm of openness, a variety of benefits are gained when the agencies that produce official statistics provide greater transparency. These benefits can be combined into four categories: (1) efficiency, (2) innovation and progress, (3) trust and confidence, and (4) value from the use of the data products.

Efficiency arises at an agency producing statistics when what is done to produce them is known so completely that any temporary or permanent changes to staff due to resignations, retirement, or sickness can be easily accommodated. As with any organization that undertakes or oversees complex processes, it is necessary for statistical agencies to retain, in an accessible way, detailed information as to how they accomplish their various data collection designs, data treatments, and estimation tasks as components of the production of their official statistics. Transparency therefore can be seen as being consistent with sound management. By retaining this information, an agency can ensure that new hires or transfers can quickly get up to speed regarding what is needed and where in the process it fits. Further, this knowledge facilitates the identification of sources of problems should they become known. Finally, if staff understand that their work is subject to review, it encourages a thoroughness and care that is beneficial.

Innovation and progress result from internal staff and external researchers understanding in some detail how official statistics are produced and thereby being able to discover areas in need of improvement. So transparency supports methodological innovation. Both internal and external researchers are in a better position to enhance methods for data collection or estimation if clear and detailed descriptions of the processes used for producing official statistics exist. These benefits arise, in particular, when researchers external to an agency are given sufficient information to conduct research on improving the methods, since they understand what is done and they can also assess the fitness of the data or estimates for other applications.

Trust and confidence derive from a full understanding of how a set of estimates is produced and knowing that what is done is consistent with state-of-the-art procedures and is not used to benefit any particular stakeholder. Official statistics matter. They are used to allocate substantial governmental funds and they are used to inform and assess the effectiveness of policies to improve public health, education, the economy, employment, agriculture, and commerce. Given their broad impact, official statistics should be produced using the best possible science, with complete objectivity, under the constraints of minimizing respondent burden, and expending needed funds in a cost-efficient manner. Most importantly, the public must have confidence that all of this is the case.

One component—of both building trust and earning it—is for federal statistical agencies to “open their books” to the extent feasible. Having an accurate and complete description of how an official statistical series is developed gives external users, especially those with some relevant expertise, confidence that the agency is approaching the data collection and estimation problem with care and objectivity. Along the same lines, knowing which input datasets were used to produce a set of estimates, with the statistical methods used to prepare them for the estimation methodology as well as the details of the estimation methodology, promotes trust. In addition, an “open books” approach provides the raw material for a test of computational reproducibility, which can greatly support a sense of trust in a set of official statistics.
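A test of computational reproducibility of the kind mentioned above can be sketched in a few lines of Python. The sketch is illustrative only and makes several assumptions not drawn from this report: the input dataset is a small list of records, the "estimation methodology" is a simple weighted mean, and the agency has published both the estimate and a SHA-256 fingerprint of the input data. All function names, field names, and values are hypothetical.

```python
import hashlib
import json

def fingerprint(records):
    """SHA-256 digest of the input records in a canonical JSON form."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def weighted_mean(records):
    """Stand-in for a documented estimation method (hypothetical)."""
    total_weight = sum(r["weight"] for r in records)
    return sum(r["weight"] * r["value"] for r in records) / total_weight

def check_reproduction(records, published_estimate, published_digest, tol=1e-9):
    """True only if the inputs match the published fingerprint and
    rerunning the documented method returns the published estimate."""
    same_input = fingerprint(records) == published_digest
    same_output = abs(weighted_mean(records) - published_estimate) <= tol
    return same_input and same_output

# Hypothetical archived input data and published values.
records = [
    {"value": 10.0, "weight": 2.0},
    {"value": 20.0, "weight": 1.0},
    {"value": 40.0, "weight": 1.0},
]
published_digest = fingerprint(records)
published_estimate = weighted_mean(records)
print(check_reproduction(records, published_estimate, published_digest))
```

Checking the input fingerprint separately from the recomputed estimate helps distinguish a change in the data from a change in the method when a reproduction fails.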

Finally, for the full value of statistics to be realized through their use, the quality of the statistical processes must be documented. If the quality of a set of official estimates is not documented, users are more likely to misuse the estimates by not understanding their limitations or by using them in combination inappropriately.

CALLS FOR TRANSPARENCY

The need for transparency by statistical agencies is argued for by Rancourt (2019, p. 549) in an insightful perspective:

With society becoming more complex and increasingly digitized, transparency is more needed, and in fact requested more than ever by citizens. Transparency is a key enabling piece of trust and accountability. It is beneficial to both social acceptability and scientific integrity and is an integral part of quality processes. Transparency is the quality of an object or process through which one can see. The definition can be quite fluid and mean a number of things depending on the context, but in the world of Official Statistics, it means that approaches, methods, decisions, and information are made available to users, researchers, stakeholders and citizens.

Such calls for transparency are not new. A 1978 report from the U.S. Department of Commerce states:

To help guard against misunderstanding and misuse of data, full information should be available to users about sources, definitions, and methods used in collecting and compiling statistics, and their limitations. (U.S. Department of Commerce, 1978, p. 11)

Transparency is mandated by statistical agencies outside the United States as well. In its Policy on Informing Users of Data Quality and Methodology, Statistics Canada outlines the following policy on transparency in data and methods:

Statistics Canada, as a professional agency in charge of producing official statistics, has the responsibility to inform users of the concepts and methodology used in collecting, processing and analyzing its data, of the accuracy of these data, and of any other features that affect their quality or ‘fitness for use’….5

In the European Union, Eurostat states that its Code of Practice

is the cornerstone of the common quality framework of the European Statistical System … based on 16 Principles covering the institutional environment, statistical processes and statistical outputs…The development, production and dissemination of our statistics are based on sound methodologies, the best international standards and appropriate procedures that are well documented in a transparent manner.6

More recently, in the United States, the Report of the Commission on Evidence-Based Policymaking (2017)7—which the federal statistical agencies are using to guide them in updating their own programs and methods over the next several years—includes the following statements:

  • Government also can dramatically improve transparency about its collection and use of data, improving the American public’s ability to hold the government accountable. Adhering to the highest possible standards with respect to privacy and accountability is an important part of earning the public’s trust. (p. 8)
  • Transparency. Those engaged in generating and using data and evidence should operate transparently, providing meaningful channels for public input and comment and ensuring that evidence produced is made publicly available. (p. 17)
  • The existing infrastructure for accessing, linking, and analyzing confidential data for evidence building does not always prioritize state-of-the-art transparency and oversight. (p. 74)

___________________

5https://www.statcan.gc.ca/eng/about/policy/info-user.

6https://ec.europa.eu/eurostat/web/quality/european-statistics-code-of-practice.

7https://bipartisanpolicy.org/commission-evidence-based-policymaking/.

Growing out of the work of the Commission, the Foundations for Evidence-Based Policymaking Act of 2018 (P.L. 115-435, a subset of which is the OPEN Government Data Act) indicates how agencies should implement many of the recommendations of the Commission. The Act contains two sections relevant to the panel’s work:

(101) The bill requires agencies to submit annually to the Office of Management and Budget (OMB) and Congress a systematic plan for identifying and addressing policy questions. The plan must include, among other things … data the agency intends to collect, use, or acquire to facilitate the use of evidence in policymaking; methods and analytical approaches that may be used to develop evidence to support policymaking; … Agency strategic plans shall contain an assessment of the coverage, quality, methods, effectiveness, and independence of the agency’s statistics, evaluation, research, and analysis efforts.

(202) This bill requires public government data assets to be published as machine-readable data. The General Services Administration must maintain an online federal data catalogue to provide a single point of entry for the public to access agency data. Each agency shall develop and maintain a comprehensive data inventory.

In addition, in March 2018, “The President’s Management Agenda” laid out a new cross-agency priority goal: leveraging data as a strategic asset to develop and implement a comprehensive Federal Data Strategy. One of the principles of this strategy is to promote transparency. The Federal Data Strategy also urges the adoption of 40 practices, of which eight (practices 5, 14, 16, 19, 20, 26, 29, and 33) are relevant to this report.8

The Panel wishes readers to see that the views expressed in the reports of these two advisory groups motivate many of the recommendations that are offered here and that implementing these recommendations will result in a more modern federal statistical system, one that uses efficient processes and whose methods and data are more sharable across agencies.

Finally, an early 2021 memorandum (January 27th) from the Biden administration, entitled “Restoring Trust in Government Through Scientific Integrity and Evidence-Based Policymaking,”9 addressed, among other things, the need for transparency and reproducibility of federal statistics. This memorandum follows other recent official memos emphasizing the need for better record retention (at the National Archives and Records Administration), especially administrative records data, in order to support greater reuse of collected data both by researchers and by other statistical agencies.

___________________

8https://strategy.data.gov/action-plan/.

9https://www.whitehouse.gov/briefing-room/presidential-actions/2021/01/27/memorandum-on-restoring-trust-in-government-through-scientific-integrity-and-evidence-based-policymaking/.

There are also recent instances in which a failure to document methods or archive input datasets has resulted in a loss of important information. One example (among many) reported on during the 2017 workshop on transparency that preceded this study (NASEM, 2019a) is that the Census Bureau did not retain documentation of the entire workflow used in conducting the 2010 Census. Although the Census Bureau did retain many component processes and data, failure to retain the entire workflow made it more difficult to evaluate some of those processes, which may have affected preliminary work in designing the 2020 Census.

Although we did not undertake a formal investigation, we suspect that in the federal statistical system documentation of methods and retention of input datasets are less complete for one-time programs than for continuing programs. In addition, there are concerns that retention of methods and data is often carried out in a way that does not facilitate sharing such techniques and data among agencies, because the formats used for the descriptive metadata are often agency specific. Further, much of what is done is survey-centric, so it is much less clear what information ought to be retained, even internally, for programs making use of administrative records or digital trace data. (Digital trace data are data that are collected in concert with our use of various forms of technology, especially the Internet but also, e.g., data on smartphone use and supermarket scanner data.)

Relevant Transparency Initiatives at the Office of Management and Budget, the Census Bureau, and the American Association for Public Opinion Research (AAPOR)

OMB’s Standards and Guidelines for Statistical Surveys.10 Produced by OMB in 2006, Standards and Guidelines for Statistical Surveys provides extensive instruction for statistical agencies. Of the standards it prescribes, three relevant to this report are reproduced below:

Survey Design Standard 1.2: Agencies must develop a survey design, including defining the target population, designing the sampling plan, specifying the data collection instrument and methods, … and be able to measure estimation error…. Documentation of each of these activities and resulting decisions must be maintained in the project files for use in documentation (see Standards 7.3 and 7.4).

Evaluation Standard 3.5: Agencies must evaluate the quality of the data and make the evaluation public (through technical notes and documentation included in reports of results or through a separate report) to allow users to interpret results of analyses, and to help designers of recurring surveys.

___________________

10https://www.whitehouse.gov/wp-content/uploads/2021/04/standards_stat_surveys.pdf.

Survey Documentation Standard 7.3: Agencies must produce survey documentation that includes those materials necessary to understand how to properly analyze data from each survey, as well as the information necessary to replicate11 and evaluate each survey’s results (See also Standard 1.2). Survey documentation must be readily accessible to users unless it is necessary to restrict access to protect confidentiality.

OMB (2006) continues with extensive guidance for statistical agencies’ transparency efforts in regard to survey data collections, as summarized in Table 1-1.

Statistical agencies have also developed requirements for their internal use to support transparency. An example is the Census Bureau Statistical Quality Standards,12 which contain the requirements summarized in Table 1-2.

Nongovernmental organizations have promoted techniques for enhancing transparency as well. The AAPOR Transparency Initiative,13 introduced by the American Association for Public Opinion Research (AAPOR) in 2009 and officially launched in 2014, is an effort to help bring about a greater degree of openness in survey practice. The idea behind it is to commend organizations that pledge to practice transparency in their reporting of survey-based findings. AAPOR specifies the following as characteristics of transparency (AAPOR, 2015, pp. 2–3):

Item 2: The exact wording and presentation of questions and response options whose results are reported…;

Item 3: A definition of the population under study and its geographic location;

Item 8: A description of the sample design, giving a clear indication of the method by which the respondents were selected, recruited, intercepted or otherwise contacted or encountered, along with any eligibility requirements and/or oversampling;

Item 9: Method(s) and mode(s) used to administer the survey (e.g., CATI, CAPI, ACASI, IVR, mail survey, web survey) and the language(s) offered;

Item 10: Sample sizes (by sampling frame if more than one was used) and a discussion of the precision of the findings. For probability samples, the estimates of sampling error will be reported, and the discussion will state

___________________

11 The term “replicate” as used here is likely a synonym for the term “reproduce” as used in this report.

12 https://www.census.gov/about/policies/quality/standards.html.

13 https://www.aapor.org/Transparency_Initiative.htm.

TABLE 1-1 OMB Standards and Guidelines for Statistical Surveys: Sections 7.3 and 7.4

OMB Standard/Guideline Documentation Required of Agency
Standard 7.3: Survey Documentation Standard 7.3: Agencies must produce survey documentation that includes those materials necessary to understand how to properly analyze data from each survey, as well as the information necessary to replicate and evaluate each survey’s results (See also Standard 1.2). Survey documentation must be readily accessible to users unless it is necessary to restrict access to protect confidentiality.
Guideline 7.3.1: Survey system documentation includes all information necessary to analyze the data properly.
  1. OMB Information Collection Request package;
  2. Description of variables used to uniquely identify records in the data file;
  3. Description of the sample design, including strata and sampling unit identifiers to be used for analysis;
  4. Final instrument(s) or a facsimile thereof for surveys conducted through a computer-assisted telephone interview (CATI) or computer-assisted personal interview (CAPI) or Web instrument that includes the following: All items in the instrument (e.g., questions, check items, and help screens); Items extracted from other data files to prefill the instrument (e.g., dependent data from a prior round of interviewing); and Items that are input to the post data collection processing steps (e.g., output of an automated instrument);
  5. Definitions of all variables, including all modifications;
  6. Data file layout;
  7. Descriptions of constructed variables on the data file that are computed from responses to other variables on the file;
  8. Unweighted frequency counts;
  9. Description of sample weights, including adjustments for nonresponse and benchmarking and how to apply them;
  10. Description of how to calculate variance estimates appropriate for the survey design;
  11. Description of all editing and imputation methods applied to the data (including evaluations of the methods) and how to remove imputed values from the data;
  12. Descriptions of known data anomalies and corrective actions;
  13. Description of the magnitude of sampling error associated with the survey;
  14. Description of the sources of nonsampling error associated with the survey (e.g., coverage, measurement) and evaluations of these errors;
  15. Comparisons with independent sources, if available;
  16. Overall unit response rates (weighted and unweighted) and nonresponse bias analyses (if applicable); and
  17. Item response rates and nonresponse bias analyses (if applicable).
Guideline 7.3.2: To ensure that a survey can be replicated1 and evaluated, the agency’s internal archived portion of the survey system documentation, at a minimum, must include the following:
  1. Survey planning and design decisions, including the OMB Information Collection Request package;
  2. Field test design and results;
  3. Selected sample;
  4. Sampling frame;
  5. Justifications for the items on the survey instrument, including why the final items were selected;
  6. All instructions to respondents and/or interviewers either about how to properly respond to a survey item or how to properly present a survey item;
  7. Description of the data collection methodology;
  8. Sampling plan and justifications, including any deviations from the plan;
  9. Data processing plan specifications and justifications;
  10. Final weighting plan specifications, including calculations for how the final weights were derived, and justifications;
  11. Final imputation plan specifications and justifications;
  12. Data editing plan specifications and justifications;
  13. Evaluation reports;
  14. Descriptions of models used for indirect estimates and projections;
  15. Analysis plans;
  16. Time schedule for revised data; and
  17. Documentation made publicly available in conjunction with the release of data.
Guideline 7.3.3: For recurring surveys, produce a periodic evaluation report, such as a methodology report, that itemizes all sources of identified error. Where possible, provide estimates or bounds on the magnitudes of these errors; discuss the total error model for the survey; and assess the survey in terms of this model.
Guideline 7.3.4: Retain all survey documentation according to appropriate Federal records disposition and archival policy.
Standard 7.4: Documentation and Release of Public-Use Microdata Agencies that release microdata to the public must include documentation clearly describing how the information is constructed and provide the metadata necessary for users to access and manipulate the data (See also Standard 1.2). Public-use microdata documentation and metadata must be readily accessible to users.
Guideline 7.4.1: Provide complete documentation for all data files.
Guideline 7.4.2: Provide a file description and record layout for each file. All variables must be clearly identified and described.
Guideline 7.4.3: Make all microdata products and documentation accessible by users with generally available software.
Guideline 7.4.4: Clearly identify all imputed values on the data file.
Guideline 7.4.5: Release public-use microdata as soon as practicable to ensure timely availability for data users.
Guideline 7.4.6: Retain all microdata products and documentation according to appropriate Federal records disposition and archival policy. Archive data with the National Archives and Records Administration and other data archives, as appropriate, so that data are available for historical research in future years.

1 The term “replicate” as used here is likely a synonym for the term “reproduce” as used in this report.

SOURCE: U.S. Office of Management and Budget, 2006.

TABLE 1-2 U.S. Census Bureau’s Statistical Quality Standard F2: Providing Documentation to Support Transparency in Information Products

Statistical Quality Standard Documentation
Requirement F2-1 Documentation that would breach the confidentiality of protected information or administratively restricted information or that would violate data-use agreements with other agencies must not be released.
Requirement F2-2 Documentation must be readily accessible in sufficient detail to allow qualified users to understand and analyze the information and to reproduce (within the constraints of confidentiality requirements) and evaluate the results.
Requirement F2-2.1 Descriptions of the data program must be readily accessible.
Requirement F2-2.2 Descriptions of the concepts, variables, and classifications that underlie the data must be readily accessible.
Requirement F2-2.3 Descriptions of the methodology, including the methods used to collect and process the data and to produce estimates, must be readily accessible.
Requirement F2-2.3.1 Measures and indicators of the quality of the data must be readily accessible.
Requirement F2-2.3.2 The methodology and results of evaluations or studies of the quality of the data must be readily accessible.
Requirement F2-2.4 Documentation of public-use data files must be readily accessible in sufficient detail to allow a qualified user to understand and work with the files.

SOURCE: U.S. Census Bureau, 2015.

whether or not the reported margins of sampling error or statistical analyses have been adjusted for the design effect due to weighting, clustering, or other factors;

Item 11: A description of how the weights were calculated, including the variables used and the sources of weighting parameters, if weighted estimates are reported.
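
Items 10 and 11 both turn on how weighting affects the precision of a reported estimate. As a rough illustration only, the sketch below uses Kish's approximation for the design effect due to unequal weights and applies it to the margin of error for a proportion. The choice of this approximation, the function names, and the example weights are assumptions made for illustration; neither AAPOR nor OMB prescribes this particular calculation, and it ignores clustering and stratification.

```python
import math


def kish_design_effect(weights):
    """Approximate design effect from unequal weights (Kish):
    deff = n * sum(w^2) / (sum(w))^2. Equals 1.0 when all weights are equal."""
    n = len(weights)
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return n * sum_sq / (total * total)


def margin_of_error(p, n, deff=1.0, z=1.96):
    """Margin of error for an estimated proportion p from n respondents,
    inflated by the design effect. z = 1.96 is the usual normal-approximation
    multiplier for a 95 percent confidence level."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)


# Hypothetical sample: half the respondents carry weight 1, half weight 2.
weights = [1.0] * 500 + [2.0] * 500
deff = kish_design_effect(weights)

# Margin of error for p = 0.5, without and with the design-effect adjustment.
moe_unadjusted = margin_of_error(0.5, len(weights))
moe_adjusted = margin_of_error(0.5, len(weights), deff)

print(f"design effect: {deff:.3f}")
print(f"unadjusted margin of error: {moe_unadjusted:.4f}")
print(f"adjusted margin of error:   {moe_adjusted:.4f}")
```

A report that follows Item 10 would state which of the two margins it is quoting; the adjusted figure is always at least as large as the unadjusted one under this approximation.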

SOME CONSTRAINTS

Despite the considerable benefits of being transparent, it is important to stress that the answer is not simply, “More is always better.” More transparency may not be legal or feasible. For example, the release of some input datasets from surveys could expose confidential information about individuals or businesses, which would violate the law as well as undermine trust in the statistical agency. Further, because statistical agencies are increasingly relying on nonsurvey data, including administrative, commercial, and other digital trace data, the data may come with additional restrictions on transparency. Laws and interagency memoranda of understanding often prevent the disclosure of administrative data. Commercial data or contractor intellectual property may be protected by contract for commercial reasons. The same is true for some digital trace data. In some cases, methods for producing estimates cannot be shared because doing so could make the estimates susceptible to manipulation (e.g., release of the stores that surveyors visit to record prices to estimate the Consumer Price Index).

There are resource costs associated with achieving greater transparency, so investments in transparency should meet a cost-benefit test. In some cases, where there is limited interest or benefit from a more detailed level of technical documentation, the benefits might not justify the costs, especially for perennially resource-constrained statistical agencies. Cost-benefit tradeoffs are discussed further in Chapter 2.

More generally, not all users look at statistical estimates with the same expectations or having the same fundamental knowledge of statistics. Transparency requires documentation that is appropriate and accessible to the user: excessively technical documentation may not make statistical agency products more transparent to some users. Some users may just want to know the quality of the estimates and how the estimates can be safely used but will not want to wade through complicated software code in order to understand what underlies the methodology used. Other users may only want to have an overview of how the data were treated and the estimates produced. In contrast, academic researchers may wish to have various methodological details that can only be examined through access to (a subset of) the complete (commented) code. Statistical agencies need to conduct user experience testing and produce information at multiple levels for different users with different needs.

Transparency should not be considered just a passive interaction. Rather, it should be understood as an active effort, a way of making information available to a user community so that its members can know how a set of official statistics was produced and what the resulting quality is. In this communication, both the audience (the user community) and the matter being communicated are important, with different users needing different portions of the details provided.

REPORT STRUCTURE

As noted above, this report has a dual focus on both NCSES and the overall federal statistical community. Recommendations are offered to NCSES, along with recommendations that apply to all the major federal statistical agencies, on how to embrace the value of transparency.

Chapter 2 starts with a description of the complexity of the federal statistical programs and an explanation as to why documentation is a challenge. It details the various advantages that stem from greater transparency in the production of official statistics. The chapter then describes existing requirements, followed by our assessment of existing practices regarding documentation and archiving, including what OMB requires, records schedules and data management plans, and documentation of data treatment techniques and estimation methodologies. It concludes with a discussion of constraints and arguments for less than complete transparency in some situations.

Chapter 3 examines changes in archiving practices that can improve transparency and reproducibility, including the role of the National Archives and Records Administration and the implications of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles14 and existing OMB directives. The chapter addresses the role of catalogs and searchable metadata repositories in achieving “findability” and the importance of paradata for scientific innovation.

Chapter 4 discusses changes in documentation practices that could facilitate transparency and reproducibility of the statistical methods used in the federal statistical agencies. These include various information technology tools to assist with version control and software development, as well as collaboration tools and methods for retaining workflows. Also discussed is the necessity of improved interactions with the public.

Chapter 5 examines the contribution that metadata standards make to transparency and reproducibility. It defines metadata and discusses the risks and benefits of using metadata systems for documentation and archiving and how such systems could be integrated into the current systems in use at the agencies. The chapter provides a summary description of common metadata standards currently in use and discusses how increased use would affect transparency.

___________________

14 https://www.go-fair.org/fair-principles/.

Chapter 6 presents best practices for NCSES to enhance its transparency. It describes NCSES’s current programs and their existing publication standards, some areas for improvement, and recommendations as to how such improvements could be arrived at.

Finally, Chapter 7 provides a discussion of best practices for federal statistical agencies. This chapter lays out aspects of what a more modern approach to federal statistics would consist of, including better documentation and archiving practices that provide for data and methodology sharing among agencies and a smoother interface with the public. Recommendations as to how this new vision can be initiated are provided.


Widely available, trustworthy government statistics are essential for policy makers and program administrators at all levels of government, for private sector decision makers, for researchers, and for the media and the public. In the United States, principal statistical agencies as well as units and programs in many other agencies produce various key statistics in areas ranging from the science and engineering enterprise to education and economic welfare. Official statistics are often the result of complex data collection, processing, and estimation methods. These methods can be challenging for agencies to document and for users to understand.

At the request of the National Center for Science and Engineering Statistics (NCSES), this report studies issues of documentation and archiving of NCSES statistical data products in order to enable NCSES to enhance the transparency and reproducibility of the agency's statistics and facilitate improvement of the statistical program workflow processes of the agency and its contractors. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies also explores how NCSES could work with other federal statistical agencies to facilitate the adoption of currently available documentation and archiving standards and tools.
