Suggested Citation:"Front Matter." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.

Transparency

in Statistical Information

for the National Center for Science and
Engineering Statistics and
All Federal Statistical Agencies

Panel on Transparency and Reproducibility of Federal Statistics for
the National Center for Science and Engineering Statistics

Committee on National Statistics

Division of Behavioral and Social Sciences and Education

A Consensus Study Report of

THE NATIONAL ACADEMIES PRESS
Washington, DC
www.nap.edu

THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001

This activity was supported by a contract between the National Academies of Sciences, Engineering, and Medicine and the National Science Foundation under grant number 1822391. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.

International Standard Book Number-13: 978-0-309-27045-8
International Standard Book Number-10: 0-309-27045-6
Digital Object Identifier: https://doi.org/10.17226/26360

Additional copies of this publication are available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.

Printed in the United States of America

Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. https://doi.org/10.17226/26360.

The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.

The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.

The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.

The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.

Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.

Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.

Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.

For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.

PANEL ON TRANSPARENCY AND REPRODUCIBILITY OF FEDERAL STATISTICS FOR THE NATIONAL CENTER FOR SCIENCE AND ENGINEERING STATISTICS

DANIEL KASPRZYK (Chair), NORC at the University of Chicago

PHILIP ASHLOCK, GSA Technology Transformation Services, General Services Administration

DAVID BARRACLOUGH, Practices and Solutions Division, Organisation for Economic Co-operation and Development

CHRISTOPHER CHAPMAN, Sample Surveys Division, National Center for Education Statistics

DANIEL W. GILLMAN, Office of Survey Methods Research, U.S. Bureau of Labor Statistics

LINDA A. JACOBSEN, Population Reference Bureau, Inc.

H. V. JAGADISH, Department of Computer Science and Engineering, University of Michigan

FRAUKE KREUTER, Joint Program in Survey Methodology, University of Maryland

MARGARET LEVENSTEIN, Inter-university Consortium for Political and Social Research, University of Michigan

PETER V. MILLER, U.S. Census Bureau (retired)

AUDRIS MOCKUS, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville

SARAH M. NUSSER, Center for Survey Statistics and Methodology, Iowa State University

ERIC RANCOURT, Modern Statistical Methods and Data Science Branch, Statistics Canada

WILLIAM L. SCHERLIS,* School of Computer Science, Carnegie Mellon University

LARS VILHUBER, Department of Economics, Cornell University

*Resigned from panel on October 28, 2019

MICHAEL L. COHEN, Senior Program Officer

MICHAEL SIRI, Associate Program Officer

CONNIE F. CITRO, Senior Scholar

JILLIAN KAUFMAN, Program Coordinator (until January 15, 2020)

ANTHONY MANN, Program Coordinator

JOHN GAWALT, Consultant (until May 18, 2020)

COMMITTEE ON NATIONAL STATISTICS

ROBERT M. GROVES (Chair), Office of the Provost, Department of Mathematics and Statistics and Department of Sociology, Georgetown University

LAWRENCE D. BOBO, Department of Sociology, Harvard University

ANNE C. CASE, Woodrow Wilson School of Public and International Affairs, Princeton University

MICK P. COUPER, Survey Research Center, Institute for Social Research, University of Michigan

JANET M. CURRIE, Woodrow Wilson School of Public and International Affairs, Princeton University

DIANA FARRELL, JPMorgan Chase Institute, Washington, DC

ROBERT GOERGE, Chapin Hall at The University of Chicago

ERICA L. GROSHEN, The ILR School, Cornell University

HILARY HOYNES, Goldman School of Public Policy, University of California, Berkeley

DANIEL KIFER, Department of Computer Science and Engineering, The Pennsylvania State University

SHARON LOHR, Consultant and Freelance Writer

JEROME P. REITER, Department of Statistical Science, Duke University

JUDITH A. SELTZER, Department of Sociology, University of California, Los Angeles

C. MATTHEW SNIPP, Department of Sociology, Stanford University

ELIZABETH A. STUART, Department of Mental Health, Johns Hopkins Bloomberg School of Public Health

JEANETTE WING, Data Science Institute, Columbia University

BRIAN HARRIS-KOJETIN, Board Director

MELISSA CHIU, Deputy Board Director

CONNIE F. CITRO, Senior Scholar

Acknowledgments

A Consensus Study Panel requires many individuals to assist the panel in studying the issues identified in the panel’s statement of task. The Panel on Transparency and Reproducibility of Federal Statistics for the National Center for Science and Engineering Statistics is no different. Many experts were called upon to discuss issues, provide their expertise, and share their perspectives for the panel’s consideration. The panel thanks all these individuals for their assistance and knowledge.

The panel benefitted greatly from the presentations provided in its open sessions. The experts the panel heard from can be clustered into the following perspectives and areas of expertise (see Appendix C for the agendas for open meetings): NCSES staff: Emilda Rivers, May Aydin, Tiffany Julian, and Francisco Moris; experts in metadata standards as used internationally: Olivier Dupriez (World Bank), Pascal Heus (Metadata Technology North America), Heidi Koumarianos (Institut National de la Statistique et des Études Économiques), and Juan Munoz (National Institute of Statistics and Geography, Mexico); experts from the federal statistical system: William Bell (Census Bureau), Marcus Berzofsky (RTI International), Christopher Carrino (Census Bureau), Leighton L Christiansen (Bureau of Transportation Statistics), Brad Edwards (Westat), John Eltinge (Census Bureau), Dennis Fixler (Bureau of Economic Analysis), Nick Hart (Data Coalition), Nancy Potok (formerly Office of Management and Budget), Mark Prell (Economic Research Service), Marilyn Seastrom (National Center for Education Statistics), Tori Velkoff (Census Bureau), and Zack Whitman (Census Bureau); experts in computer science: Jeremy Iverson and Dan Smith (Colectica), and Natasha Noy (Google); experts in administrative records data: John Czajka and Mathew Stange (Mathematica Policy Research); and an expert in the federal statistical user community: Jason Jurjevich (University of Arizona). We also heard from expert users of NCSES data: Kimberlee Eberle-Sudre (Association of American Universities) and Anne-Marie Knott (Washington University in St. Louis).

In addition to these public presentations, panel and staff participated in meetings and conference calls with staff from NCSES and the Interagency Council on Statistical Policy as well as George Alter (Inter-university Consortium for Political and Social Research), Jeremy Iverson (Colectica), and Rolf Schmitt and Leighton L Christiansen (Bureau of Transportation Statistics). Further, to gain insight into what is currently carried out in major statistical programs in terms of documentation and archival policy, the panel sent an informal questionnaire to the leaders of 20 programs of the federal statistical system, receiving responses from 11. The results of this questionnaire are provided in Chapter 2.

The panel and staff also studied a number of domestic and international documents that called for greater openness and transparency concerning national statistics. These included documents from NCSES, the Committee on National Statistics, the U.S. Office of Management and Budget (OMB), the United Nations Economic Commission for Europe (UNECE), Statistics Canada, the American Association for Public Opinion Research (AAPOR), and the White House.

The panel is also indebted to John Gawalt, previous director of NCSES, who not only helped to develop the funding for this study, but also served as an unpaid consultant until May 2020. His knowledge of the federal statistical system and NCSES was invaluable as the panel interpreted its charge and organized its open sessions. In addition, John actively participated in weekly meetings or conference calls with the chair and staff, which greatly helped clarify the issues the panel needed to focus on and helped organize the structure of the report.

The panel itself could draw on its own considerable expertise advising on programs from the federal statistical system, or in areas relevant to the new directions that had been discussed at a prior workshop on transparency. By subject area, these experts included: from the federal statistical system: Philip Ashlock (General Services Administration, including data.gov), Christopher Chapman (National Center for Education Statistics), Dan Gillman (Bureau of Labor Statistics, Census Bureau), Dan Kasprzyk (Census Bureau, National Center for Education Statistics), Peter Miller (Census Bureau), and Sarah Nusser (Iowa State University); concerning metadata standards and tools: David Barraclough (Organisation for Economic Co-operation and Development [OECD]) and Dan Gillman; from international statistical agencies: David Barraclough (OECD), Frauke Kreuter (Joint Program in Survey Methodology and the University of Mannheim), and Eric Rancourt (Statistics Canada); concerning computer science tools applicable to federal statistics: H.V. Jagadish (University of Michigan), Audris Mockus (University of Tennessee), and Lars Vilhuber (Cornell University); concerning archiving: Margaret Levenstein (Inter-university Consortium for Political and Social Research) and Lars Vilhuber; and from the statistical user community: Linda Jacobsen (Population Reference Bureau).

In creating the chapters of our report, the following individuals played a key role: the first draft of the Summary was completed by Connie Citro of CNSTAT; Chapter 1 and the tables in Chapter 7 were primarily drafted by Peter Miller; Chapter 3 was primarily drafted by Lars Vilhuber, Margaret Levenstein, and Frauke Kreuter; important parts of Chapter 4 were drafted by Audris Mockus and Linda Jacobsen; Chapter 5 was drafted by Dan Gillman and David Barraclough, and sections of this chapter were drawn from material provided by Michael Lenard and Andrea Thomer, both of the University of Michigan, consultants to the panel. Under the panel’s guidance, Lenard and Thomer also completed the first draft of Appendix A, while Dan Gillman drafted Appendix B.

Finally, the panel thanks staff for the preparation of the entire report. Michael Cohen and Michael Siri provided tireless energy and enthusiasm to the panel and its work, organizing open meetings, individual phone calls, and Zoom meetings, following up on a myriad of issues and comments, and organizing and drafting the report. Following through on the comments and ideas of panel members was a significant undertaking. The panel appreciated their interest and effort. Jillian Kaufman and Anthony Mann provided excellent administrative support during the panel’s data gathering activities.

This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.

We thank the following individuals for their review of this report: Katharine G. Abraham, Joint Program in Survey Methodology, University of Maryland, College Park; Christopher Carrino, Office of the Chief Information Officer, U.S. Census Bureau; Leighton L Christiansen, Bureau of Transportation Statistics; Mick P. Couper, Institute for Social Research, University of Michigan; Robert L. Griess, Department of Mathematics, University of Michigan; Pascal Heus, Metadata Technology North America; Nicholas Horton, Statistics and Data Science, Amherst College; Juan Muñoz López, Informatics Planning and Governance, National Institute of Statistics and Geography of Mexico (INEGI); Regina L. Nuzzo, Freelance Science Writer, Washington, DC; and Nancy A. Potok, Chief Statistician of the United States (retired).

Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by Alicia L. Carriquiry, Department of Statistics, Iowa State University, and Roderick J.A. Little, Department of Biostatistics, University of Michigan. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring committee and the National Academies.

Daniel Kasprzyk (Chair)
NORC at the University of Chicago

Current Practices with Record Schedules and Data Management Plans

The Role of Catalogs and Searchable Metadata

Issues Arising with Paradata

4 Assessments of Quality, Methods for Retaining and Reusing Code, and Facilitating Interaction with Users

Introduction

Assessing the Quality of Inputs Used to Produce Official Estimates

Transparency in Processing, Software Development

Facilitating User Interaction with Statistical Agencies

5 Metadata and Standards

Using Existing Systems

Standards and Interoperability

Examples of Statistical Metadata Standards

Conclusion

6 Making the Practices of the National Center for Science and Engineering Statistics More Transparent

Description of NCSES Programs

Transparency for External Users of NCSES Survey Output

Ease of Use of Information for Analysis Purposes

Priorities for NCSES

7 Best Practices for Federal Statistical Agencies

Best Practices for Documentation, Retention, Release, and Archiving of Data

Dealing with Errata in Official Statistics

A Vision of Federal Statistics in the Future

Resource Needs to Proceed

References

Appendixes

A Statistical Metadata Standards—in Detail

B The Role of Metadata in Assessing the Transparency of Official Statistics

C Public Meeting Agendas

D Biographical Sketches of Panel Members

Boxes, Figures, and Tables

BOXES

S-1 Benefits of Transparency to Federal Statistical Agencies

1-1 Statement of Task

2-1 Programs That Responded to Informal Panel Questionnaire

3-1 Recent Classification Issue at the Bureau of Labor Statistics

3-2 NCSES and Paradata

3-3 Excerpts from 44 U.S. Code § 3511: Data inventory and Federal Data Catalogue

3-4 Examples of Guidelines for the Retention of Paradata

FIGURES

5-1 Example of a simple dataset description in XML

5-2 A simple dataset description in RDF

5-3 Conforming to standards—efficiencies gained

A-1 GSBPM: Its processes, phases, and sub-activities

A-2 BLS business process model

A-3 GSIM top-level groups

A-4 Simplified view of GSIM

A-5 Alternate but simplified view of GSIM

A-6 How GSIM and GSBPM work together

A-7 GSBPM levels implemented in GSIM

A-8 Overview of capabilities and (conceptual) building blocks of CSDA

A-9 Data life cycle as conceived in DDI Data Lifecycle

TABLES

1-1 OMB Standards and Guidelines for Statistical Surveys: Sections 7.3 and 7.4

1-2 U.S. Census Bureau’s Statistical Quality Standard F2: Providing Documentation to Support Transparency in Information Products

6-1 NCSES’ Survey Portfolio

7-1 Documenting Basic Elements of a Statistical Program

7-2 Documenting Statistical Programs Using Survey Data

7-3 Documenting Statistical Programs Using Administrative Records and/or Digital Trace Data

7-4 Documenting Data Integration Issues

7-5 Documenting Paradata from Statistical Programs

7-6 Archiving of Data

A-1 CSDA Principles: Statements, Rationales, and Implications

B-1 Elements

B-2 Elements for Describing Variables

B-3 Extended Elements for Describing Variables

Acronyms and Definitions

AAPOR	American Association for Public Opinion Research
API	application programming interface
BEA	Bureau of Economic Analysis
BLS	Bureau of Labor Statistics
BTS	Bureau of Transportation Statistics
CAPI	computer-assisted personal interview
CATI	computer-assisted telephone interview
CE	Consumer Expenditure Survey
CNSTAT	Committee on National Statistics
CSDA	Common Statistical Data Architecture
CSPA	Common Statistical Production Architecture
DCAT	Data Catalog Vocabulary [DCAT] [related: DCAT-US, DCAT-AP]
DDI	Data Documentation Initiative
DMP	Data Management Plan
DSD	Data Structure Definition
ECDS	Early Career Doctorates Survey
EIA	Energy Information Administration
FAIR	Findable, Accessible, Interoperable, and Reusable
FCSM	Federal Committee on Statistical Methodology
FSRDC	federal statistical research data center
GPS	Global Positioning System
GSBPM	Generic Statistical Business Process Model
GSIM	Generic Statistical Information Model
HLG-MOS	High Level Group for the Modernization of Official Statistics
ICPSR	Inter-University Consortium for Political and Social Research
ICSP	Interagency Council on Statistical Policy
ISO	International Organization for Standardization
JSON	JavaScript Object Notation
LEHD	Longitudinal Employer-Household Dynamics
MEPS	Medical Expenditure Panel Survey
NARA	National Archives and Records Administration
NASS	National Agricultural Statistics Service
NCES	National Center for Education Statistics
NCHS	National Center for Health Statistics
NCSES	National Center for Science and Engineering Statistics
NSCG	National Survey of College Graduates
NSF	National Science Foundation
OECD	Organisation for Economic Co-operation and Development
OMB	U.S. Office of Management and Budget
PII	personally identifiable information
PUMD	public use microdata
RDAS	Restricted Data Analysis System
RDF	Resource Description Framework
SDMX	Statistical Data and Metadata eXchange
SDR	Survey of Doctorate Recipients
SIS-CC	Statistical Information System Collaboration Community
SSDC	Survey Sponsored Data Center
UML	Unified Modeling Language
UNECE	United Nations Economic Commission for Europe
URI	Uniform Resource Identifier
W3C	World Wide Web Consortium
XML	eXtensible Markup Language

Administrative records data: Data held by agencies and offices of the government that have been collected for other than statistical purposes to carry out basic administration of a program. (US OMB 2014 Guidance for Providing and Using Administrative Data for Statistical Purposes M-14-06.)

Archive: The National Space Science Data Center of the National Aeronautics and Space Administration (NASA) defines archives as follows (emphasis added):

The term ‘Archive’ has come to be used to refer to a wide variety of storage and preservation functions and systems. Traditional Archives are understood as facilities or organizations which preserve records, originally generated by or for a government organization, institution, or corporation, for access by public or private communities. The Archive accomplishes this task by taking ownership of the records, ensuring that they are understandable to the accessing community, and managing them so as to preserve their information content and Authenticity. …The major focus for preserving this information has been to ensure that they are on media with long term stability and that access to this media is carefully controlled. (p. 2-1)¹

Data management plans: A data management plan is a knowledge management document, prepared initially as a specific research or survey project is being planned, to lay out types of data to be collected, the possible presence of sensitive data, the roles of project members in relation to the data, and the planned archiving and preservation of the data. A data management plan can be a living document that may change many times over the course of the research or survey project. (https://www.usgs.gov/products/data-and-tools/data-management/data-management-plans)

Digital trace data: This includes data collected via the Internet to represent transactions of various kinds, grocery store scanner data, data collected to record mobile phone activities, data from radio frequency identification tags, etc.

Discoverability: Discoverability is the use of standard metadata to describe one’s datasets in a structured way, which makes it more likely that search engines will be able to link these structured metadata with information describing the datasets’ location and to provide other linkages, such as to scientific publications, thereby facilitating discovery of the datasets by others.

___________________

¹Management Council of the Consultative Committee for Space Data Systems, 2012.

Machine-actionable metadata: Machine-readable metadata in a format that can be used to drive some processes. This generally means there are no free-text fields. Fields that might be open text are instead populated by codes associated with a controlled vocabulary of possible entries.

Machine-readable metadata: Metadata in a format that can be read by a computer. The implication is that each metadata field may be individually separated and read. Documents rendered in HTML or PDF are readable by a computer program, but there are no individually readable fields.
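
As a rough illustration of the two preceding definitions (not an example taken from the report), the sketch below contrasts a record whose field holds free text with one whose field holds a code from a controlled vocabulary. The field names, the vocabulary, and the helper functions `is_machine_actionable` and `route` are hypothetical, and the `title` field is left as free text for simplicity.

```python
import json

# Hypothetical controlled vocabulary for a "collection_mode" field.
# CAPI and CATI appear in the acronym list; "WEB" is an assumed extra code.
COLLECTION_MODES = {"CAPI", "CATI", "WEB"}

# Machine-readable: each field can be read separately by a program,
# but "collection_mode" is free text that a program cannot act on reliably.
readable = json.loads(
    '{"title": "Example survey", "collection_mode": "mostly phone interviews"}'
)

# Machine-actionable: the same field holds a code from the controlled vocabulary.
actionable = json.loads(
    '{"title": "Example survey", "collection_mode": "CATI"}'
)


def is_machine_actionable(record: dict) -> bool:
    """Return True if the coded field uses a value from the controlled vocabulary."""
    return record.get("collection_mode") in COLLECTION_MODES


def route(record: dict) -> str:
    """Drive a toy process from the metadata: pick a processing queue."""
    if not is_machine_actionable(record):
        return "manual-review"
    return "queue-" + record["collection_mode"].lower()


if __name__ == "__main__":
    print(route(readable))
    print(route(actionable))
```

Both records parse field by field, so both are machine-readable in the sense defined here; only the coded record can drive the routing step without human interpretation.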

Metadata: Data being used to describe some object(s). Statistical metadata are data (information) used to describe statistical objects, i.e., the metadata associated with a dataset, including the origins of the data, assessments of its quality, the variables included, their context and definitions, their values, their location in the database, what the different cases in the file refer to, and so on. Statistical metadata are best understood and most useful as structured information. Statistical metadata should be sufficient to allow someone not involved in an official statistics program to properly analyze an archived dataset resulting from that program. As Vardigan and Whiteman (2007) point out:

for a secondary analyst to understand a given dataset, he or she must have access to good documentation … A data file is ultimately just a string of numbers and not understandable on its own; it can only be interpreted and comprehended intellectually through use of the technical documentation … which indicates a variable’s location in the numeric data file, the question it was based on, all possible responses to the question, how the population of interest was sampled (for surveys) and so forth. (p. 76)

Metadata standard: A standard that addresses the kinds, meaning, and/or structure of data used as metadata. Standards are built through a consensus process that is open (any interested stakeholder may join), fair (every participating stakeholder has the same rights and privileges), observable (the process is open for inspection), and balanced (the participating stakeholders are representative of the entire set).

Metadata tool: A system developed for accessing or using metadata. Tools may be commercial, open source, or agency built. They are designed to address at least one aspect of the life cycle of metadata. Tools built to be used with a metadata standard are more widely applicable, since they can be adopted by any agency using that standard.

Paradata: “[A]dditional data that can be captured during the process of producing a statistic” (Kreuter, 2013). Such data are obtained throughout the survey process—as part of the initial interaction, the field staff’s observations, and the respondent’s actions. The data can be used to help ascertain and improve the quality of the collected data. Paradata, in the context of official statistics, are mainly used in conjunction with survey data and may consist of any information that helps to assess the ability of the respondent to respond accurately to the items in a (survey) instrument. What paradata will be collected for administrative records data or digital trace data is currently a research topic.

Record schedules: 36 CFR Subchapter B, Records Management: All Federal records, including those created or maintained for the Government by a contractor, must be covered by a NARA-approved agency disposition authority, SF 115, Request for Records Disposition Authority, or the NARA General Records Schedules. (36 CFR § 1225.10) General Records Schedules (GRS) are schedules issued by the Archivist of the United States (NARA) that authorize, after specified periods of time, the destruction of temporary records or the transfer to the National Archives of the United States of permanent records that are common to several or all agencies. (36 CFR § 1227.10) All agencies must follow the disposition instructions of the GRS, regardless of whether or not they have existing schedules.
