To take full advantage of the emerging data sharing and visualization tools described in Chapter 2, it is important that the incoming data be collected and ingested into the NCSES data processing system in as disaggregated a form as possible. The data should be accompanied by sufficient information about the data items (metadata) to support future analyses and comparability with previous analyses, and there should be an appropriate versioning/change management system so that the origin and history of the data (provenance) can be traced. This is a challenge for NCSES because, for the most part, the agency's data are collected, updated, and accessed by its contractors. Because contractors control the collection, tabulation, and front-end activities, NCSES must specify requirements for data inputs so that the data can be retrieved in open data formats and in formats supported by the common tools that software developers use to process data.

The data also need to be in formats that can take advantage of the web development capabilities embedded in Data.gov and other emerging means of dissemination. It must be possible to combine the data with other data sources (mashup). These capabilities require that the data be accessible through an open application programming interface (API) that exposes the disaggregated data, along with its metadata, in machine-understandable form. Meeting these requirements enriches the results of analyses and enhances the value of the data to users.
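As an illustration only, the following minimal sketch shows what a machine-understandable response carrying both disaggregated records and their metadata might look like, and how a developer could combine it with a second source. The payload layout, the field names (`institution_id`, `rd_expenditure`, `state`), and all values are hypothetical; they are not an NCSES or Data.gov format.

```python
import json

# Hypothetical API response: disaggregated records plus their metadata.
# All field names and values are illustrative, not actual NCSES data.
api_response = json.loads("""
{
  "metadata": {
    "dataset": "example_survey",
    "unit_of_observation": "institution",
    "variables": {
      "institution_id": {"type": "string"},
      "rd_expenditure": {"type": "number", "unit": "thousand USD"}
    }
  },
  "records": [
    {"institution_id": "A1", "rd_expenditure": 1200},
    {"institution_id": "B2", "rd_expenditure": 450}
  ]
}
""")

# A second, independent source keyed on the same identifier (also hypothetical).
other_source = {"A1": {"state": "VA"}, "B2": {"state": "MD"}}


def mashup(response, lookup, key):
    """Join each record to a second source on a shared key."""
    merged = []
    for record in response["records"]:
        extra = lookup.get(record[key], {})  # no match: keep the record as-is
        merged.append({**record, **extra})
    return merged


combined = mashup(api_response, other_source, "institution_id")

# The unit comes from the metadata, not from a hard-coded assumption.
unit = api_response["metadata"]["variables"]["rd_expenditure"]["unit"]
for row in combined:
    print(row["institution_id"], row.get("state"), row["rd_expenditure"], unit)
```

Because the records arrive at the unit level with their metadata attached, the join and the unit label need no manual re-keying; a pre-aggregated table would not allow this kind of combination.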

It is critically important that the data be accompanied by the machine-actionable documentation (metadata) needed to establish the data’s history of origin and ownership (provenance) and include a record of any modifications made during data editing and clean-up. The documentation also needs to include the measurement properties of the data with sufficient detail and accuracy to enable publication-ready tables to be automatically generated in a statistically consistent manner.
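A minimal sketch of this idea follows, assuming a hypothetical metadata schema in which each variable records its source, its edit history, and the rule by which its values may be aggregated. The class `VariableMetadata`, the functions `record_edit` and `tabulate`, and the sample rows are invented for illustration; a production system would more likely adopt an established metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class VariableMetadata:
    """Machine-actionable description of one data item (hypothetical schema)."""
    name: str
    label: str
    unit: str
    aggregation: str  # measurement property: "sum" or "mean"
    source: str       # provenance: where the data came from
    edits: list = field(default_factory=list)  # record of modifications


def record_edit(meta, description):
    """Append a modification note so the edit history travels with the data."""
    meta.edits.append(description)


def tabulate(rows, group_key, meta):
    """Aggregate rows by group using the rule stated in the metadata."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[meta.name])
    if meta.aggregation == "sum":
        return {g: sum(v) for g, v in groups.items()}
    if meta.aggregation == "mean":
        return {g: sum(v) / len(v) for g, v in groups.items()}
    raise ValueError(f"unsupported aggregation: {meta.aggregation}")


meta = VariableMetadata(
    name="rd_expenditure",
    label="R&D expenditure",
    unit="thousand USD",
    aggregation="sum",
    source="hypothetical contractor file",
)
record_edit(meta, "imputed missing value for one respondent")

# Illustrative microdata rows, not real survey responses.
rows = [
    {"sector": "academic", "rd_expenditure": 100},
    {"sector": "academic", "rd_expenditure": 250},
    {"sector": "industry", "rd_expenditure": 400},
]
table = tabulate(rows, "sector", meta)
print(f"{meta.label} ({meta.unit}) by sector: {table}")
```

The tabulation routine never decides for itself whether a variable should be summed or averaged; it reads that property from the metadata, which is what allows tables to be generated automatically and consistently.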

Furthermore, it is critically important that version control (a formal automated capability for tracking and controlling changes to a project’s files, in particular source code, documentation, and web pages) and formal change management procedures be applied to data collected by contractors. Doing so establishes reliable data provenance and makes it possible for all previous publications to be automatically verified and replicated.
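The verification step can be sketched as follows. This is not a version control system; it is a simplified stand-in that uses content hashing to show the underlying check: a published table is tied to a fingerprint of the exact data version that produced it, so that the table can later be regenerated and compared. The function names and sample data are hypothetical.

```python
import hashlib
import json


def fingerprint(obj):
    """Return a SHA-256 digest of a canonical JSON encoding of obj."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def publish(data, tabulate):
    """Record which data version produced a table, and the table's digest."""
    return {
        "data_version": fingerprint(data),
        "table_digest": fingerprint(tabulate(data)),
    }


def verify(data, tabulate, release):
    """Check that the archived data still reproduce the published table."""
    if fingerprint(data) != release["data_version"]:
        return False  # the data changed after publication
    return fingerprint(tabulate(data)) == release["table_digest"]


def total_by_sector(rows):
    """Illustrative tabulation: sum of values per sector."""
    totals = {}
    for row in rows:
        totals[row["sector"]] = totals.get(row["sector"], 0) + row["value"]
    return totals


# Illustrative archived data, not real survey results.
archived = [
    {"sector": "academic", "value": 350},
    {"sector": "industry", "value": 400},
]
release = publish(archived, total_by_sector)

# Unchanged data pass the check; an untracked edit to one value fails it.
print(verify(archived, total_by_sector, release))
altered = [dict(archived[0], value=351), archived[1]]
print(verify(altered, total_by_sector, release))
```

In practice a version control system would also retain the earlier data version and the history of changes, which a bare digest cannot do; the sketch shows only why a pinned version makes replication checkable by machine.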

In the panel’s judgment, NCSES is not well positioned to meet the above preconditions for taking advantage of emerging technologies. The survey data that are entered into the center’s database are received from the survey contractors in tabular format, mainly through machine-readable tabulations, rather than in a more easily accessible microdata format.

This situation is not unique to the S&E data that are received from contractors by NCSES. Suzanne Acar (representing the U.S. Federal Bureau


