The National Center for Science and Engineering Statistics (NCSES) of the National Science Foundation (NSF) communicates its science and engineering information to data users in a very fluid environment. Data producers’ dissemination practices, protocols, and technologies, on one hand, and user demands and capabilities, on the other, are changing faster than the agency has been able to accommodate.
NCSES asked the Committee on National Statistics and the Computer Science and Telecommunications Board of the National Research Council to form a panel to review the NCSES communication and dissemination program that is concerned with the collection and distribution of information on science and engineering and to recommend future directions for the program according to its statement of task (see Box S-1).
The Panel on Communicating National Science Foundation Science and Engineering Information to Data Users reviewed NCSES’s existing approaches to communicating and disseminating statistical information, including the division’s information products, website, and database systems; examined existing NCSES data on websites, information gathered by and from NCSES staff, volunteered comments of users, and input solicited by the panel from key user groups; assessed the varied needs of different types of users in the NCSES user community; considered the impact that current federal and NSF website guidance and policies have on the design and management of the NCSES online (Internet) communication and dissemination program; considered current research and practice in collecting, storing, and utilizing metadata, with particular focus on specifications for social science metadata; and considered the impact of government-wide activities and initiatives (such as FedStats, Data.gov) and the emerging user capability for online retrieval of government statistics.
Box S-1: Statement of Task
An ad hoc panel will review the communication and dissemination program of the National Science Foundation (NSF) National Center for Science and Engineering Statistics (NCSES) that is concerned with the collection and distribution of information on science and engineering and recommend future directions for the program. Specifically, the panel will
- Review NCSES’s existing approaches to communicating and disseminating statistical information, including the division’s information products, website, and database systems. [This review will be conducted in the context of both current “best practices” and new and emerging techniques and approaches.]
- Examine existing NCSES data on websites, information gathered by and from NCSES staff, volunteered comments of users, and input solicited by the panel from key user groups, and assess the varied needs of different types of users within NCSES’s user community.
- Consider the impact current federal and NSF website guidance and policies have on the design and management of the NCSES’s online (Internet) communication and dissemination program.
- Consider current research and practice in collecting, storing, and utilizing metadata, with particular focus on specifications for social science metadata developed under the Data Documentation Initiative (DDI).
- Consider the impact of government-wide activities and initiatives (such as FedStats, Data.gov) and the emerging user capability for online retrieval of government statistics.
The panel will facilitate its review by conducting a 2-day public workshop that will feature invited presentations and discussions. The panel will subsequently prepare an interim letter report that will focus on issues regarding transition from current approaches and a final report with specific recommendations, including a discussion of related technical, staffing, and funding issues.
In accomplishing this review, the panel conducted two workshops to gather information from data users and experts on various aspects of data storage, retrieval, dissemination, and archiving. An interim report issued early in 2011 addressed data content and presentation, meeting changing storage and retrieval standards, understanding data users and their emerging needs, and data accessibility. The analysis and recommendations from the interim report are carried into this final report, along with the results of subsequent analysis by the panel.
These are exciting and challenging times for federal government statistical agencies responsible for disseminating their data products to their user communities. The times are especially challenging for NCSES, which is finding the importance of its data magnified manyfold by the growing recognition of the role that science and engineering investment plays as a source of economic growth. The vision of a data dissemination program for NCSES is also in flux. The agency is confronting new roles and missions, as directed in the America COMPETES Reauthorization Act of 2010, which changed more than its name. Technology is also opening the door to significant leaps in the ability of NCSES to communicate data and analytical products to data users. The promise of such services as Data.gov and the emergence of such private-sector solutions as the Google Public Data Explorer are only beginning to be recognized. The semantic web (Web 3.0) holds promise of communicating data to users in entirely new ways, much to the advantage of users and the federal agencies themselves (Berners-Lee and Hendler, 2001). These technological advances open the way to new opportunities, but they are also problematic in that they are rapidly promulgated and often become obsolete just as rapidly. The panel suggests that NCSES adopt an approach to modernization that stresses the basics of data provision (common formats with appropriate metadata) and partnerships with the private sector as opportunities become available, so that NCSES will avoid the rapid obsolescence that comes with rapid change in the particular tools and systems offered by the private sector.
In the face of these environmental and technological forces, we make a number of recommendations to the National Center for Science and Engineering Statistics to improve its dissemination program. The first set of recommendations has to do with how the survey-based data are received and input into the NCSES database, managed once there, and preserved for posterity. (The recommendations are numbered as they appear in the body of this report.)
Recommendation 3-1. The National Center for Science and Engineering Statistics should incorporate provisions in contracts with data providers for the receipt of versioned microdata, at the level of detail originally collected, in open machine-actionable formats.
Recommendation 3-2. The National Center for Science and Engineering Statistics should transition to a dissemination framework that emphasizes database management rather than data presentation and strive to use auditable machine-actionable means, such as version control, to ensure integrity of the data and make the provenance of the data used in publications verifiable and transparent.
Recommendation 3-3. The National Center for Science and Engineering Statistics (NCSES) should require that data received from contractors be accompanied by machine-actionable metadata so as to allow for automated production of NCSES publications, comparability with previous analysis, and efficient access for third-party visualization, integration, and analysis tools.
Recommendation 3-4. The National Center for Science and Engineering Statistics should proceed to make its data available through open interfaces and in open formats compatible with efficient access for third-party visualization, integration, and analysis tools.
Recommendation 3-5. The National Center for Science and Engineering Statistics should develop a plan for redesign of its retrieval tools utilizing the emerging, sustainable capabilities of other government and private-sector resources.
Recommendation 3-6. The National Center for Science and Engineering Statistics (NCSES) should work with the National Archives and Records Administration (NARA) to ensure long-term access and preservation of all of its publications and all data necessary to replicate these publications. As a necessary step, NCSES should review and update the request for disposition authority that is filed with NARA to ensure prompt and complete disposition of records and should regularly review the status of compliance with the records retention directive.
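As an illustration only, the ideas behind Recommendations 3-1 through 3-3 (versioned data in an open format, accompanied by machine-actionable metadata whose integrity can be audited) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any NCSES system: the function names, the variable names, the metadata fields, and the sample records are all hypothetical, and a production system would more plausibly use an established metadata standard such as DDI rather than this ad hoc JSON-style record.

```python
import csv
import hashlib
import io
import json


def package_release(rows, variables, version):
    """Serialize records to CSV (an open format) and build a metadata record.

    `variables` is a list of dicts describing each column; the field names
    used here ("name", "label", "unit") are illustrative assumptions.
    """
    fieldnames = [v["name"] for v in variables]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    data = buf.getvalue()

    metadata = {
        "version": version,  # identifies this release of the data
        "sha256": hashlib.sha256(data.encode("utf-8")).hexdigest(),
        "row_count": len(rows),
        "variables": variables,  # machine-readable column descriptions
    }
    return data, metadata


def verify_release(data, metadata):
    """Return True if the data still match the checksum in the metadata."""
    digest = hashlib.sha256(data.encode("utf-8")).hexdigest()
    return digest == metadata["sha256"]


# Hypothetical sample records; the values are made up for illustration.
variables = [
    {"name": "respondent_id", "label": "Respondent identifier", "unit": None},
    {"name": "field", "label": "Field of science or engineering", "unit": None},
    {"name": "rd_spend_usd", "label": "R&D expenditure", "unit": "USD"},
]
rows = [
    {"respondent_id": 1, "field": "engineering", "rd_spend_usd": 1200},
    {"respondent_id": 2, "field": "chemistry", "rd_spend_usd": 800},
]

data, metadata = package_release(rows, variables, version="2011.1")
metadata_json = json.dumps(metadata, indent=2)  # metadata travel with the data
```

A recipient of such a release could call `verify_release(data, metadata)` before using the file. Any later change to the data, intentional or not, changes the checksum, which is one simple way to make the provenance of figures used in publications checkable by machine.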
Engaging with its data users is an essential activity for NCSES. There is much that can be done to make that engagement more productive.
Recommendation 4-1. The National Center for Science and Engineering Statistics (NCSES) should analyze the results of its initial online consumer survey and refine it over time. Using input from other sources, such as regular structured user focus groups and panel-based periodic user surveys, NCSES should regularly and systematically collect and analyze patterns of data use by web users in order to develop a typology of data users and to identify usability issues.
Recommendation 4-2. The National Center for Science and Engineering Statistics should educate users about the data and learn about the needs of users in a structured way by reinstating the program of user workshops and instituting user webinars.
Recommendation 4-3. The National Center for Science and Engineering Statistics should employ user-focused design and user analysis, starting with an initial heuristic evaluation and continuing as a regular and systematic part of its website and tool development.
Recommendation 4-4. The National Science Foundation should sponsor research and development on accessible data visualization tools and approaches, and on other potential means for browsing and exploring tabular data, that can be offered via web browsers or via mobile and tablet-based applications.
The implementation of this report’s recommendations should be undertaken within an overall framework that accords priority to the basic quality of the data and the fundamentals of dissemination, then to significant enhancements that are achievable in the short term, while laying the groundwork for other long-term improvements. The framework could be organized along the following lines (highest priority first):
- Focus on collecting the right data (by contractor or otherwise); using appropriate change management and version control to establish data provenance, flag data errors and correct them; annotating those data with sufficient machine-actionable metadata to establish a process for interpreting the data, enabling efficient access to third-party data and to automated NCSES publications; and publishing the data in formats with web-accessible open interfaces for all to use.
- Publish methods for combining old data and new data that have been collected under different assumptions or categories or that are disseminated in ways that make them difficult to reintegrate—this is especially necessary for the data from the old and new industry research and development expenditure surveys that will populate the Industrial Research and Development Information System.
- Provide the essential data reductions and visualizations that NSF’s mission requires. For example, when Congress asks for authoritative data on a certain topic, a trusted group must be able to use the data and derived publications to calculate answers.
- Provide a growing array of visualizations and printed products tailored for the many different uses and users.
Not every recommendation made in this report can or should be implemented immediately. Some recommendations must build on the implementation of others; for example, development of a database structure that can support accessibility through the semantic web requires that NCSES obtain data from its contractors in different formats than are now received and that it define metadata to accompany the data elements. We therefore suggest a time-phased approach to improving data dissemination, focusing on five major initiatives:
- Change the means and content of the data received from contractors and actively participate in the development and implementation of the Data.gov-compatible metadata standard now being explored by W3C and the SCOPE project.
- Gain a better understanding of the needs of users of the data—those primary, secondary, and tertiary blocks of users—and then use the information to engage them in an effort to educate them and otherwise meet their needs.
- Conduct a continuous usability evaluation program, akin to the continuous improvement effort that is part and parcel of any total quality management program.
- Provide data in retrievable formats and encourage private-sector providers and individual users to import the data into their visualization tools.
- Ensure full short- and long-term access to the data by updating its retrieval tools and ensuring proper archiving of its publications and database.