These are exciting and challenging times for federal government statistical agencies responsible for disseminating their data products to their user communities. The times are especially challenging for the National Center for Science and Engineering Statistics (NCSES), which is finding the importance of its data magnified manyfold by the growing recognition of the role that science and engineering (S&E) investment is playing as a source of economic and social growth and prosperity. But these are also uncertain times for federal government agencies like NCSES that are concerned over the future of their programs in light of fixed or declining budgets associated with the need to restrain government spending. There is a simultaneous growth in pressure to carefully evaluate all government activities to ensure their efficiency and cost-effectiveness. A key component of efficiency and effectiveness is a well-managed and responsive data dissemination program.
The environment for the data dissemination program for NCSES is also in flux. The agency is confronting new roles and missions as directed in the America COMPETES Act, which changed the agency’s name and added significant new responsibilities. For example, the newly specified role of serving as a central federal clearinghouse for the collection, interpretation, analysis, and dissemination of objective data on science, engineering, technology, research and development, and innovation suggests a need for the agency to become more strategic in its outlook. NCSES will be venturing into new territory and will need to support a broader range of data users, particularly in areas of competitiveness and innovation, even as it seeks to modernize the dissemination services it now provides. The key to accomplishing these ends in an era of expected budget shortfalls and in view of the limited staff resources in the agency, including some of the technological skills that will be required to modernize the data processing and dissemination systems, is to take advantage of consortia opportunities and to proceed within a framework that accords priority to the most essential tasks.
STRENGTH IN NUMBERS
The task of developing and implementing a dissemination improvement plan is a tall order for NCSES to take on by itself. The agency is already stressed, with its constrained staff and budget resources, to meet the growing demand for its data and implement the several new areas of responsibility that have recently been added to its roles and missions.
One of several possible approaches to meeting the needs of data users, as well as to encouraging and expanding the development of tools and applications by developers and dissemination channels that would facilitate the dissemination of its information, is to take the necessary steps in concert with other agencies in the federal statistical community. The federal statistical agencies, as a group, have begun to organize to enhance dissemination of their data in the project called the Statistical Community of Practice and Engagement (SCOPE). SCOPE is an important beginning. There are efficiencies for both the agencies and users from more cross-agency collaboration, harmonization of definitions and terminology, identification of best practices, and sharing of the development of common tools that support best practices. As a participant in this community of practice, NCSES could maximize use of the capacity of Data.gov as a primary public interface and dissemination platform/portal, retrieve data sets from the Data.gov data set hosting platform that is currently being developed, and harness Data.gov cloud computing power.
NCSES should also consider taking advantage of commonly developed, user-friendly data delivery and data display tools that have largely been developed by the World Wide Web Consortium (W3C) community. These tools address 508-compliant alternatives to tabular displays, displays of complex sample survey data that protect confidential microdata, and visualization of multifaceted statistical designs. NCSES can also benefit from projects that promote data harmonization and integration through the development of metadata and data exchange. Specifically, SCOPE will take the fundamental steps of developing and implementing Stats Metadata 1.0 (for delivery in fiscal 2012) and establishing common definitions to facilitate data exchange and interoperability (by fiscal 2013). The goal is to promote development and use of common platforms for data collection and data analysis, to suggest research on solutions to the “data mosaic” problem in the current technology environment, and to support the creation of an open-source development community.
TIME-PHASED DISSEMINATION IMPROVEMENT PLAN
The panel understands that not every recommendation made in this report can or should be implemented immediately. Some recommendations must build on the implementation of others; for example, development of an open database structure that can support accessibility and dissemination through the use of open standards and formats requires that NCSES obtain from its contractors the data sufficient to make the results reproducible, in a format enabling automatic reproduction of all published tables, along with metadata sufficient to interpret the data elements and results.
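To make the idea of automatic reproduction concrete, the following is a minimal sketch in Python of recomputing a published table from delivered microdata and flagging any cells that do not match. The field names (`sector`, `rd_spending`), the sample records, and the helper functions `tabulate` and `check_table` are all hypothetical and are not drawn from any NCSES system; a real check would also need to account for survey weights, rounding rules, and disclosure suppression.

```python
from collections import defaultdict


def tabulate(records, row_key, value_key):
    """Sum value_key over all records, grouped by row_key."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[row_key]] += rec[value_key]
    return dict(totals)


def check_table(published, records, row_key, value_key, tolerance=1e-9):
    """Return rows whose recomputed total differs from the published total.

    Each mismatch maps a row label to (published_value, recomputed_value);
    a value of None means the row is missing on that side.
    """
    recomputed = tabulate(records, row_key, value_key)
    mismatches = {}
    for row in sorted(set(published) | set(recomputed)):
        pub = published.get(row)
        rec = recomputed.get(row)
        if pub is None or rec is None or abs(pub - rec) > tolerance:
            mismatches[row] = (pub, rec)
    return mismatches


# Hypothetical microdata as it might be delivered by a contractor.
microdata = [
    {"sector": "manufacturing", "rd_spending": 120.0},
    {"sector": "manufacturing", "rd_spending": 80.0},
    {"sector": "services", "rd_spending": 50.0},
]

# Hypothetical published table; the "services" cell is deliberately wrong.
published_table = {"manufacturing": 200.0, "services": 55.0}

mismatches = check_table(published_table, microdata, "sector", "rd_spending")
print(mismatches)  # {'services': (55.0, 50.0)}
```

A check of this kind is only possible when the delivered data are detailed enough to rebuild every published cell, which is why the contractual requirement comes first.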
The implementation of the report’s recommendations should be undertaken within an overall framework that accords priority to the basic quality of the data and the fundamentals of dissemination, then to significant enhancements that are achievable in the short term, while laying the groundwork for other long-term improvements. The framework could be organized along the following lines (highest priority first):
- Focus on collecting the right data (by contractor or otherwise); using appropriate change management and version control to establish data provenance, flag data errors and correct them; annotating those data with sufficient machine-actionable metadata to establish a process for interpreting the data, enabling efficient access to third-party data and to automated NCSES publications; and publishing the data in formats with web-accessible open interfaces for all to use.
- Publish methods for combining old data and new data that have been collected under different assumptions or categories or that are disseminated in ways that make them difficult to reintegrate—this is especially necessary for the data from the old and new industry research and development expenditure surveys that will populate the Industrial Research and Development Information System (IRIS).
- Provide the essential data reductions and visualizations that the mission of the National Science Foundation (NSF) requires. For example, when Congress asks for authoritative data on a certain topic, a trusted group must be able to use the data and derived publications to calculate answers.
- Provide a growing array of visualizations and printed products tailored for the many different uses and users.
Within this overall framework, three parallel tracks are suggested with concrete steps to improve data dissemination. The first track involves improving the transparency and reproducibility of published and disseminated results by obtaining complete, reliably versioned, well-documented, and machine-understandable data from contractors. This will require the modification of current contractual arrangements and procurements as referenced in the panel’s recommendations. The second track involves improving use of the NCSES products by establishing a formal, systematic, and continuous program for evaluating user needs and the usability of NCSES products via the web and other means of delivery. The third track involves ensuring full short- and long-term access to NCSES content by providing open data, offering machine-accessible protocols for access to data and other products, and establishing a continuous process for replicating or archiving releases by the National Archives and Records Administration for long-term preservation and access.
IMPROVING THE TRANSPARENCY AND REPRODUCIBILITY OF PUBLISHED RESULTS
As noted in earlier chapters, it is not currently possible to automatically and systematically reproduce or validate all tables and results in NCSES published products from the raw data. There are many contributing causes: not all data are made available to NCSES at the level of detail at which they were collected, data are not accompanied by machine-readable metadata, and there is a lack of a systematic version control/change-management process for the data prior to final delivery by contractors.
The root cause of this problem, as we have identified, is insufficient accountability from contractors. Contractors are not delivering the data and metadata in the detail most needed, and they are not supplying sufficient metadata, provenance information, or change management. Strengthening accountability from contractors is a first step to any improvement in transparency.
This should be followed by more systematic development of metadata standards, change management and versioning, and provenance tracking. These need not be perfect at the outset; any open, transparent, machine-understandable, automatic method could be used and then improved over time.
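As one illustration of how simple such a method can be, the sketch below records a version label, a source note, and a SHA-256 checksum for each data delivery, and links each version to the one it supersedes. It is an assumption-laden example rather than a recommended standard: the manifest fields, the `make_manifest` and `verify` helpers, and the sample deliveries are hypothetical, and only the Python standard library is used.

```python
import hashlib
import json


def make_manifest(data_bytes, version, source, previous=None):
    """Build a provenance entry for one delivery of a data file.

    `previous` is the manifest entry of the version being superseded, if any,
    so that successive entries form a verifiable chain of changes.
    """
    return {
        "version": version,
        "source": source,
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "previous_sha256": previous["sha256"] if previous else None,
    }


def verify(data_bytes, entry):
    """Return True if the bytes match the checksum recorded in the entry."""
    return hashlib.sha256(data_bytes).hexdigest() == entry["sha256"]


# Hypothetical deliveries: version 1.1 corrects the value for record 2.
v1_data = b"id,value\n1,10\n2,20\n"
v2_data = b"id,value\n1,10\n2,25\n"

v1 = make_manifest(v1_data, "1.0", "hypothetical contractor delivery")
v2 = make_manifest(v2_data, "1.1", "correction of record 2", previous=v1)

# The manifest is plain JSON, so it can be published alongside the data.
print(json.dumps(v2, indent=2))
```

Because the manifest is machine-readable, any third party holding a copy of the data can confirm which version it has and whether it has been altered since delivery.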
As part of improving metadata standards, NCSES should actively participate in the development and implementation of the Data.gov compatible metadata standard now being explored by W3C and the SCOPE project. Implementation of this standard, as discussed in this report, will require revamping the specifications for data delivery now in the contracts of the agency’s data collectors.
ESTABLISHING A FORMAL, SYSTEMATIC, AND CONTINUOUS USE AND USABILITY EVALUATION PROGRAM
We have pointed to the need for a continuous use and usability evaluation program, much akin to pointing to the need for a program of continuous improvement that is part and parcel of any total quality management program. We focus on use and usability because, like other federal statistical agencies, as NCSES continues to shed its hard-copy publication programs in favor of providing its data through web applications, usability will become a more important issue, and new uses and users have begun to be identified.
A first step is to develop a clearer understanding of requirements. In the first instance, the requirements for an NCSES dissemination program are essentially determined by the environment facing the agency, its legislative mandate, and guidance and directives from above. These are assessed in Chapter 1. The more difficult, but nonetheless important part of establishing a requirement is to understand the needs of its customers—the data users. As discussed in Chapter 4, NCSES today has only a rudimentary understanding of the range of its users and their data needs. Thus, the first step in the plan must be to gain a better understanding of the users of the data—those primary, secondary, and tertiary blocks of users—and then to engage them in an effort to understand their needs. Some steps have already been taken to enhance engagement of user groups. The measures of website use and the new online survey of web users are important and necessary first steps, but they are by no means sufficient to provide the kind of detailed knowledge NCSES needs. Agency leadership would be well advised to monitor the maturing space of web metrics and analytics. These, along with customer service programs, would enable continuous input, evaluation, and understanding of all users and their products.
The lessons learned from these outreach activities should then be widely shared. One possible activity would be to compile and post a listing of user sites that have distilled the NCSES basic data, aggregated them, or combined them with other data. Although these derived forms cannot carry the NSF imprimatur of accuracy, they can be very helpful.
A suggested next step is to review the initiatives taken by Statistics Canada to evaluate the usability of its delivery methods. Tied in with usability, we urge attention to issues of accessibility for all users, with the understanding that 508 compliance is a necessary but insufficient first step.
We make several suggestions in Chapter 4 and Appendix B for enhancing the visitor’s experience with the NCSES website. Some of these suggestions can be implemented by NCSES; others will require coordination with the NSF organizations that establish the basic look and feel of the website.
ENSURING FULL SHORT- AND LONG-TERM ACCESS
As discussed in Chapter 3, the Internet changes the meaning of access. Ensuring full access in today’s environment requires that, as much as possible, machine-understandable microdata and metadata be made accessible via standard open protocols to any third party for use without restriction.
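The sketch below illustrates, under stated assumptions, what "machine-understandable" can mean in practice: a data file in an open format (CSV) accompanied by a metadata file (JSON) that names each column, its type, and its unit, so that a program can interpret the data without human guidance. The metadata layout, the column names, the sample values, and the `load` helper are hypothetical and do not represent the Stats Metadata standard or any existing NCSES format.

```python
import csv
import io
import json

# Hypothetical metadata published alongside the data file.
METADATA_JSON = """
{
  "title": "Hypothetical R&D expenditures by sector",
  "columns": [
    {"name": "sector", "type": "string", "description": "Industry sector"},
    {"name": "year", "type": "integer", "description": "Survey year"},
    {"name": "rd_spending", "type": "number",
     "description": "R&D spending", "unit": "million USD"}
  ]
}
"""

# Hypothetical data file in an open, plain-text format.
DATA_CSV = "sector,year,rd_spending\nmanufacturing,2010,200.5\nservices,2010,50\n"

# Map the declared metadata types to Python conversions.
CASTS = {"string": str, "integer": int, "number": float}


def load(data_csv, metadata_json):
    """Parse the CSV text, converting each column as the metadata declares."""
    metadata = json.loads(metadata_json)
    names = [col["name"] for col in metadata["columns"]]
    casts = {col["name"]: CASTS[col["type"]] for col in metadata["columns"]}
    reader = csv.DictReader(io.StringIO(data_csv))
    if reader.fieldnames != names:
        raise ValueError("CSV header does not match the metadata columns")
    return [{name: casts[name](row[name]) for name in names} for row in reader]


rows = load(DATA_CSV, METADATA_JSON)
print(rows[0])
```

With the types and units declared in the metadata, a third-party tool can load, validate, and chart the data without relying on documentation written only for human readers.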
The power of visualization tools to retrieve and explain the data leads to the suggestion that a major emphasis throughout the implementation period should be on providing data that can be easily accessed by visualization tools. We do not suggest that NCSES develop visualizations beyond the kind of rudimentary ones that it already provides in the Science and Engineering Indicators Digest. Rather, the agency should provide data in machine-accessible formats and explore partner relationships in the private sector to identify opportunities to leverage developing or existing tools/applications, along with maintaining open data formats and standards to allow individual users to import the data into their visualization sets. By adopting an approach that stresses the basics of data provision (common formats with appropriate metadata) and partnerships with the private sector as opportunities become available, NCSES will avoid the issue of rapid obsolescence associated with rapid change in the particular tools and systems offered by the private sector.
Ensuring long-term access requires that both the NCSES publications and all of the data necessary to fully replicate them be archived. NSF should work with the National Archives and Records Administration, as the archive of record, to ensure that copies of all products and data, including those created by contractors, are efficiently delivered for long-term stewardship.
RAPID ITERATIVE IMPROVEMENTS
The recommendations in this report will take several years to implement. However, the groundwork can be laid, and many improvements made, in a relatively short amount of time, even in the first year. We suggest that at least the following be accomplished in the first year:
- Establish an ongoing archiving process.
- Revise contracts with data providers to ensure accountability for delivery of full microdata in machine-understandable format with change control.
- Perform a heuristic evaluation of the website.
- Initiate a process of continuous usage/user data needs collection.
- Disseminate the existing microdata that are available using standard, open, machine-accessible protocols.
We expect that improvement will be iterative, and will primarily stem from development of further technologies, methods, and standards and from the collection of systematic information on user behavior and needs. In light of this, other recommended tasks can be deferred, awaiting further developments in technology or methods, for example:
- Redesigning the NCSES website can await heuristic evaluation.
- Developing a detailed metadata standard can await a candidate metadata standard from the SCOPE and World Wide Web Consortium initiatives.
- Creating a capacity for user-influenced visualizations can await further developments in accessible visualization technology.
The future well-being of the U.S. economy depends on the nation’s capacity to generate, and take economic advantage of, technology-driven innovations across all industries, particularly those that compete internationally. This capacity in turn depends on choices that market actors, including the federal government, firms large and small, educational and research institutions, state and regional technology-based development agencies, workers, and students, make with regard to research and development, development of the science, technology, engineering, and mathematical workforce, and the commercialization of innovation. The data generated by NCSES will guide these choices. The data dissemination strategy of the agency, then, will have a substantial influence on the nation’s future economic path.
Technology is opening the door to significant advances in the ability to communicate data and analytical products to data users. The promise of such services as Data.gov and the potential for third-party services, such as the Google Public Data Explorer, and federated catalogs, such as the Dataverse Network, to add value to the data and make them accessible to new groups of users and for new uses are just becoming recognized. The emerging Semantic Web (Web 3.0), expanded and new tools and approaches, open standards and platforms, the potential for mashups, and community-based platforms (including participative input, transparency by means of wikis and open government movements) show a more distant promise of communicating data to users in entirely new ways, much to the advantage of users and the federal agencies themselves.
To avail itself of the opportunities afforded in these new approaches, NCSES needs to adopt a vision of the future that supports access to data directly through the agency and through the many third-party services and catalogs that are emerging. NCSES also needs to have a plan that will lead to making its data available through open interfaces and open formats, accompanied by open metadata, and to develop the necessary infrastructure to exploit these advances. These evolving technologies could open opportunities for addressing the visualization experience and overcoming accessibility limitations more effectively than the current browser-based experiences.