The National Center for Science and Engineering Statistics (NCSES) of the National Science Foundation (NSF) communicates its science and engineering (S&E) information to data users in a very fluid environment in which data dissemination practices, protocols, and technologies, on one hand, and user demands and capabilities, on the other, are changing faster than the agency has been able to accommodate. In this chapter, we discuss how strong forces are driving changing expectations on the part of users of S&E resource and workforce data, as well as how technology and a changing policy and analytical environment in the federal government are forcing NSF to rethink and modernize the manner in which NCSES communicates information to the public.
To help understand how NCSES can respond to the driving forces that we document, we also discuss the environment that it, and federal statistical agencies in general, face. For NCSES, this environment is shaped by policies established by the Office of Management and Budget (OMB) and NSF, as well as by its own policies and procedures, which have evolved over the years.
S&E INVESTMENT AND ECONOMIC GROWTH
Much of the pressure that NCSES faces to modernize the way it disseminates information stems from the subject matter itself. It has become increasingly understood that investment in research and development (R&D) creates a platform for innovation and that innovation is a major determinant of national economic competitiveness and growth. It has likewise been increasingly apparent that an associated major determinant is human capital, represented in the output of programs of education and training for the S&E workforce.
The relationship of innovation and science, technology, engineering, and mathematics (STEM) education has been recognized in several major reports, and these reports have formed the basis for major program initiatives. The recent report, Rising Above the Gathering Storm: Energizing and Employing America for a Brighter Future, concluded that a primary driver of the future economy and concomitant creation of jobs will be innovation, largely derived from advances in science and engineering (National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, 2007). Underscoring the case for R&D investment is the conclusion by the National Science Board that “while only four percent of the nation’s work force is composed of scientists and engineers, this group disproportionately creates jobs for the other 96 percent” (National Science Board, 2010a, Figure 3-3).
The 2010 follow-up report to the Gathering Storm report further concluded that “substantial evidence continues to indicate that over the long term the great majority of newly created jobs are the indirect or direct result of advancements in science and technology, thus making these and related disciplines assume what might be described as disproportionate importance” (National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, 2010, p. 18).
The conclusions of these reports are based on analysis that relies heavily on the data that are produced by NCSES. Indeed, the need for good data on science and engineering was recognized as a principle for competitiveness in another recent report, which concluded that “benchmarking national competitiveness across a set of established and forward looking metrics—measuring both inputs such as education, R&D spending, patents and outputs such as job creation, new industries and products, gross domestic product growth and quality of life—is necessary to drive the successful development and implementation of appropriate competitiveness policies” (Global Confederation of Competitiveness Councils, 2010, p. 3).
The three pillars on which the White House Strategy for American Innovation is built—education, research, and private-sector innovation1—are topics on which NCSES now collects data. The White House strategy focuses on educating the next generation with 21st century skills, creating a world-class workforce, and strengthening and broadening American leadership in fundamental research. In order to measure progress in educating the next generation, data are needed on progress in STEM education and its outcomes. Assessing American leadership in fundamental research
1See http://www.whitehouse.gov/innovation/strategy/executive-summary [November 2011].
requires data on investments in fundamental science by the public and private sectors, as well as information on the nature and benefits of federally funded investments in research. The White House strategy requires measuring private-sector innovation expenditures (via the Business Research and Development Information Survey).
Recent legislation also underscored the importance of NCSES data. The America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science (America COMPETES) Reauthorization Act of 2010 requires “a comprehensive study of the economic competitiveness and innovative capacity of the United States.” This law, among other initiatives, changed the name and mission of NCSES (see below). It strongly emphasized the need for improvements in the current competitive and innovation performance of the U.S. economy relative to other countries that compete economically with it; coming to grips with regional issues that influence the economic competitiveness and innovation capacity of the United States; and evaluating the effectiveness of the federal government in supporting and promoting economic competitiveness and innovation. All of these initiatives require access to the kind of information that NCSES produces in its data collections.
A BROADER MISSION FOR NCSES
The new emphasis on innovation and competitiveness has been reflected in the new mission statement for NCSES. Not only was the Science Resources Statistics Division (SRS) renamed the National Center for Science and Engineering Statistics by Section 505 of the America COMPETES Act, but also new roles and missions were assigned. Several words in the new mission statement signal this new direction: serve as a “central Federal clearinghouse” for the collection, interpretation, analysis, and “dissemination” of objective data on science, engineering, technology, and research and development. NCSES expects to use the findings and recommendations in this report in determining how best to implement its new dissemination mandate.
According to the America COMPETES Act, the dissemination function is to cover “data related to the science and engineering enterprise in the United States and other nations that is relevant and useful to practitioners, researchers, policymakers, and the public, including statistical data on—(A) research and development trends; (B) the science and engineering workforce; (C) United States competitiveness in science, engineering, technology, and research and development; and (D) the condition and progress of United States STEM education.” Data collections related to U.S. competitiveness and STEM education are part of these new responsibilities.
We note that these new roles and responsibilities came without additional resources in terms of budget or staff.
The next two sections present examples of the role that NCSES data play in supporting initiatives to develop federal R&D indicators.
SCIENCE OF SCIENCE POLICY
The need for science and engineering metrics has been embedded in the NSF Science of Science and Innovation Policy (SciSIP) program, as originally articulated by John H. Marburger III, the former director of the Office of Science and Technology Policy (OSTP) and presidential science adviser. According to the agency’s description, “the SciSIP program underwrites fundamental research that creates new explanatory models, analytic tools and datasets designed to inform the nation’s public and private sectors about the processes through which investments in S&E research are transformed into social and economic outcomes. Or, put another way, SciSIP aims to foster the development of relevant knowledge, theories, data, tools, and human capital. SciSIP’s goals are to understand the contexts, structures and processes of S&E research, to evaluate reliably the tangible and intangible returns from investments in R&D, and to predict the likely returns from future R&D investments within tolerable margins of error and with attention to the full spectrum of potential consequences” (National Science Foundation, 2008).
The STAR METRICS (Science and Technology for America’s Reinvestment: Measuring the EffecT of Research on Innovation, Competitiveness and Science) program is led by an interagency consortium consisting of the National Institutes of Health (NIH), NSF, and OSTP (Lane and Bertuzzi, 2010). The goal of the program is to create a data infrastructure that will permit the analysis of the impact of science investments using administrative records as well as other electronic sources of data. The program will have two phases. The first phase will use university administrative records to calculate the employment impact of federal science spending through the American Recovery and Reinvestment Act and agencies’ existing budgets. The second phase will measure the impact of science investment in four key areas:
- Economic growth will be measured through such indicators as patents and business start-ups.
- Workforce outcomes will be measured by student mobility into the workforce and employment markers.
- Scientific knowledge will be measured through publications and citations.
- Social outcomes will be measured by the long-term health and environmental impact of funding.
The data derived from the NCSES surveys are essential inputs to such science, innovation, and competitiveness metrics. The emphasis on metrics has been adopted and codified as a key element in the NSF Strategic Plan for 2011 to 2016 (National Science Foundation, 2011, p. 9).
RESEARCH AND DEVELOPMENT DASHBOARD
As this report was being prepared, OSTP further underscored the importance of innovation to the economy by announcing the launch of an online tool that permits tracking of U.S. progress in innovation. The R&D Dashboard is a website that demonstrates the impacts of federal investments in R&D (Koizumi, 2011).
The initial R&D Dashboard website presents data on federal R&D awards to research institutions and links those inputs to outputs—specifically publications, patent applications, and patents produced by researchers funded by those investments—from two agencies, NIH and NSF, over the decade from 2000 to 2009. These two agencies play a significant role in funding basic research in the United States; more than 80 percent of the federal government’s support of university-based research, for example, comes from them. The site gathers information from two federal sites, USASpending.gov and IT.USASpending.gov, and has information on R&D investments at the state, congressional district, and research institution levels. Information that feeds the Dashboard from these two sites, however, is not being updated because of funding cuts.2
The OSTP R&D Dashboard is designed to answer questions of the following kind: Which institutions by state are performing federally funded research? What fields of science are emphasized locally? Where are the hot spots for robotics, for example, or optical lasers, or advanced textiles resulting from federally funded research? How are federal research grants contributing to the scientific literature by field of science?
The Dashboard is looked on as a first step. OSTP plans to explore fundamental changes in how data on R&D are made available to the public. As in other areas included in the push for greater transparency, the emphasis will be on testing models for making R&D-related data from contributing agencies available in ways that are secure, interoperable, and usable by a wide array of potential users. The initial emphasis will be to coordinate further development with coordinating bodies supported by OSTP, including the National Nanotechnology Initiative and the National Coordination Office (NCO) for Networking and Information Technology Research and Development (NITRD).
INTERNET TRANSFORMS THE DISSEMINATION ENVIRONMENT
In the realm of information dissemination, the Internet has been changing everything for some time. The ongoing radical transformation in the modes of data dissemination has profound implications for NCSES.
More than 15 years ago, OMB’s Federal Committee on Statistical Methodology (FCSM) recognized the growing presence of electronic options for data dissemination in a report entitled Electronic Dissemination of Statistical Data (Office of Management and Budget, 1995). The authors of this report were quite prescient in noting that the rapid expansion of computer technology had “led to vast changes in the supply of and demand for Federal statistical data. Technology is no longer the primary barrier between users and information.” The authors forecast even further changes with the advent of a national information infrastructure that would have even greater impact. The report concluded that statistical agencies would need “to adopt new methods of disseminating statistical information and data to replace the traditional means that used to serve as the principal source of statistical information” (p. 1).
The day foretold by the FCSM committee has long since arrived. The current choices are no longer between paper publications and electronic dissemination, but between various modes of and options for electronic dissemination. Like many other statistical agencies, NCSES has, except for a few special publications, largely abandoned hard-copy publication of its data. Now there are a multitude of choices among electronic means of retrieving reports and data elements—the most prominent of these choices for the federal statistical agencies today are FedStats and Data.gov, which are discussed in Chapter 2.
From a handful of interconnected government and university research computers, the Internet has grown to near ubiquity, and today’s users search the web for more information than was available in the past.3 Moreover, with the increased availability of broadband and high-speed Internet access, dynamic, multimedia-laden websites are replacing formerly static web pages, with the consequence that users have the expectation of being able to interact with the information for which they are searching.
Moreover, a recent survey by the Pew Internet and American Life Project
3See http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Search_Aug08.pdf [November 2011].
showed that access to the Internet is quickly becoming “untethered”4 and users are turning to smartphones and other mobile devices for access to the World Wide Web, social networking, and email. As a consequence, those who disseminate information will need to react to these changes by continuing to leverage the newest means of accessing and interacting with information on the web.
The U.S. government has made a number of mobile applications available on the USA.gov website. Several agencies have developed a mobile edition of their website—an abridged version available to users of smartphones, tablets, personal digital assistants (PDAs), and other mobile handheld devices. Taking into account the information that it is beginning to collect in the online survey of data users (described in Chapter 4), NCSES could profitably consider mobile versions of its web presence, perhaps beginning with the development of a mobile application for its announcements of product releases and the InfoBrief series.
FEDERAL GOVERNMENT DATA DISSEMINATION POLICIES
As a federal statistical agency, NCSES operates within a set of OMB guidelines that cover a wide variety of statistical practices, from survey design to data collection to dissemination. The federal government’s policies regarding dissemination of information to the public are promulgated by OMB under the authority of the Paperwork Reduction Act (PRA) of 1980, Public Law 96-511, as amended by the Paperwork Reduction Act of 1995, Public Law 104-13 (44 USC 35). The PRA mandate is broad, calling on agencies to “perform their information activities in an efficient, effective, and economical manner” (Office of Management and Budget, 2000).
Under this authority, published in OMB Circular A-130, NCSES is required to (a) disseminate information in a manner that achieves the best balance between the goals of maximizing the usefulness of the information and minimizing the cost to the government and the public; (b) distribute information dissemination products on equitable and timely terms; (c) take advantage of all dissemination channels, federal and nonfederal, including state and local governments, libraries, and private-sector entities, in discharging agency information dissemination responsibilities; and (d) help the public locate government information maintained by or for the agency.
NCSES is also called on to maintain and implement a management system for all information dissemination products, to ensure that members of the public with disabilities, whom the agency has a responsibility to inform, have a reasonable ability to access the information, and to provide information dissemination products to the U.S. Government Printing Office for distribution to depository libraries. Electronic information dissemination is encouraged.
These broad guidelines of Circular A-130 are further detailed in the OMB standards and guidelines for statistical surveys (Office of Management and Budget, 2006). The standards suggest that, when information products are disseminated, NSF should provide users with access to the following information:
- definitions of key variables;
- source information, such as a survey form number and description of methodology used to produce the information or links to the methodology;
- quality-related documentation, such as conceptual limitations and nonsampling error;
- variance estimation documentation;
- time period covered by the information and units of measure;
- data taken from alternative sources;
- point of contact to whom further questions can be directed;
- software or links to software needed to read/access the information and installation/operating instructions, if applicable;
- date the product was last updated; and
- standard dissemination policies and procedures.
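The documentation elements above amount to a checklist that can accompany each data release in machine-readable form. The sketch below illustrates one way such a record might be encoded and checked; the field names, survey reference, and validation function are hypothetical illustrations, not an official NCSES or OMB schema.

```python
# Illustrative sketch only: these field names are hypothetical and do not
# correspond to any official NCSES or OMB metadata schema.
REQUIRED_FIELDS = {
    "key_variable_definitions",
    "source_survey",            # e.g., survey form number
    "methodology_link",
    "quality_notes",            # conceptual limitations, nonsampling error
    "variance_estimation",
    "time_period",
    "units_of_measure",
    "alternative_sources",      # data taken from other sources
    "point_of_contact",
    "software_requirements",    # software or links needed to read the data
    "last_updated",
    "dissemination_policy",
}

def missing_documentation(metadata: dict) -> list:
    """Return the required documentation elements absent from a release's metadata."""
    return sorted(REQUIRED_FIELDS - metadata.keys())

# A partially documented, hypothetical release record:
release = {
    "key_variable_definitions": {"rd_expenditure": "total R&D spending, current dollars"},
    "source_survey": "Business Research and Development Information Survey",
    "time_period": "2009",
    "units_of_measure": "millions of U.S. dollars",
    "point_of_contact": "contact@example.gov",  # placeholder address
    "last_updated": "2011-11-01",
}

gaps = missing_documentation(release)  # elements still to be documented
```

A check of this kind could run as part of a release clearance procedure, flagging any product whose documentation is incomplete before dissemination.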
NATIONAL SCIENCE FOUNDATION GUIDELINES
As an operating organization in NSF, NCSES must adhere to the NSF guidelines regarding the quality of data disseminated to the public. These guidelines were developed to comply with OMB-issued government-wide guidelines under Section 515 of the Treasury and General Government Appropriations Act for Fiscal Year 2001 (P.L. 106-554), which were designed to ensure and maximize the quality, objectivity, utility, and integrity of information disseminated by federal agencies.
Under NSF guidelines, utility is achieved by staying informed of both internal and external information needs and by developing new data or information products when appropriate. This is a multifaceted process, involving keeping abreast of information needs by conducting internal analyses of information requirements, convening and attending conferences, working with advisory committees and committees of visitors, and sponsoring outreach activities. The NSF guidelines require review of ongoing publication series and other information products on a regular basis to ensure that they remain relevant and address current information needs.
Integrity guidelines cover the security of information from unauthorized access or revision, ensuring that disseminated information is not compromised through corruption or falsification.
NSF also includes objectivity in its guidelines. This is a focus on ensuring that information that is disseminated is accurate, reliable, and unbiased and that information products are presented in an accurate, clear, complete, and unbiased manner. Objectivity is achieved by presenting the information in the proper context, identifying the sources of the information (to the extent possible, consistent with confidentiality protections), using reliable data and sound analytical techniques, and preparing information products that are carefully reviewed. These guidelines call for the inclusion of metadata (information about the data), in that all original and supporting data sources used in producing statistical data products should be clearly identified and documented, either in the publication or on each individual table. The metadata will generally include specification of variables used, definitions of variables when appropriate, coverage or population issues, sampling errors, disclosure avoidance rules or techniques, confidentiality constraints, and data collection techniques.
DATA RELEASE POLICY
A decade and a half ago, the predecessor agency to NCSES issued a policy on data release that was based on a consumer survey of data relevance and quality that led to a review by an internal Customer Service Task Force (National Center for Science and Engineering Statistics, 1994). This was the first of two consumer surveys; a second, conducted in 1996, was summarized in Measuring the Science and Engineering Enterprise: Priorities for the Science Resources Studies Division (National Research Council, 2000, p. 42). The consumer studies have not been repeated since.
The 1994 data release policy statement declared that its objectives were to encourage the timely release of SRS (NCSES) survey data, ensure that the released data meet SRS standards for “releasability,” and ensure that NSF management knows when the data are to be released.
According to this policy statement, the main vehicle for release of timely data was the Data Brief, which is designed to publicize the data and provide a targeted group of users with some understanding of their implications. The goal was to produce timely and accurate data, with accuracy defined as freedom from flaws such as gross typographical errors or methodological mistakes and as plausibility of the results. Procedures for internal clearance were also outlined.
Finally, the panel suggests that NCSES consider, in conducting its dissemination program, the dissemination guidelines outlined in Principles and Practices for a Federal Statistical Agency (National Research Council, 2009). In regard to dissemination, this volume states that a statistical agency should strive for the widest possible dissemination of the data it compiles. Data dissemination should be timely and public. Furthermore, measures should be taken to ensure that data are preserved and accessible for use in future years. Elements of an effective dissemination program include the following:
- An established publications policy that describes, for a data collection program, the types of reports and other data releases to be made available, the audience to be served, and the frequency of release.
- A variety of avenues for data dissemination, chosen to reach as broad a public as reasonably possible. Channels of dissemination include, but are not limited to, an agency’s Internet website, government depository libraries, conference exhibits and programs, newsletters and journals, email address lists, and the media for regular communication of major findings.
- Release of data in a variety of formats, including printed reports, easily accessible website displays and databases, public-use microdata5 and other publicly available computer-readable files, so that the information can be accessed by users with varying skills and needs for data retrieval and analysis. All data releases should be suitably processed to protect confidentiality, with careful and complete documentation.
- For research and other statistical purposes, access to relevant information that is not publicly available through restricted access modes that protect confidentiality. Such modes include protected research data centers, remote monitored online access for special tabulations and analyses, and licensing of individual researchers to allow them to use confidential data on their desktop computers under stringent arrangements to ensure that no one else can access the information.
- Procedures for release of information that preclude actual or perceived political interference. In particular, the content and timing of data releases should be determined by
5In the National Research Council report, and throughout this report, the term “microdata” is defined in the statistical sense, that is, microdata are data on the characteristics of units of a population, such as individuals, households, or establishments, collected by a census, survey, or experiment (U.S. Bureau of the Census, 1998).
the statistical agency, and the agency or unit that produces the data should publish in advance and meet release schedules for important indicators to prevent even the appearance of manipulation of release dates for political purposes.
- Policies for the preservation of data that guide what data to retain and how they are to be archived for future secondary analysis.