EXECUTIVE SUMMARY
The Panel to Review Research and Development Statistics at the National Science Foundation was asked to look at the definition of R&D, the needs and potential uses of R&D data by a variety of users, the goals of an integrated system of surveys and other data collection activities, and the quality of the data collected in the existing surveys of the Division of Science Resources Statistics (SRS) of the National Science Foundation (NSF). This interim report presents the panel’s findings and conclusions regarding the present array of surveys on matters of statistical accuracy and reliability, as well as interim recommendations on near-term improvements that should be considered and could be implemented by NSF in developing plans and making resource decisions for the next several years.
The Division of Science Resources Statistics has made significant progress in fostering an environment that supports improvement in data quality. The panel is hopeful that these recent initiatives, buttressed by additional resources and supplemented by further initiatives such as those outlined in this report, will lay the basis for continued improvement.
COMMON METHODOLOGICAL ISSUES
Across the surveys we examined, we focus on four basic methodological issues: web-based collection, the practice of providing prior-year data to survey respondents, the designation of respondents, and nonresponse adjustment and imputation.
The very newness of web-based collection in the academic and federal government surveys presents both opportunities and challenges. Web-based collection can yield efficiencies and economies and promises to improve such functions as editing and imputation. However, in reviewing how web-based collection is now implemented, the panel raises several cautions and suggests that additional research is needed on such issues as the self-imputation that occurs when respondents are required to enter a value in every cell.
The panel considered the practice of providing respondents with the data they reported in the prior year. Although we have some confidence that the practice does not generate significant errors, the panel urges NSF to sponsor research on the effect of imprinting prior-period data on the industrial R&D survey, in conjunction with testing the introduction of web-based data collection.
The industrial R&D survey gives limited attention to interaction with survey respondents beyond the largest reporters, and even then only when questions or issues of nonresponse arise. The panel supports the initiative to identify individual respondents in companies as a first and necessary step toward developing an educational interaction with respondents, so as to improve response rates and the quality of responses. The panel also strongly recommends that NSF and the U.S. Census Bureau resume a program of field observation staff visits to a sample of reporters to examine record-keeping practices and to conduct research on how respondents fill out the forms.
The surveys take very different approaches to the treatment of nonresponse and the imputation of missing values. The panel recommends that NSF revise the Statistical Guidelines for Surveys and Publications to set standards for the treatment of unit nonresponse and to require the computation of response rates for each item, prior to sample weighting.
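The distinction between item response rates computed before and after weighting can be sketched as follows. This is a minimal illustration, not NSF's procedure: the record layout (an item name mapped to a value, or None when a unit that returned the form left the item blank) and the item names used below are hypothetical. The unweighted rate describes how respondents actually behaved; the weighted rate can mask or exaggerate item nonresponse depending on which units carry large weights.

```python
from typing import Optional

# Hypothetical record layout: item name -> reported value, or None if the
# item was left blank by a unit that otherwise returned the form.
Record = dict[str, Optional[float]]


def item_response_rates(records: list[Record], items: list[str]) -> dict[str, float]:
    """Unweighted item response rate: share of responding units reporting each item."""
    n = len(records)
    rates: dict[str, float] = {}
    for item in items:
        reported = sum(1 for r in records if r.get(item) is not None)
        rates[item] = reported / n if n else 0.0
    return rates


def weighted_item_response_rates(
    records: list[Record], weights: list[float], items: list[str]
) -> dict[str, float]:
    """The same rates after applying sampling weights, for comparison only."""
    total_weight = sum(weights)
    rates: dict[str, float] = {}
    for item in items:
        reported = sum(w for r, w in zip(records, weights) if r.get(item) is not None)
        rates[item] = reported / total_weight if total_weight else 0.0
    return rates
```

For example, with four responding units where the hypothetical item `rd_basic` is reported by the first and fourth only, the unweighted rate is 0.5; if the second unit carries a weight of 10 and the others a weight of 1, the weighted rate falls to about 0.15, even though respondent behavior is unchanged.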
THE SURVEYS
Survey of Industrial Research and Development
This large and important survey was the focus of much of the panel’s attention. We looked at design, data collection, estimation, and processing issues. To help control editing error, the panel recommends that the editing system be redesigned so that the current problems of undocumented analyst judgment and other potential sources of error can be better understood.
We also considered the vexing problem of estimation of state R&D expenditures. A new composite estimator has recently been developed by NSF and the Census Bureau. The panel commends NSF and the Census Bureau for developing this composite estimator, which takes into account research on small-area estimation. However, we recommend that additional simulations be conducted to assess the bias, variance, and mean square error of these new state estimates.
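The kind of simulation the panel recommends can be sketched as a small Monte Carlo experiment. This report does not describe the NSF and Census Bureau composite estimator, so the sketch below uses a generic composite, a fixed weight w applied to a direct state estimate and 1 − w to a synthetic one, under the hypothetical assumptions that the direct estimate is unbiased but noisy, the synthetic estimate is biased but stable, and their errors are independent normal draws. It shows how bias, variance, and mean square error trade off.

```python
import random


def simulate_composite(
    theta: float,
    sd_direct: float,
    synth_bias: float,
    sd_synth: float,
    weight: float,
    reps: int = 20_000,
    seed: int = 1,
) -> dict[str, float]:
    """Monte Carlo bias, variance, and MSE of w*direct + (1 - w)*synthetic.

    Hypothetical assumptions: the direct estimate is unbiased but noisy, the
    synthetic estimate is biased but stable, and their errors are independent
    normal draws around the true state value theta.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        direct = rng.gauss(theta, sd_direct)
        synthetic = rng.gauss(theta + synth_bias, sd_synth)
        estimates.append(weight * direct + (1 - weight) * synthetic)
    mean = sum(estimates) / reps
    bias = mean - theta
    variance = sum((e - mean) ** 2 for e in estimates) / reps
    mse = sum((e - theta) ** 2 for e in estimates) / reps  # equals variance + bias**2
    return {"bias": bias, "variance": variance, "mse": mse}
```

With illustrative values theta = 100, sd_direct = 20, synth_bias = 10, and sd_synth = 5, the direct estimate alone (weight = 1) has an expected MSE of 400, while an even blend (weight = 0.5) has an expected bias of 5, variance of 106.25, and MSE of 131.25: a biased estimator can still be more accurate overall, which is why bias and variance must be assessed together.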
Survey of Federal Funds for Research and Development
Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions
These surveys have high response rates and do not pose serious issues of statistical methodology; the panel instead highlights the long-standing issue of the timeliness of their estimates.
Survey of Research and Development Expenditures at Colleges and Universities
Survey of Scientific and Engineering Research Facilities
The academic and biomedical facility surveys rely heavily on a knowledgeable point of contact to collect information, which generally must be gathered from across these institutions. The panel urges NSF to contact a sample of respondents and check their records in order to better understand the best means of gathering the data, the sophistication of reporting sources within an institution, and the interpretation of questions and definitions. Furthermore, because little is known about response patterns, the panel recommends study of the cognitive aspects of collection instruments and reporting procedures.
The panel welcomes the recent, promising development and testing of improved imputation procedures based on a regression model, but it is concerned that the tests were not sufficient to judge the soundness of the regression approach. The research should be redone using the more standard procedure of withholding an independent set of data in order to test the model.
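The holdout procedure the panel describes can be sketched as follows: set aside a random subset of units that actually reported, fit the imputation model on the remaining units only, impute the withheld values, and compare them with what was reported. The report does not specify NSF's regression model, so the single predictor (for instance, a prior-year expenditure figure) and the simple linear form used here are hypothetical stand-ins.

```python
import random


def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def holdout_check(
    x: list[float], y: list[float], holdout_frac: float = 0.2, seed: int = 0
) -> dict[str, float]:
    """Withhold reported values, impute them from a model fit on the rest,
    and summarize the imputation errors on the withheld cases only."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    k = max(1, int(len(idx) * holdout_frac))
    test, train = idx[:k], idx[k:]
    # The model never sees the withheld cases during fitting.
    a, b = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
    errors = [(a + b * x[i]) - y[i] for i in test]  # imputed minus reported
    return {
        "slope": b,
        "mean_error": sum(errors) / k,  # systematic over- or under-imputation
        "rmse": (sum(e * e for e in errors) / k) ** 0.5,  # typical error size
    }
```

Because the errors are measured only on cases excluded from fitting, a model that merely fits its own training data well cannot look better than it is; the mean error flags bias in the imputations and the root mean square error flags their typical size.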
The panel found nonresponse and imputation in the facilities survey to be troublesome. The panel recommends that NSF carefully review item nonresponse patterns, conduct a response analysis survey to determine the baseline quality of these new and difficult items, and commit to a sustained program of research and development on these conceptual matters.
The panel determined that imputation procedures in this survey vary according to whether the institution had previously reported. If the unit had previously reported, its prior responses were used in the imputation procedure; if not, other methods were employed. This procedure is of major concern to the panel. Imputation for unit nonresponse is highly unusual in surveys; in most surveys, unit nonresponse is handled by weighting, as it was in this survey in 1999. At a minimum, NSF should compare the results of imputation and weighting.
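The comparison the panel asks for can be sketched by computing the same total both ways. The report does not describe the specific imputation methods used, so the rules below are hypothetical simplifications: weighting spreads nonrespondents' weight over respondents in a single adjustment cell, while imputation carries a prior report forward using the respondents' simple unweighted growth ratio and otherwise substitutes the respondents' unweighted mean. The sketch assumes at least one respondent that also reported in the prior cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    weight: float
    current: Optional[float]  # None = unit nonresponse this cycle
    prior: Optional[float]    # None = no report in the prior cycle


def total_by_weighting(units: list[Unit]) -> float:
    """Spread nonrespondents' weight over respondents (single adjustment cell)."""
    respondents = [u for u in units if u.current is not None]
    factor = sum(u.weight for u in units) / sum(u.weight for u in respondents)
    return sum(u.weight * factor * u.current for u in respondents)


def total_by_imputation(units: list[Unit]) -> float:
    """Impute each nonrespondent: prior value times respondents' growth ratio
    if it reported last cycle, otherwise the respondents' mean."""
    respondents = [u for u in units if u.current is not None]
    paired = [u for u in respondents if u.prior is not None]
    growth = sum(u.current for u in paired) / sum(u.prior for u in paired)
    respondent_mean = sum(u.current for u in respondents) / len(respondents)
    total = 0.0
    for u in units:
        if u.current is not None:
            value = u.current
        elif u.prior is not None:
            value = u.prior * growth
        else:
            value = respondent_mean
        total += u.weight * value
    return total
```

With four equally weighted units reporting (current, prior) values of (110, 100), (220, 200), (missing, 300), and (missing, missing), weighting yields a total of 660 while imputation yields about 825. The two approaches diverge sharply whenever nonrespondents differ systematically from respondents, here because the largest prior-year reporter failed to respond.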
INTERIM ASSESSMENT
Although the focus of this interim report has been on issues of the reliability and accuracy of the statistical methodology, the panel recognizes that its study of these matters is incomplete without reference to several other sources of error and other shortcomings in the surveys. The issues of concept and definition, of timeliness, and of survey management have much to do with the overall quality of these surveys.
Likewise, the challenges posed by a changing environment for data collection, due to the growing prominence of “novel” forms of organizational arrangements for the conduct of research and development, need to be explored, as does the impact of the increasing globalization of R&D. Sectoral shifts in the focus of R&D and the influence of firm size also pose new challenges and opportunities for data collection. These issues and more will be addressed in the final report of the panel.