
APPENDIX A
A Quality Profile of the Federal Research and Development Surveys

Barbara A. Bailar

This quality profile discusses five of the federal research and development (R&D) surveys. Although there are other R&D surveys, many of them are not active. The five discussed are:

  1. Survey of Federal Funds for Research and Development

  2. Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions

  3. Survey of Research and Development Expenditures at Universities and Colleges

  4. Survey of Science and Engineering Research Facilities

  5. Survey of Industrial Research and Development

GENERAL BACKGROUND

The first two surveys are of federal agencies and use a frame constructed from information in the president’s budget as submitted to Congress and as included in budget documents of the Office of Management and Budget (OMB). The frame for the Survey of Research and Development Expenditures at Universities and Colleges is put together by comparing lists of institutions from different surveys with the results of the previous year’s survey, supplemented by many telephone calls. The Survey of Science and Engineering Research Facilities has two frames: the R&D expenditures survey and a list of biomedical institutions maintained by the National Institutes of Health (NIH).

These first four surveys have several links and similar procedures. They all use ORC Macro as a contractor and a similar data collection and editing methodology. The industrial survey is quite different and much more complex: it uses as its frame the Census Bureau’s business list, formerly known as the Standard Statistical Establishment List (SSEL), and is conducted by the Census Bureau. The latter three surveys all trace the amount of funds spent on R&D, some of which come from federal agencies, while the first two trace the outlay of federal funds to industry and academia. There should be agreement between the amounts reported by the federal agencies and the amounts reported by those who received the funds. The gap in these amounts is a cause for concern.

Similarities in Data Collection and Processing

Because ORC Macro conducts four of the five surveys, a common approach to data collection and processing has been taken. ORC Macro has developed a web-based system that provides:

  • A data collection system

  • A data monitoring system including receipt control

  • A data editing system


Each of the four surveys uses this system, either exclusively or for the majority of its cases. In recent years the surveys have been delivered to respondents electronically, rather than in paper format through the mail. The respondent uses the electronic survey form, and ORC Macro follows up by e-mail or telephone. It keeps logs, generated by the system, of the progress of the agencies or institutions in reporting. The web-based system contains embedded edits that question apparently erroneous entries. Respondents are asked to check the data, repair arithmetical errors, and explain trend differences. In some of the surveys, data cannot be submitted to the National Science Foundation (NSF) until all error messages have been dealt with.
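The embedded edits themselves are not published; purely as an illustration of the kind of rule described here, a minimal sketch might look like the following (all field names, values, and messages are hypothetical):

```python
# Illustrative sketch of an embedded edit check, not ORC Macro's actual code.
# All field names and thresholds here are hypothetical.

def check_submission(record):
    """Return a list of error messages; an empty list allows submission."""
    errors = []
    # Hard edit: components must sum to the reported total.
    components = record["basic"] + record["applied"] + record["development"]
    if components != record["total_rd"]:
        errors.append(
            f"Total R&D ({record['total_rd']}) does not equal the sum "
            f"of basic, applied, and development ({components})."
        )
    # Hard edit: required items must be filled before submission.
    for item in ("total_rd", "rd_plant"):
        if record.get(item) is None:
            errors.append(f"Required item '{item}' is missing.")
    return errors

record = {"basic": 40, "applied": 35, "development": 30, "total_rd": 100, "rd_plant": 5}
for message in check_submission(record):
    print(message)
```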

Not all respondents submit data on the web-generated form, so that some of the surveys have to maintain a manual editing system. The web-based system is also used to tabulate data.

There are many advantages to a web-based system. It should reduce the amount of error caused by carelessness or not having data from prior years to check against. It allows NSF to know which questions are giving respondents difficulty, since one can keep track of edit failures. It speeds up the processing. Also, there is complete knowledge of the status of every survey participant throughout the delivery, edit, and imputation stages.

There are a few disadvantages to the system. In some cases, it forces respondents to provide data that they are unsure about. In this sense, it forces imputation by the respondent, rather than by NSF. When a statement is made in NSF publications that there is no item nonresponse, it conveys an impression of accuracy that may very well be misleading. Another disadvantage seems to be an inability to deal with weighted data, although that could probably be overcome. However, these four surveys each have fewer than 1,000 respondents and use no sampling, so no nonresponse adjustment by weighting or other weighting is done.

The industry survey is primarily a mail survey, although the forms have been put on the web. It would benefit this survey to have a well-researched form designed for the web, with the appropriate embedded edits. It would facilitate the calculation of response rates and give indications of which items were causing difficulty to respondents. It would also speed up the processing. Since this survey is many times larger than the other four, a more streamlined procedure would save considerable amounts of time and money.

Some Concerns About the Surveys

Before each of the surveys is reviewed separately in this appendix, I raise some concerns that are common when these five surveys are looked at as a package.

One concern is that of providing prior-year data to respondents. On the industry survey, prior-year data are printed on the form. For the other surveys, prior-year data are available to respondents. Although many survey researchers believe that having prior-year data improves the quality of response by prompting respondents, others believe that it gives an easy way out. Research is needed on this issue.

Cutoff amounts are often used to define the survey universe. For example, academic institutions with less than $150,000 in R&D expenditures in the previous year may be excluded from the universe. However, the cutoffs are not the same from one survey to another, which may make comparisons difficult.

There are differences in what types of universities or colleges are to be included and differences in fields of science to be included. Some surveys include the service academies, but not all. Some include the social sciences, but not all.

The role of the data provider is handled very differently in the five surveys. In each survey, NSF is dependent on a person to provide the data. In many cases, no single person can do so alone; others must be involved. In the Survey of Science and Engineering Research Facilities, a person designated as an institutional coordinator is recognized to be crucial to the success of the survey. It is the institutional coordinator who oversees and coordinates the entire effort, decides what other people need to provide and what sources of data to use, and then reviews the data. At the other extreme, the Survey of Industrial Research and Development is mailed to a company, with no contact person designated and no overt recognition of the possible difficulty of responding. Since the role of the data provider is so important, research is needed to identify the best respondent and ways to recognize the role of the respondent.

The surveys have very different approaches to nonresponse and imputation. Some try to eliminate nonresponse completely; others do some imputation. The Survey of Industrial Research and Development does not do a good job of reporting nonresponse, especially item nonresponse, but it handles imputation as a straightforward statistical procedure of the kind that routinely occurs in surveys. A thorough study of nonresponse reporting, both item and unit, and of imputation would improve all the surveys and enhance comparability.

Finally, the methodological reports from the different surveys vary in their completeness. Some give plenty of detail; in others, it is difficult to find out who the respondents are and what the frame is. An outline of what is to be included and methods for ensuring adherence would be very helpful to persons wanting to learn about the surveys and to compare them.

SURVEY OF FEDERAL FUNDS FOR RESEARCH AND DEVELOPMENT

Introduction

Objectives

The Survey of Federal Funds for Research and Development is the primary source of information about federal funding for R&D in the United States. Government, academia, and the science community use the results of the study. The survey is also used for budget purposes.

Specifications

The survey focuses on federal support of national scientific activities in terms of budget obligations and outlays. For each year, survey data are to be provided for three fiscal years: the fiscal year just passed, the current fiscal year, and the president’s budget year. Actual data are collected for the year just completed, while estimates are obtained for the current year and the budget year.


Agencies are asked to submit the same survey data as the budget figures submitted to OMB in January. The reporting unit for the survey is the subagency or agency to which the survey materials are sent.

The reporting forms are provided to agencies by a web-based data collection system developed by ORC Macro. The agencies are asked to send back the completed form by April 15. All processing and editing are done electronically.

The survey is sponsored and funded by NSF and is carried out under contract by ORC Macro.

Survey Design and Implementation

Scope of Survey and Frame

The scope of the survey has changed over the years, reflecting the number of subagencies that now fund R&D, the kinds of places in which R&D is conducted, and the kinds of questions agencies have the resources to answer. Federal obligations for research to universities and colleges by agency and detailed science and engineering field were added to the survey in 1973. Federally Funded Research and Development Centers (FFRDCs) are also included. The Central Intelligence Agency and other security-related agencies are not included.

Not all agencies are asked to respond to all questions. Only the 10 largest R&D funding agencies are asked about the geographical distribution of obligations for R&D and R&D plant. R&D plant refers to the equipment and facilities where the R&D takes place. These 10 agencies account for about 97 percent of total R&D and R&D plant obligations each year. Only six agencies are asked to report on the distribution of R&D to universities by field of science and engineering and character of research.

NSF held several workshops to learn from respondents about any difficulties in reporting. As a result, NSF considered removing certain items from the survey instrument. A flier was distributed notifying data users that NSF was considering eliminating several items from future publications. Data users were asked for comments. NSF removed 54 tables that showed data on two items: data for the special foreign currency program and detailed field of science and engineering data for the estimated out-years. NSF also eliminated two tables showing data on foreign performers by region, country, and agency, but these were later reinstated.

The frame for the survey is the list of federal agencies that fund R&D, obtained from information in the president’s budget submitted to Congress. Agencies or subagencies that have reported R&D data in OMB budget documents are included.

For Volume 50, in which actual data were collected for FY 2000 and estimates were collected for FY 2001 and FY 2002, the frame consisted of 29 federal agencies and 73 subagencies. All of these are included, so there is no sampling.

Potential Sources of Error in the Scope and Frame

Since there is no sampling, there is no sampling error.

Coverage would be deficient if the frame did not include all the relevant federal agencies and subagencies that fund R&D activities. However, agencies are identified by using such sources as the president’s annual report, OMB budget documents, and respondent agencies. Coverage does not seem likely to be a problem.


Data Collection

Basic Data Collection Procedure

The first step in the collection process is to identify the agencies, subagencies, and respondents within the agencies in order to prepare and deliver the mail-out packets for the new survey cycle. ORC Macro keeps a master list of all respondents by agency and subagency.

Changes to respondent information are obtained throughout the survey period, either from NSF or from the respondents. The inclusion of agencies in the survey may change in response to changes in their R&D status. As changes are received, the master list of respondents is updated. NSF suggests names of additional agencies or subagencies that have reported R&D activities in OMB budget documents or in the media. The U.S. Government Manual, available from the National Archives and Records Administration’s Web site at http://www.access.gpo.gov/nara/browse-gm-oo.html, is used to verify agency names and to obtain phone numbers, if necessary.

At least two weeks before the mail-out packets are to be sent, ORC Macro staff call each of the respondent agencies to verify names, addresses, and phone numbers. They also collect the name and address of the respondent’s supervisor.

Since the survey is done entirely on the web, there is no actual mail-out of materials. ORC Macro sends a letter to respondents giving the URL and the agency’s respondent ID and password. It e-mails all materials to survey respondents. Follow-up procedures begin immediately after the e-mailing of the survey materials. ORC Macro calls or e-mails the respondents to ensure that they have received their packets. These contacts continue until ORC Macro is certain that all packets have been received by the respondents or are in the hands of the agency coordinator.

The Web-based data collection system is used for all data collection, data imports, data editing, and trend checks. The system consists of a data collection component, which allows survey respondents to enter their data online, and a monitoring component, which allows ORC Macro to monitor support requests, data entry, and data issues. The two components are password-protected, so that only authorized respondents and ORC Macro staff can access them.

Agency respondents are given their respondent user IDs and passwords in their mail-out packets. Respondents are able to access the system to begin data entry around the first week of March.

The Respondents

The respondents to this survey are agency personnel assigned to complete the survey. Since the survey deals with budget obligations and outlays, the respondents tend to be primarily budget analysts.

When an agency also has subagencies in the survey, the agency respondent is to review the data for all subagencies to ensure consistency.

The Survey Instrument

The survey instrument takes full advantage of being electronic. All definitions and help screens are readily available to the respondent.


The key variables reported on are:

  • Outlays for R&D and R&D plant

  • Obligations for R&D and R&D plant

  • Obligations for basic, applied, and total research, by field of science and engineering

  • Obligations for basic research, applied research, and development, by performer

  • Obligations to individual FFRDCs, FY 2000 only

  • Obligations for R&D to foreign performers, FY 2000 only

  • Obligations for R&D plant, by performer

  • Obligations for R&D by state, FY 2000 only

  • Obligations for R&D plant, by state, FY 2000 only

  • Obligations for R&D to universities and colleges, by field of science and engineering

Not all agencies are asked to complete all items. The first seven items are asked of all agencies. The next two items are asked of 10 agencies, and the final item of 6 agencies. The Department of Defense (DoD) reports state obligations for research and for development separately.

The question to be answered appears at the top of the screen. Directions immediately follow the question, followed by definitions and then a table in which to report the data. Often there is a note that explains that certain amounts should total other amounts reported elsewhere. An attachment lists examples of disciplines included in the broad NSF fields of science and engineering. Unlike some other R&D surveys, the social sciences are included.

Definitions of obligations and outlays are the same as those in the U.S. budget. Obligations represent the amount for orders placed, contracts awarded, services received, and similar transactions during a given period, regardless of when the funds were appropriated and when future payment of money is required. Outlays represent the amounts of checks issued and cash payments made during a given period, regardless of when the funds were appropriated or when the obligations were incurred.

In reporting obligations and outlays, each agency includes the amounts transferred to other agencies for R&D support. The receiving agencies do not report funds transferred to them.

Data are requested about the location, by state or outlying area, where the research takes place. If this information is not available, respondents are asked to assign the obligations to the state, outlying area, or office abroad where the headquarters of the U.S. primary contractor, grantee, or intramural agency is located. This could lead to some distortion of state estimates.

Nonresponse

The response rate for the survey has always been 100 percent. There is no known item nonresponse, since agency respondents must answer the questions before the data can be submitted.


Several steps are taken to achieve on-time response. ORC Macro keeps a folder for each agency with any information about the response or respondents. Specific steps that are taken are:

  • Calling or e-mailing respondents immediately after mail-out to ensure that respondents have received their packets, reminding them of the due date, and encouraging them to call if there are any questions.

  • Calling respondents again as the due date approaches.

  • Offering respondents any data obtained during the previous cycle, if that would be of assistance.

  • Conducting issues workshops to answer questions about definitions and reporting practices.

  • Providing the respondents with the name, address, and phone number of the agency respondent for the Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions, because many agencies cannot report until those data are available.

  • Talking to or sending a letter under NSF signature to the respondent’s supervisor to obtain assistance in making the reporting of the data a higher priority.

  • Asking NSF to intervene when a respondent continues to show reluctance to cooperate.

Potential Sources of Error in the Collection Procedure

Only limited measurement problems are discussed in the survey reports. Some agencies find it difficult to report certain items. Some agencies, such as DoD, do not include headquarters planning and administering of R&D programs in the full cost of R&D; these are not large costs, however. R&D plant data are also underreported because of difficulties encountered by some agencies, particularly DoD and the National Aeronautics and Space Administration (NASA), in identifying and reporting these data.

Large deviations of reported obligations from year to year are questioned in the online editing, and agencies must supply explanations. Thus, if an agency has been misreporting consistently, it is easier to continue misreporting than to correct the error, since a correction would itself trigger the edit. However, there have been no studies of reporting, so no data are available.

Data Processing

Editing

There is no need for keying or weighting of data, so the only processing step is the editing and the acceptance of the data by ORC Macro.

The progress of each agency in completing the survey is available online and can easily be reviewed by ORC Macro or NSF staff.

One key edit is a comparison with data reported in an earlier cycle. The initial report lists agency code, agency name, and number of differences for agencies that have reported a variance of $100 million in any data obligation category between the previous and current data collection cycles. The variance is then broken down by the table that is the source of the data variance, the specific data cell, a table code, and the current and previous cycle obligation amounts. The agency must provide a narrative explanation, and this explanation is used to verify the difference. NSF reviews these cases.
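As a rough sketch of this comparison (the actual report format and data structures are not public, so everything below is hypothetical, and treating the $100 million threshold as "or more" is an assumption):

```python
# Hypothetical sketch of the $100 million trend comparison described above.
THRESHOLD = 100  # obligation amounts in millions of dollars

def trend_report(previous, current):
    """Yield (agency, category, prior, now) where the variance is $100M or more."""
    for agency, categories in current.items():
        for category, now in categories.items():
            prior = previous.get(agency, {}).get(category, 0)
            if abs(now - prior) >= THRESHOLD:
                yield agency, category, prior, now

previous = {"Agency A": {"basic research": 250, "development": 900}}
current = {"Agency A": {"basic research": 260, "development": 1050}}

for agency, category, prior, now in trend_report(previous, current):
    print(f"{agency} / {category}: {prior} -> {now}; narrative explanation required")
```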

Detectable data errors are flagged automatically by the Web-based system. Some flags ask respondents to enter an explanation. The system also flags possible errors and asks respondents to review the material. Errors marked with a red sign will block submission to NSF. Until respondents fill all required fields, the agency cannot submit data.

Data aggregated at the department level are checked to ensure accuracy, and a copy is delivered to NSF. ORC Macro developed a spreadsheet to be used as a guide for each person responsible for checking tables. It serves as a map to identify groups of tables to cross-check and to compare with similar totals generated from the database.

The methodology report (National Science Foundation, Volume 50, fiscal years 2000–2002) contains a list of edit checks that are done online as the respondent completes the survey form. Agency coordinators review subagency data and can make changes. Once all the program offices in an agency have completed final submission, the agency coordinator aggregates the data into one record and submits it to NSF.

Potential for Errors in Processing

The extensive online editing is beneficial in speeding up the processing and ensuring that all items are completed. There is no evidence that any errors come into the process from the edits.

SURVEY OF FEDERAL SCIENCE AND ENGINEERING SUPPORT TO UNIVERSITIES, COLLEGES, AND NONPROFIT INSTITUTIONS

Introduction

Objectives

The Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions is congressionally mandated and is the only source of comprehensive data on federal science and engineering support to individual academic and nonprofit institutions. Federal policy makers, state and local government officials, university policy analysts, R&D managers, and nonprofit institution administrators use it. NSF and other federal agencies also use it for internal administrative purposes.

Specifications

The survey focuses on federal support to universities, colleges, and nonprofit institutions. Obligations to each institution are to be reported. The request is for the same information that was submitted to OMB in January of the survey year. A survey instrument is to be filled out for each university and college for which an agency obligated R&D funding during the previous fiscal year. Also, separate forms are to be completed for each nonprofit institution for which there were obligations.


The obligations to universities and colleges are to be listed by type of science and engineering activity, while the nonprofit obligations are to be listed by R&D and R&D plant (R&D facilities and fixed equipment).

The reporting forms are provided to agencies by a Web-based system. Respondents are asked to return the forms by April 15. All processing, editing, and tabulation are handled electronically.

The survey is sponsored and funded by NSF and is carried out by ORC Macro.

Survey Design

Scope of Survey and Frame

The data collection system for reporting federal obligations to universities and colleges was established in 1965. Since 1968, these data have been collected annually.

Beginning with the FY 1993 annual report, NSF ceased publication of the data collected on non-science and engineering support to universities and colleges.

Since 1990, DoD has reported research obligations separately from development obligations. Beginning in 1994, development obligations have been separated into two categories: advanced technology development and major systems development.

Since 1990 NSF has not published data on detailed field of science and engineering obligations in R&D or fellowships, traineeships, and training grants to academic institutions. DoD reports all detailed science and engineering data in “not elsewhere classified.” Issues workshops held by NSF confirmed suspicions that the detailed science and engineering data reported by other agencies were not reliable.

The U.S. service academies became respondents during FY 1998. NSF no longer collects data on FFRDCs.

NSF now allows agencies to report deobligations to the survey, but no negative obligations are published. All negative obligations are changed to zero.

The frame for the survey is the list of federal agencies that fund R&D, obtained from information in the president’s budget as submitted to Congress. The agencies that are the respondents to the Survey of Federal Funds for Research and Development are the respondents to this survey. For FY 2000, the target population was 18 federal agencies that incurred almost all of the obligations for federal academic R&D.

The number of agencies included in the survey can vary from year to year. ORC Macro examines the list of agencies reporting academic and nonprofit data to the federal funds survey. An agency that reports such data there is then asked to report it on the Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions. However, in the next fiscal year, the agency may have nothing to report.

Potential Sources of Error in the Scope and Frame

Since there is no sampling, there is no sampling error.

Coverage is considered excellent. However, NSF cautions users that not all federal agencies are surveyed, so that some funding could be missed. The omissions are believed to have little impact on total funding, but they could be significant for understanding the funding for some institutions.


There may be some loss of coverage, particularly among nonprofit institutions, when institutions do not get added to the list of “new” institutions in the codebook. However, the extent of any such coverage loss is unknown.

Data Collection

Basic Data Collection Procedure

Information is collected for the federal fiscal year (October 1 through September 30). Data collection starts in February with a due date in May.

The first step in the collection process is to identify the agencies and respondents. Changes to respondent information are obtained throughout the survey cycle. The names of the agencies may change, or their inclusion in the survey may change as a result of whether or not they continue to provide science and engineering support to universities, colleges, or nonprofit institutions. ORC Macro keeps an updated list of agencies and respondents.

At least two weeks before the start of the survey, ORC Macro staff members call each of the responding agencies to verify the name, address, phone number, fax number, and e-mail address. In addition, the name and address of the respondent’s supervisor is requested. After all respondents have been contacted, a respondent list is generated and a copy sent to NSF. At the beginning of each data cycle, respondents are asked to confirm or update the respondent information online. They cannot submit their data until they do so.

Beginning with the 2001 survey, ORC Macro e-mailed all materials to survey respondents. ORC Macro calls or e-mails respondents to ensure that they have received their packets. Follow-up begins about two weeks after the respondents receive the packets if they have not yet logged on to the web-based system.

The Web-based data collection system is used for all data collection, data imports, data editing, and trend checks. The system consists of a data collection component, which allows survey respondents to enter their data online, and a monitoring component, which allows ORC Macro to monitor support requests, data entry, and data issues. The two components are password-protected, so that only authorized respondents and ORC Macro staff can access them.

The Respondents

The respondents for this survey are personnel in the federal agencies. Most of them work in budget offices.

The Survey Instrument

The survey instrument takes full advantage of being electronic. All definitions and help screens are readily available to the respondent.

The key variables are:

  • Academic institution

  • Geographic location (within the United States)

  • Highest degree granted

  • Historically black colleges and universities

  • Obligations

  • Performer (type of organization doing work)

  • R&D plant

  • Type of academic institution (historically black colleges and universities and others)

  • Type of activity (e.g., R&D; science and engineering instructional facilities)

  • Type of institutional control (public or private)

Data are collected at the level of funding agency and provided in aggregate form for over 1,000 individual academic institutions in the following categories:

  • Research and development

  • Fellowships, traineeships, and training grants

  • R&D plant and equipment

  • Facilities and equipment for instruction in science and engineering

  • General support of science and engineering

  • Other activities related to science and engineering

  • All other activities

Data on over 1,000 nonprofit institutions are also available.

The variables in this survey use definitions comparable to those used by OMB and the Survey of Federal Funds for Research and Development. Respondents are told that the totals reported in this survey and the federal funds survey for R&D and R&D plant obligations should be in close agreement. If differences exist, respondents should include an explanation.

Totals could differ because methods differ for reporting funds that are transferred to another agency before being distributed to institutions. In this survey, the agency that distributes the funds directly to the institution is responsible for reporting those obligations. Thus, agencies included in this survey would report funds received from other agencies but would exclude funds transferred to another agency. For the federal funds survey, the obligations are reported by the original source of funds.
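A hypothetical worked example may make the two conventions concrete:

```python
# Hypothetical example: Agency X transfers $10 million to Agency Y,
# which distributes the funds to a university. Amounts in $ millions.
transfer = 10

# Support survey convention: the distributing agency (Y) reports.
support_survey = {"Agency X": 0, "Agency Y": transfer}

# Federal funds survey convention: the original source (X) reports.
federal_funds_survey = {"Agency X": transfer, "Agency Y": 0}

# Government-wide totals agree; agency-level figures differ.
assert sum(support_survey.values()) == sum(federal_funds_survey.values())
print(support_survey)
print(federal_funds_survey)
```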

Nonresponse

There is no nonresponse from agencies. There is no known item nonresponse, since agencies must answer the questions before they can submit data to NSF.

The survey instrument provides many definitions for agency respondents who care to use them. Agencies may not be able to provide the exact information desired. For example, federal agencies may not be able to identify the branch of a university system to which funds go. Sometimes there are problems in determining the correct institutional code to be assigned to a university. Such coding errors lead to errors in the estimates of funding for the institutions affected. No data are available on the extent to which federal agency respondents check their codes with universities.

Other problems discussed in the methodology report (National Science Foundation, 2001) include agency difficulties in matching program descriptions to the proper funding category. NASA has placed increased emphasis on including educational components in projects, and education is always reported as “other S&E.” NSF is aware that the categories of “general support for S&E” and “other S&E activities” are catchall categories.

Potential Sources of Error in the Collection Procedure

The main source of error in the collection procedure would be the respondents’ reporting of information. Issues workshops are held to bring out difficulties that agencies have in reporting, and NSF makes adjustments. However, there is no detailed knowledge of whether respondents report fully, whether they understand the definitions and use them correctly, or whether the editing eliminates the problems.

Data Processing

There is no keying or weighting of data. The only processing step is editing.

For FY 2000, each agency sent data directly to the contractor. Data were received in several ways: in ASCII format via e-mail, on paper copy, and in ASCII format created by use of the FED Web system. The contractor entered paper copies into the correct ASCII format by use of the web-based system.

The Web-based system checks data entered by respondents every time they save their data. Large increases or decreases in funding from the previous year are flagged, and the system provides text boxes for respondents to explain discrepancies. The system will not allow respondents to submit incomplete data, and it eliminates the need to check fiscal years and submission format. Many agencies offer no explanation when the program asks for one. ORC Macro or NSF or both will try to get an explanation from the respondent. Sometimes, an explanation will lead to a data adjustment. In the FY 2000 survey, the web-based system was used to generate edit packets for all respondents after they completed their submissions. The edit packets included a summary report, a detailed two-year trend report, and a report on trend discrepancies requiring an explanation. After review by ORC Macro and NSF, respondents were contacted to provide verifications and explanations for institutions with large increases or decreases.

All agencies reporting obligations to institutions that are not listed in the codebook can add new institutions to the codebook online. These institutions are checked by ORC Macro and submitted to NSF for approval.

Potential Errors in Processing

The extensive online editing process is beneficial in speeding up the processing. There is no evidence that any errors are introduced into the data from the edits.

SURVEY OF RESEARCH AND DEVELOPMENT EXPENDITURES AT UNIVERSITIES AND COLLEGES

Introduction

Objectives

The Survey of Research and Development Expenditures at Universities and Colleges is the primary source of information on separately budgeted R&D expenditures in academia in the United States and outlying areas. The results are used to assess trends in R&D expenditures.

Specifications

The annual survey is of academic institutions and covers the 500 to 700 universities and colleges that have doctoral programs in science and engineering (S&E) fields or that annually perform at least $150,000 in separately budgeted R&D. These institutions have traditionally expended more than 95 percent of U.S. academic R&D funds. In addition, the survey population includes all Federally Funded Research and Development Centers that are academically administered and engaged in basic or applied research, development, or management of R&D activities. Also, all historically black colleges and universities that perform any separately budgeted R&D in science and engineering are included.

The reporting unit for this survey is the academic institution. The reporting forms are sent in November and are to be returned in January.

This survey is sponsored and funded by NSF and is carried out under contract by ORC Macro.

Survey Design

Scope of Survey and Frame

The frame for the survey is developed from a variety of sources. Not all academic institutions are to be included, so a procedure has been developed, as described in the processing and editing manual for this survey (National Science Foundation, 2002c). The populations of the NSF-NIH Survey of Graduate Students and Postdoctorates in Science and Engineering (graduate student survey) and the NSF Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions are compared to make sure that all science and engineering doctoral degree-granting institutions are included in the academic R&D expenditures survey population. This task is done annually. Sources consulted are annual issues of the Higher Education Directory, published by Higher Education Publications, Inc., and direct contact with universities.

NSF maintains a list of all Federally Funded Research and Development Centers. The Department of Education maintains a list of all historically black colleges and universities.

ORC then extracts a list of science and engineering doctoral degree-granting institutions, FFRDCs, and historically black colleges and universities that reported zero R&D expenditures in the previous year. The year and expenditure amount reported the last time these institutions reported expenditures greater than zero are also given. Also, ORC cites the federal R&D obligations reported to these institutions for the past five years.

ORC telephones each science and engineering doctoral degree-granting institution, FFRDC, and historically black college or university that reported zero expenditures in the previous fiscal year to find out if it has expenditures data to report for the current year. If there are expenditures to report, ORC sends the institution a survey packet. If the institution says it has zero expenditures, ORC enters zero for that institution. If the institution says zero but the federal R&D obligations data suggest otherwise, ORC and NSF discuss this with the institution and attempt to obtain actual expenditures.

Science and engineering master’s and bachelor’s degree-granting institutions that reported $150,000 or more in science and engineering R&D expenditures in the previous fiscal year are surveyed again. To determine which other institutions should be included in the current fiscal-year survey, ORC creates a list of all other science and engineering master’s and bachelor’s degree-granting institutions in the academic R&D expenditures survey universe that received federal obligations during any cycle from the previous five years. Institutions are included in the current cycle if they meet either of the following criteria (sketched after the list):

  • The institution reported cumulative expenditures for the last three full population surveys of $250,000 or more.

  • Federal agencies reported cumulative obligations of more than $400,000 for the fiscal year plus the four previous years.
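Purely as an illustration, these two criteria amount to a simple filter; the field names below are hypothetical:

```python
# Illustrative sketch of the inclusion rule above; field names are hypothetical.
def include_in_cycle(institution):
    """Master's/bachelor's institutions enter the current cycle if either holds."""
    reported = sum(institution["expenditures_last_three_surveys"]) >= 250_000
    obligated = sum(institution["federal_obligations_last_five_years"]) > 400_000
    return reported or obligated

candidate = {
    "expenditures_last_three_surveys": [90_000, 80_000, 85_000],   # $255,000
    "federal_obligations_last_five_years": [0, 50_000, 0, 0, 100_000],
}
print(include_in_cycle(candidate))  # True, via the expenditures criterion
```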

A total of 36 academically administered FFRDCs were included; 4 FFRDCs administered by industrial organizations and 16 administered by nonprofit organizations were surveyed but not included in publications.

Potential Sources of Error in the Survey Design

There is no sampling, since all academic institutions that meet the criteria are in the survey. The potential problems are with the frame itself and the application of the criteria.

If the frame fails to identify an academic institution with at least $150,000 in separately budgeted R&D, there would be a gap. The historically black colleges and universities are well identified, so they do not pose any coverage loss; similarly, the FFRDCs are well known, as are the doctorate-granting universities. So it is the application of the dollar limit that could pose problems. NSF describes the only other gap as the four 2-year degree-granting institutions, which accounted for less than 0.01 percent of total R&D expenditures.

Data Collection

Basic Data Collection Procedure

Since FY 2001, the survey has been delivered to respondents electronically rather than in paper format. ORC Macro was responsible for drafting the initial e-mail message, the electronic survey form, and follow-up e-mail messages. ORC Macro sent the messages to each respondent and included the survey web address and the university ID and password. An explanation of the survey and survey population and information on downloading a paper copy of the survey form from the survey web site were included. An acknowledgment of receipt and expected completion date were requested. The web-based data collection system also made available the following:

  • Current fiscal year academic R&D expenditures survey form

  • A cross-walk between NSF fields of science and engineering and the Classification of Instructional Programs of the National Center for Education Statistics

  • Facsimile data from the three previous years

  • A question and answer booklet

Respondents who did not acknowledge receipt of the survey were telephoned. Completed questionnaires received in the mail were stamped with the date received and then submitted online by ORC Macro staff. Other questionnaires were received electronically. ORC Macro reviewed each submission for errors. Starting in mid-February, a report was generated weekly that provided the following information on each institution: top 100 ranking by R&D expenditures, type of institution, highest degree granted, type of control, state, stratum code, date of e-mail message receipt, expected and actual date of survey questionnaire receipt, and any change in status during the latest week.

A nonrespondent activity log was generated in mid-February, and a follow-up e-mail was sent to institutions that had not acknowledged receipt of the survey e-mail. The first round of follow-up telephone calls began two weeks after the follow-up e-mail messages were sent. In cases in which information was not received, the e-mail address was updated and the information was resent or a copy of the survey was faxed to the respondent. If the previous respondent was no longer available, a new respondent was identified and the information resent. Institutions were telephoned repeatedly until acknowledgement was obtained.

Activity log reports were generated for institutions that had not completed the survey online or by mail by the survey due date. Beginning in mid-April, a second round of follow-up calls was made to emphasize the need to respond. Additional emphasis was placed on obtaining responses from the top 100 institutions.

Respondents who needed more time were offered an extension. Reluctant respondents were reminded of the national scope of the survey and its consideration by decision-making bodies, including Congress. In mid-July, respondents who had not yet reported were asked to provide data only for items 1 and 2 (current R&D expenditures in science and engineering, by source of funds and by field of science and engineering). Total figures were asked for as a last resort. Respondents who still refused were thanked for their time and informed that NSF personnel might contact them in the future.

The Survey Respondent

Nothing was included in the methodology report (National Science Foundation, 2002b) about the kinds of offices in which the respondents work or the titles of the respondents.

The Questionnaire

No cognitive work has been done on the questionnaire. NSF does have workshops with respondents to identify items that cause respondents difficulty.


It would seem important to visit a group of respondents representing different types of academic institutions to get a better understanding of:

  • How it is decided, in a large, complex institution, who the respondent is, and how the respondent gathers the data.

  • How many people need to be involved to answer the questions.

  • How institutions interpret questions and definitions.

The key variables in this survey are:

  • Academic institution

  • Character of work (basic research, applied research, or development)

  • Equipment expenditures

  • Expenditures for science and engineering R&D

  • Federally funded R&D centers

  • Field of science or engineering

  • Geographic location (within the United States)

  • Highest degree granted

  • Source of funds (federal, state or local, industry, institutional, or other)

  • Type of academic institution (doctorate-granting or other)

Nonresponse

By the closing date of the 2002 survey in October 2002, completed questionnaires had been received from 581 of the 610 academic institutions, a response rate of 95 percent. All of the top 100 institutions responded. The response rate for the FFRDCs was 100 percent.

All missing data items, including those for the unit nonrespondents, were imputed. However, the methodology report (National Science Foundation, 2002b) made no mention of item nonresponse rates, even though data for missing items were imputed.

Potential Sources of Error in the Data Collection Procedure

Little is known about the mechanics of response, including the reaction to the survey, the difficulty of responding, how many people are involved in responding, or how the questions are interpreted. A cognitive study could be useful. Visits are made to institutions and information from those visits has been helpful.

The form itself, at least the first page, looks very busy and suffers from lack of good graphic design. A few questions are interspersed with directions, definitions, and reminders.

There is a mixture of items on the questionnaire, with items 2A and 2B designated as optional. A cross-walk for NSF nonscience and engineering fields, meant to help respondents report disciplines under the questionnaire fields, appears on the form, yet it is mixed in at the end of the questions. Instructions are sandwiched together, with instructions for items 1 and 2 followed by instructions for items 1A and 1B. The layout may be different on the web form, but paper forms are still used by some respondents. A cleaner design could help.


Data Processing

Editing

A paper questionnaire requires more handling by ORC Macro. An analyst reviews all paper forms for data entry. When questionnaires contain incomplete data, the analyst decides whether or not imputation is required. Cells requiring imputation are marked with a “B” status code. When respondents report in dollars instead of thousands of dollars, the analyst revises these figures in red. Respondents occasionally report negative numbers (credits to accounts), which are acceptable only for the question on expenditures for the purchase of research equipment by field of science and engineering. If negative amounts are reported for other items, the data are adjusted proportionately to correct the error without altering the subtotal or total figures provided. The revised figures are coded in red on the hard-copy questionnaire and assigned an “E” status code to indicate that they are estimates.
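The report does not spell out the adjustment algorithm; one plausible reading, sketched below with hypothetical numbers, is that a negative entry is zeroed out and the remaining items are rescaled so that the reported subtotal is preserved:

```python
# One plausible reading of the proportional adjustment described above
# (the report does not give the exact algorithm). Figures are hypothetical.

def adjust_negatives(items, subtotal):
    """Zero out negative entries and rescale the rest to hit the subtotal."""
    cleaned = {k: max(v, 0) for k, v in items.items()}
    positive_sum = sum(cleaned.values())
    if positive_sum == 0:
        return cleaned
    factor = subtotal / positive_sum
    return {k: round(v * factor, 1) for k, v in cleaned.items()}

items = {"chemistry": 400, "physics": -50, "biology": 650}  # thousands of dollars
print(adjust_negatives(items, subtotal=1000))
# {'chemistry': 381.0, 'physics': 0.0, 'biology': 619.0}
```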

Questionnaires received online are processed electronically. This includes any respondent or institutional revisions, error flags, imputation flags, and status changes. An analyst verifies any discrepancies.

An institutional response code changes the status of institutions from “awaiting response” to “awaiting processing.” After data entry, the institution response code becomes one of the following:

  • Awaiting correction: arithmetical error requiring correction by sending an “edit letter.”

  • Awaiting verification: significant trends in data that require confirmation by sending an edit letter.

  • Awaiting imputation: partial data requiring computerized estimation.

  • Clean: no data correction, verification, or imputation needed.

Almost all corrections and verifications were handled over the telephone or by e-mail.

In the edit letters, sent by e-mail, respondents were given access to the web site to allow them to view their submissions and correct their data on the web.

Prior to sending the edit letters, an ORC Macro analyst reviewed all edit questions. Error messages and large changes from previous reports, called trend changes, were also reviewed. The analyst corrected or verified the data for institutions that had minor errors or trend warnings and marked any altered data with an “E” status code to indicate they were estimates. In these cases, the institutional respondents were often contacted by phone, informed of the changes, and allowed an opportunity to adjust the data.

Respondents who did not respond to the edit letter within two weeks were contacted by telephone to resolve data errors and explain trend warnings.

To ensure quality control, preliminary ranking tables were produced for NSF to review several weeks before survey closeout. These tables provided several years of data for individual institutions, by sources of funding and major fields of science and engineering. This was an opportunity to check for inconsistencies and unusual trend data.


The NSF project officer visited ORC Macro two weeks before closeout to review any major trend anomalies. Any institution that had not returned an edit letter or was still awaiting correction or verification was reviewed. NSF and ORC Macro developed estimates, marked with an “E” status code, to correct or verify these institutions’ data.

Imputation

Imputation was used in 2001 to provide information for the 4 percent of the survey population that did not respond to the survey at all and for any item nonresponse. Imputation rates for all data cells were calculated for all universities and colleges and for various institution classes, determined by highest degree granted and type of control. The “psychology” field received the lowest imputation rate, 0.3 percent, of all major science and engineering fields, while the highest rate was 4.8 percent for “other sciences.” For the sources of funding category of academic institutions, the lowest imputation rate was 0.3 percent for all state and local government sources and the highest was 0.6 percent for all federal government sources.

For most items, imputation factors were generated for certain classes of institutions, defined by highest degree granted and type of control. These factors were derived from responding institutions for three key variables: total R&D expenditures, federally financed R&D expenditures, and total research equipment expenditures. Ratios, or inflation factors, were then applied to the previous year’s key variables for each nonrespondent institution to derive a current-year estimate. These factors, when applied to institutions in each class, reflected the average annual growth or decline in expenditures for reporting institutions in that class. The key variables were then distributed among the various subtotals and detailed fields using the same relative percentages that the institution had reported. If no previous percentages were available for an institution, the summary percentages for the institution’s class were used.
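As a hedged sketch of this two-step procedure (class definitions, field names, and figures below are invented for illustration, not NSF’s actual values):

```python
# Hedged sketch of the ratio imputation described above; all numbers
# and field names are hypothetical.

def class_inflation_factor(reporters):
    """Average growth for responding institutions in a class."""
    current = sum(r["current"] for r in reporters)
    prior = sum(r["prior"] for r in reporters)
    return current / prior

def impute_total(nonrespondent_prior, factor, prior_shares):
    """Carry the prior total forward and redistribute it by prior shares."""
    total = nonrespondent_prior * factor
    return {field: total * share for field, share in prior_shares.items()}

reporters = [{"prior": 900, "current": 950}, {"prior": 1100, "current": 1200}]
factor = class_inflation_factor(reporters)           # 2150 / 2000 = 1.075
shares = {"engineering": 0.4, "life sciences": 0.6}  # nonrespondent's prior mix
print(impute_total(1000, factor, shares))
# {'engineering': 430.0, 'life sciences': 645.0}
```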

No mention is made in the methodology report of the use of prior-year information that was itself imputed or for how many years imputed data could be used.

In previous years, the basic research variable was imputed using the same methodology. In FY 2001, this methodology was found not to provide accurate data and a new method was developed. NSF received revisions of a large university’s basic research that had been imputed using the same rate for 15 consecutive years. Applying the corrected shares to the FY 2000 survey resulted in an increase in aggregate federal basic research dollars of $229 million and in aggregate total basic research of $566 million.

A memo by Brandon Shackelford of NSF describes the research done to improve the imputation (National Science Foundation, no date). The survey items pertaining to basic research had a response rate of 83 percent in FY 2000, compared with a unit response rate of 98 percent. This led to an investigation of respondent microdata from the previous 15 years and the subsequent development of a regression model.

The first step was creating a table of records consisting of each year’s responses to selected survey items (such as basic research percentage), the prior year’s responses to the same items, and the difference between the responses for the two years. Two assumptions formed the guidelines by which responses were flagged as anomalies:

  • Academic research is primarily basic in nature.

  • The composition of research at an institution does not change rapidly.

Particular attention was paid to large swings in basic research shares that were either preceded or followed by a period of imputation. This exercise, along with data ORC Macro provided on universities with long consecutive periods of imputed basic research, identified a number of respondents who could be contacted to clarify their prior basic research responses or nonresponses. Many respondents said they were unable to produce an exact figure for basic research: some did not understand the concept, and some were unfamiliar with the research at their universities. They agreed that an imputation by NSF would be at least as good as any estimate they could produce.

Those respondents with annually consistent basic research reporting either had a way to code and extract basic research from their accounting systems or made reasoned estimates based on knowledge about the character of the research at their institution.

A regression model was developed for imputation. Modeling basic research percentages had a number of drawbacks; modeling basic research amounts as the dependent variable instead had the dual benefit of producing much better fitting estimates and allowing model specifications in which the coefficients had some intuitive meaning.

For federal basic research, one of the most parsimonious models included federal engineering and nonengineering R&D as well as a dummy variable indicating whether or not the institution was a public institution. The model that fit well was

Federal basic research = β1 × federal engineering R&D
    + β2 × federal nonengineering R&D
    + β3 × (public × federal engineering R&D)
    + β4 × (public × federal nonengineering R&D)
    + e

The coefficients can be interpreted loosely as the percentage of each component that is basic research.

The model for nonfederal basic research was as follows:

Nonfederal basic research = β1 × industry R&D
    + β2 × nonindustry R&D
    + β3 × (public × industry R&D)
    + β4 × (public × nonindustry R&D)
    + e

A benefit of these specifications is that large R&D universities are implicitly given more weight when the coefficients are estimated. Two drawbacks are that the models exhibit heteroscedasticity and that outliers have excessive influence on the estimates. Several institutions found to be outliers were omitted from the analysis.
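As an illustration of how such a model could be estimated and used for imputation, the sketch below fits the federal basic research specification by ordinary least squares through the origin, with the public-institution dummy entering through interaction terms. The data are fabricated placeholders, and nothing here should be read as NSF’s actual estimation code.

```python
import numpy as np

# Hypothetical institution-level data ($M).
fed_eng = np.array([10.0, 5.0, 20.0, 8.0, 15.0])       # federal engineering R&D
fed_noneng = np.array([40.0, 22.0, 90.0, 30.0, 60.0])  # federal nonengineering R&D
public = np.array([1.0, 0.0, 1.0, 0.0, 1.0])           # public-institution dummy
fed_basic = np.array([28.0, 15.0, 70.0, 21.0, 45.0])   # reported federal basic research

# Design matrix: main effects plus public-institution interactions,
# no intercept, matching the specification in the text.
X = np.column_stack([fed_eng, fed_noneng, public * fed_eng, public * fed_noneng])

# Least-squares estimates of beta_1..beta_4.
beta, *_ = np.linalg.lstsq(X, fed_basic, rcond=None)

# Impute federal basic research for a nonrespondent public institution.
x_new = np.array([12.0, 50.0, 12.0, 50.0])  # interactions active since public = 1
print(beta, float(x_new @ beta))
```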

Federal basic research amounts and total basic research amounts were imputed for all non-FFRDC institutions that met the following criteria:

  • Had been previously imputed for FY 1998 through FY 2001

  • Reported a federal basic research amount of zero when federal total expenditures were not zero

  • Reported a total basic research amount of zero when total expenditures were not zero

A significant proportion of institutions are intermittent reporters, and data for the years in which no response was received were imputed. For each such institution, key variables that had been imputed were compared with subsequent submissions to determine whether the imputed data accurately represented the growth patterns shown by the reported data; when they did not, the earlier years were retroimputed. In FY 2001, data were retroimputed for 20 institutions: total R&D data for 8 institutions, federal R&D data for 10, and total R&D equipment data for 19.

Possible Errors in Processing

Editing procedures for paper questionnaires differ from those for electronic questionnaires. For paper, an analyst decides whether imputation is needed and makes corrections. Online questionnaires have all editing done electronically. Even though the two procedures may be designed to be identical, merely having two procedures leaves room for error. It would be useful to test this.

Do the paper questionnaires get more imputation? More editing? Are there more “E” codes? It would be helpful to users of data to know what percentage of the cases and the totals were imputed. Although data are collected and flagged with “E” for estimates and “B” for imputation, these rates are not given in the methodology report.

The imputation research on basic research was a good first step but should be continued. There are flaws in the model. Also, when the model was developed, it appears that all the data were used; usually, a subset of data is held out to test the model. Having a set of independent data on which to test the model highlights the difference between building a model to summarize the data in hand and building a model to predict what may be observed in a new cycle.

The imputation finding on basic research illustrates the danger of using imputed data for long periods as the basis for new imputations. Exploring other methods of imputation for other items is an additional topic for a research agenda.

SURVEY OF SCIENCE AND ENGINEERING RESEARCH FACILITIES

Introduction

Objectives

The main objective of the Survey of Science and Engineering Research Facilities is to provide data on the status of research facilities at research-performing colleges and universities, nonprofit biomedical research organizations, and independent research hospitals in the United States that receive funding from the National Institutes of Health. This survey was mandated by Congress in 1985. It is used by planners in the federal government, state governments, universities, and private-sector firms.

Specifications

This is an establishment survey of either academic or biomedical institutions. Not all academic institutions are included: only those granting master’s or doctoral degrees in science and engineering, other institutions reporting R&D expenditures of $150,000 or more, and all historically black colleges and universities reporting any R&D expenditures. The six U.S. service academies are excluded. Biomedical institutions include nonprofit biomedical research organizations and independent hospitals that received at least $150,000 in extramural research funding from NIH in the previous fiscal year.

The 2001 survey cycle began in April and ended in August. In 2003, there will be a Part 2 to the survey, focusing on computing and networking capacity.

The survey is sponsored and funded by NSF and was carried out by ORC Macro.

Survey Design

Scope of Survey and Frame

The facilities survey has been conducted biennially since 1986, becoming a census of the eligible population in 1999.

In 1999, the universe consisted of all academic institutions that granted master’s or doctoral degrees in science and engineering fields, plus all other academic institutions that reported separately budgeted science and engineering R&D expenditures of $50,000 or more. All historically black colleges and universities reporting any R&D expenditures were included. In 2001, master’s degree-granting institutions were no longer automatically included, and the expenditure threshold was raised to $150,000. For 2003, the threshold was raised to $1 million in R&D expenditures in the prior fiscal year.

The universe also includes nonprofit biomedical research organizations and independent hospitals that received at least $150,000 in research funding from NIH in 1999. For the 2003 survey, the level of funding from NIH was $1 million.

The frame for the survey comes from the Survey of Research and Development Expenditures at Universities and Colleges, for the academic institutions, and from an NIH list of nonprofit research institutions and independent hospitals that received NIH funding.

There is no sampling for this survey. In 2001, the survey universe consisted of 580 colleges and universities, including stand-alone medical schools, and 245 biomedical research organizations and research hospitals.

Potential Sources of Error in the Survey Design

The definition of the universe depends on the Survey of Research and Development Expenditures at Universities and Colleges, at least for the academic sector. To the extent that there are errors in the reporting or imputation of R&D expenditures, there could be errors in defining which colleges and universities had expenditures that met the threshold definition.


For the biomedical institutions, there could be difficulty in ascertaining whether or not an institution was affiliated with another institution, such as a university. In such a case there would be double counting.

Data Collection

Basic Data Collection Procedure

Data collection proceeds in three distinct steps. First, the president or equivalent of each institution is sent a letter signed by the director of NSF or the director of the National Center for Research Resources (NCRR) at NIH. This letter explains the survey and asks the president to name an appropriate person as the institutional coordinator (IC). A letter is also sent to the IC who had participated in the previous survey cycle, to notify him or her of the upcoming survey and the letter to the president. For the 2000 survey, these letters were mailed in April. Presidents were asked to respond by fax, mail, e-mail, or phone with the name and contact information for an IC. After one week, phone calls were made to the president’s office to confirm that the package had been received and to see if there were questions.

Nearly 4,000 phone calls were made to obtain IC names. By the end of June, 99 percent of the academic institutions and 95 percent of the biomedical institutions had named ICs.

The second phase began with a mail-out to 262 institutions on April 24 and to another 118 organizations on April 27. The packages to the ICs contained a cover letter, an acknowledgment postcard to be mailed back, an overview of the survey, a paper copy of the questionnaire with a prepaid envelope, a list of frequently asked questions, and instructions for accessing the survey web site, with a user ID and a password. ORC Macro maintained contact with the ICs about every other week.

The third phase consisted of following up with the ICs. This meant reminding them of due dates, answering questions, explaining how to use the web-based system, and so forth. About 75 percent of the ICs responded online.

The Survey Respondents

The institutional coordinator was crucial to the success of the survey. Each IC had to determine the most effective data collection approach, cooperate with other staff members to obtain information from different internal sources, and review all data before submission. ICs are not necessarily located in a single type of office.

The Questionnaire

The questionnaire for this survey has undergone significant changes in the last two survey cycles. For the 2001 survey, only two questions were asked. Paper and web-based versions were both prepared for the 2001 survey, asking for the amount of science and engineering research space and the adequacy of that space. Don Dillman, an expert on questionnaire design, reviewed the survey form and suggested rewording of instructions and questions for greater clarity, as well as some reformatting. Four major modifications were made:

  1. The description of what was included in “research space” was expanded to take care of some incompleteness in the description.

  2. Redundant information from the instructions and information pages was removed.

  3. A choice of “not applicable (NA)” was added to item 1, which asked for the amount of net assignable square footage used for instruction and research space, by science and engineering field (1a), and the amount of the total research space for all S&E fields that was leased (1b). If a respondent answered “zero” to 1a, it was necessary to make “NA” a possible response for 1b.

  4. The non-S&E instruction figure was broken out into a subquestion (1c).

The survey questionnaire also allowed respondents to indicate whether their previous survey cycle data were still accurate for the current year. If so, ORC Macro staff entered the prior data for them on the paper questionnaire.

The web-based questionnaire closely resembled the paper version, but features were added. For institutions that responded in 1999, the web-based questionnaire included a “preload” button for each item that loaded data from the previous cycle into the correct fields. The web-based instrument also detected certain errors automatically, and an extensive help menu was available. For 2003, the questionnaire was completely redesigned. The new questionnaire asks about:

  • The amount of space used for S&E research and how much of the space was for laboratories or offices

  • Condition of the research facilities

  • Costs of repairs and renovations in the last two years

  • New construction in the previous two years, with a project worksheet to be completed for each individual project

  • Source of project funding for repairs and renovations and new construction

  • Planned repairs, renovations, and new construction for the next two years

  • Deferred repairs, renovations, and new construction

A Part 2 of the questionnaire contains new questions on computing and networking capacity. The questions cover:

  • The physical infrastructure used for network communications

  • Plans for future upgrading and uses of information technology

  • Capacity for high-speed computations

  • Infrastructure for wireless communication

Survey Nonresponse

Response rates were 90 percent for the academic institutions and 88 percent for the biomedical institutions. The top 100 (in terms of R&D expenditures) among the academic institutions had a 96 percent response rate; the private colleges had an 86 percent response rate. Hospitals had an 86 percent response rate, while nonhospitals had a 90 percent response rate. Generally, institutions with larger R&D expenditures had higher response rates. Also, previous respondents had higher response rates.

There have been no studies of nonresponse. However, NSF staff have hypothesized that the difference in response rates between public and private institutions may be due to different traditions: public institutions are accustomed to having their data public, and private schools do not have that tradition. The reluctance of private schools to share data was discussed at the January 2002 expert panel meeting on the redesign.

A hypothesis for the lower response rates of biomedical institutions is that these organizations are less likely than universities to have already collected these data in some form for a purpose other than responding to the survey. Also, these institutions are generally smaller than academic institutions and may have received only one small grant from NIH; thus, it is more of a burden to respond. Hospitals may not perceive the research grant to be as directly related to their main mission.

The redesign for 2003 should improve response. First, to be eligible for the survey, the amount of a grant from NIH must be $1 million or more. A research facility is not as likely to overlook a grant of this size. The same thing is true of academic institutions where R&D expenditures of $1 million or more are required for eligibility.

A second factor is that most of the survey data, except for a few sensitive questions, will be made publicly available with institutional identifiers. Institutions may be more likely to respond if they feel that the data will be useful to them.

No item nonresponse was reported in the methodology report for 2001 (National Science Foundation, 2002a).

Potential Sources of Error in the Collection Procedure

Respondents can respond on paper or on the web, and data processing differs for each method. Even though all processing was done under the supervision of one person, differences can exist and cause differences in results.

The use of a check box allowing respondents to carry over data from the previous year is sure to be seen as an easy way out by some respondents who do not wish to take the time to fill out the questionnaire. NSF did not retain information on the number of respondents who used the check box. However, it did examine the number of institutions that reported the same figures for research space in both 1999 and 2001: 38 percent of academic institutions and 47 percent of biomedical institutions. The check box will not be offered in future cycles.

Because the questions on computer technology are new, it would be useful to set up a system to register difficulties. Letting people choose item nonresponse may be a way of ascertaining what items they have difficulty answering. Because big institutions may have more sophisticated systems than small ones, it may be useful to set up a tabulation by size or complexity of institution to see which items are left blank by type of institution. There are undoubtedly many other editing schemes that would enable NSF to learn a great deal about these new questions.

Data Processing

Editing Data

When a paper questionnaire was sent in, the data were entered on computers using the same web-based system used by web respondents. However, the web instrument contained some embedded edits under which erroneous data could not be entered. Data entry personnel filled in missing totals or subtotals that could be derived from the provided data. If missing data could not be derived, the computer flagged the institution for data correction. The survey respondent was then asked by e-mail or phone to correct the data.

When survey data were entered online, all processing and editing were done electronically. All data were checked for arithmetical errors and data inconsistencies. This allowed the institutional coordinator to edit any errors and clarify any inconsistencies during the submission process and ensured that all errors were corrected prior to final submission. The web system did not allow institutions to change IC information. If this was needed, the IC contacted ORC Macro, and contractor personnel entered new information into the system.

Data validation was ongoing and consisted of various checks on items 1 and 2 as well as cross-checks between the two items.

Imputation

Data were imputed for individual institutions by a variety of methods. The first method was to carry forward the institution’s data from the previous survey cycle. If an institution had not reported in the previous cycle, other methods were used.

For total amount of science and engineering research space, a regression approach was used. For academic institutions, total R&D expenditures in FY 2000 and total R&D expenditures for the agricultural sciences in FY 2000 were used as independent variables. For biomedical institutions, the independent variable was the total amount of support received from NIH in FY 2000.

The other items imputed for academic institutions were:

  • Total amount of S&E instruction space

  • Total amount of non-S&E instruction space

  • Amount of S&E research space in individual fields of science

  • Amount of S&E instructional space in individual fields of science

These items were imputed by forming totals for the respondents, computing ratios such as the ratio of total S&E instruction space to total S&E research space, and applying the ratio to the already-imputed total amount of research space.

For biomedical institutions, the only other item imputed was the amount of S&E research space in individual fields of biological and medical sciences. Again, ratios were formed on the basis of the respondents and applied to the already imputed total S&E research space.

Other items were left blank and were treated as zero. These methods are described in the methodology report (National Science Foundation, 2002a).
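A sketch of this ratio step for the facilities items follows, with hypothetical square-footage figures; the ratio comes from respondent totals and is applied to an already-imputed research-space total.

```python
# Respondent totals for a class (net assignable square feet); hypothetical.
resp_research_space = 4_000_000.0     # total S&E research space, respondents
resp_instruction_space = 2_500_000.0  # total S&E instruction space, respondents

# Ratio of instruction space to research space among respondents.
ratio = resp_instruction_space / resp_research_space

# Apply to a nonrespondent's already-imputed total research space
# (e.g., from the regression on FY 2000 R&D expenditures described above).
imputed_research_space = 120_000.0
imputed_instruction_space = ratio * imputed_research_space
print(imputed_instruction_space)  # 75000.0
```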

Imputation was needed for 10 percent of the institutions. The national estimates for the unimputed variables were likely to be underestimates because zero had been imputed implicitly.

Potential Sources of Errors in Processing

Having two different response procedures, paper and the web-based system, leaves room for differences in processing. It would be useful to compare the survey totals of the paper and web respondents on a subset of items, preferably by size of institution. Also, rates of imputation and other editing changes are of interest in comparing methods.

Imputing for unit nonresponse is unusual. In most surveys, unit nonresponse is handled by weighting, as it was in this survey in 1999. It would be useful to compare the results of weighting and imputation. At a minimum, one of the benefits of weighting would be that there would be no blank items remaining.
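For contrast, a conventional weighting-class adjustment for unit nonresponse might look like the sketch below. The classes and counts are hypothetical; in a census with base weight 1, each respondent’s weight is simply the inverse of the response rate in its class.

```python
# Hypothetical weighting classes (e.g., highest degree granted x type of control).
classes = {
    "doctoral_public":  {"eligible": 200, "responding": 190},
    "doctoral_private": {"eligible": 150, "responding": 120},
    "masters_public":   {"eligible": 120, "responding": 100},
}

# Nonresponse-adjusted weight = 1 / (class response rate).
weights = {name: c["eligible"] / c["responding"] for name, c in classes.items()}

# A respondent's report is multiplied by its class weight in national totals.
reported_space = 50_000.0  # sq ft reported by one doctoral_private respondent
print(weights["doctoral_private"], reported_space * weights["doctoral_private"])
```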

The procedure of not allowing responses to be submitted until the edits are satisfied makes processing easier. However, it forces respondents to enter data about which they may have doubts. Many times it merely causes respondents to check their data and correct arithmetic and inconsistencies; other times it may force answers that are incorrect. It is, in a way, asking the respondent to impute data. NSF may lose some valuable insights if it forces respondents to answer the new questions on computer infrastructure.

Weighting and Estimation

There was none.

SURVEY OF INDUSTRIAL RESEARCH AND DEVELOPMENT

Introduction

Objectives

The main objective of the Survey of Industrial Research and Development is to provide data on R&D performed by industry in the United States. The results of the survey are used to assess trends in R&D expenditures and are used by government agencies, corporations, research organizations, and individual researchers.

Specifications

This annual survey focuses on expenditures arising from R&D conducted in the United States. The survey is company-based and the universe includes all companies doing business in the United States.

The reporting unit for the survey is the company, defined as a business organization of one or more establishments under common ownership or control. The survey includes publicly traded and privately owned, nonfarm business firms in all sectors of the U.S. economy. It does not include operations owned by federal, state, or local governments, nonprofit organizations, or trust and pension plans.

R&D includes activities carried out by persons trained, either formally or by experience, in the physical sciences, the biological sciences, engineering, and computer science. Social science R&D is excluded.

The reporting forms are mailed to companies in March and the companies are asked to report in 30 or 60 days, depending on the type of form sent to them. Processing starts as the forms are received, but publication of results comes about two years later.

The survey is sponsored and funded by NSF and is carried out by the U.S. Census Bureau.


Survey Design

Scope of Survey and Frame

The scope of the survey has changed somewhat over the years, mirroring changes in the nature of R&D and where it is conducted in the United States. A Census Bureau report by G. L. Kusch and W. Ricciardi is very helpful in understanding these changes (U.S. Census Bureau, 1994a). The survey has always included manufacturing and most nonagricultural industries. From its earliest years, it excluded trade associations, railroad industries, and agricultural cooperatives. Over the years, it also had policies about the size of companies in scope, for years excluding all manufacturing companies with 50 or fewer employees. Manufacturing industries assumed to have little or no R&D were excluded for a time, but uneasiness about the extent of R&D in smaller industries led to their later inclusion. Variable cutoffs based on number of employees were used for many years to reduce scope, and at times cutoffs were dropped and reinstated. Single units with fewer than five employees were eliminated from scope.

At the present time, the scope of the survey is all for-profit, nonfarm companies. There is no size criterion, except for the exclusion of single-unit companies with fewer than five employees.

The sampling frame for the survey in its earliest days was a file from the Board of Old Age and Survivors’ Insurance (BOASI). These files had industry codes from the Standard Industrial Classification (SIC). For several years, the sample for the manufacturing industries was selected from the Annual Survey of Manufactures (ASM) carried out by the Census Bureau, while the BOASI records still constituted the frame for the nonmanufacturing industries. Lists from the Department of Defense of the largest R&D contractors supplemented both sources. In 1967, the 1963 Census Enterprise Statistics file was the frame for multiunit manufacturing companies; single-unit manufacturers were sampled from the 1963 economic censuses; and Social Security Administration (SSA) files represented the nonmanufacturing universe. Lists of R&D contractors from DoD and NASA supplemented the selected panel. After updating to the latest economic census, the same procedure was followed for the 1971–1975 panel, with the exception that the Census Enterprise Statistics file was used for selected nonmanufacturing industries, the SSA files representing the remaining nonmanufacturing industries.

In 1976, a change in frame sources occurred that holds to this day. The Census Bureau’s Standard Statistical Establishment List was used. This list, now known as the Business List, is updated annually and contains all nonfarm entities that the Census Bureau knows about. Although it was still not used for some of the nonmanufacturing companies, beginning in 1981 it was the prime source for all manufacturing and nonmanufacturing companies. It is still supplemented by lists from DoD and NASA and occasionally from other sources.

Classification of Industries

To prepare for sampling, each company must be assigned a classification code corresponding to its principal activity. The industry codes used are those in the Standard Industrial Classification prior to 1999 and in the North American Industry Classification System (NAICS) since 1999. Because this is a company survey, not an establishment survey, each multiunit company needs a code that represents its largest activity, usually measured by payroll.

In the beginning years of the survey, the classification code was that of the establishment having the largest number of employees. When the Census Bureau used the ASM as a frame, a company’s major activity was based on value added as measured in the ASM, later on product shipments, and still later on total employment. Nonmanufacturing industries were coded on the basis of BOASI industry codes.

Revisions of data have been part of this survey because company codes changed over the years as principal activities changed (as reflected in the most recent economic censuses), because new editions of the SIC manual were issued, and because respondents altered reports for earlier reporting periods.

The current procedure for classification grew out of the method established in 1981. Payroll data were summarized for the establishments of a company to determine into which of the 10 major economic sectors the company fell. Within the largest sector, the largest 3-digit SIC code, also based on payroll, became the company code. This code determined the industry grouping of the company. Minor variations of this approach were used until a hierarchical approach, based on payroll, was developed in 1992.

In the hierarchical approach, establishment payroll data were summed to determine the largest of the 10 major economic sectors for each company. All nonprofit activity that could be identified in a company was eliminated. The largest 2-digit SIC code within this sector was determined and then the largest 3-digit SIC code within the 2-digit SIC code. The 3-digit SIC code became the company code. In the 2000 survey, the company was assigned a 4-digit NAICS code within the 3-digit subsector.
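The hierarchical roll-up can be sketched in a few lines. The establishment records and codes below are hypothetical, and a 1-digit prefix stands in for the 10 major economic sectors; the logic is simply: largest sector by payroll, then largest 2-digit code within it, then largest 3-digit code within that.

```python
from collections import defaultdict

# Hypothetical establishment records for one company: (SIC code, payroll $K).
establishments = [("3674", 900.0), ("3672", 300.0), ("7372", 500.0), ("5045", 200.0)]

def largest_by_prefix(records, prefix_len, within=""):
    """Sum payroll by code prefix (restricted to `within`) and return the largest."""
    totals = defaultdict(float)
    for code, payroll in records:
        if code.startswith(within):
            totals[code[:prefix_len]] += payroll
    return max(totals, key=totals.get)

# Hierarchical roll-up: sector (a 1-digit prefix here, standing in for the
# 10 major economic sectors), then 2-digit, then 3-digit company code.
sector = largest_by_prefix(establishments, 1)
two_digit = largest_by_prefix(establishments, 2, within=sector)
company_code = largest_by_prefix(establishments, 3, within=two_digit)
print(sector, two_digit, company_code)  # -> 3 36 367
```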

With the introduction of annual sampling in 1992, company recodes are subject to annual change. In 1993 Kusch and Ricciardi undertook a study at the Census Bureau to measure the extent of changes in company codes. That report shows that there is a high degree of consistency in coding (U.S. Census Bureau, 1994a). For the companies common to both the 1992 and 1993 sample frames, recode changes occurred for only 1.5 percent. These companies accounted for 3.7 percent of the payroll, indicating that changes were more likely to occur for larger companies.

Recodes 42 and 50 in 1992 had 11 percent and 7.4 percent, respectively, of the larger companies switch to another recode in 1993. These companies accounted for well over 50 percent of the total payroll for these recodes in 1992. In each of these recodes, a major company shifted its primary activity, as measured by payroll.

A large proportion (88.9 percent) of the recode changes were for single units. Since single units do not usually change operations, this suggested that the 1992 codes were changed in 1993 as a result of updating from the 1992 economic censuses.

Most of the changes took place in nonmanufacturing companies and mostly represented changes to other nonmanufacturing companies.

One of the benefits to an annual sample is that this kind of shifting of primary activity can be picked up when it happens and be reflected in the publication totals. Before annual sampling, when the recode was changed, revisions in data were made that proportionally allocated the change over the years of the sample panel.

Defining Sampling Strata

From its earliest days, the R&D survey has had a certainty stratum to include the companies thought to have the largest R&D expenditures. The very first survey defined the certainty stratum as all companies within scope having 1,000 or more employees. Over the years, the criteria changed. After 1994, the size criterion based on number of employees was dropped. In 1996, the criteria were total R&D expenditures of $5 million or more based on the previous year’s survey or on predetermined sampling error constraints relating to individual industry estimates.

In 2001, three broad strata were defined based on type of industry: manufacturing, nonmanufacturing, and unclassified. (The unclassified were those that had a missing or incomplete NAICS code at the time of sampling.) A total of 28 industry categories were designated for manufacturing firms, with 1 category for “small” manufacturing firms, and 18 industry categories for nonmanufacturing firms, with 1 category for “small” nonmanufacturing firms.

In 2002, there were two broad groups, based on what was known about each company’s R&D expenditures over the previous four years. The first group, the “knowns,” included companies that had reported R&D expenditures at least once in the years 1998–2001; the second group was the “unknowns.” The reasons for this change were to improve the state estimates, to take advantage of historical data available in the Census Bureau’s processing program, and to improve the industry-level estimates.

The 48 industry categories were retained, but there were no “small” categories.

Within the known group, the certainty stratum was defined as all companies that had reported $3 million or more in R&D expenditures at least once from 1998 to 2001. The certainty stratum also included companies selected on the basis of predetermined sampling error constraints relating to individual industry estimates. A second stratum within the known group was for firms that had reported positive R&D expenditures of less than $3 million. Firms with no reported R&D expenditures were a third stratum.

For the unknown group, a certainty stratum was established that included the 50 largest firms, based on payroll, within each state. Also included in the certainty stratum of the unknowns were companies selected on the basis of predetermined sampling error constraints relating to individual industry estimates. The rest of the unknowns were in a separate stratum.

Procedures for Sampling

Starting in 1981, sample selection was based on probability proportionate to size (PPS). The measure of size was reported R&D expenditures for the companies in the frame that had reported a value in 1980. For all other companies, an estimate of R&D expenditures was developed and imputed, using a relationship between R&D expenditures and total employment from the 1980 survey for each of the industry groupings for which data were published. The effect of the procedure was to give companies with large payrolls higher probabilities of selection, relying on the assumption that larger companies are more likely to do R&D.

The sum of reported and estimated R&D values represents a total universe measure of the previous year’s R&D expenditures. However, assigning R&D to every company resulted in an overstatement of the measure, so the universe measure was scaled down using factors developed from the relationship between the frame measure of the prior year’s R&D and the final prior-year survey estimates. These factors were computed at levels corresponding to published industry levels, and the scaled-down R&D values approximated the prior year’s published values. A Census Bureau document by W. Ricciardi describes the development of the measures of size and the scaling factors (U.S. Census Bureau, 1996).
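In outline, the scale-down is a single ratio computed at a published-industry level and applied to each assigned value; the figures below are placeholders.

```python
# Hypothetical totals for one published industry level ($M).
frame_total_rd = 140_000.0      # sum of reported + model-assigned R&D in the frame
published_prior_rd = 120_000.0  # final prior-year survey estimate

# Scaling factor: < 1 when assigning R&D to every company overstates the total.
scale = published_prior_rd / frame_total_rd

# Scaled-down measure of size for one company with a model-assigned value.
assigned_rd = 2.5
print(scale, scale * assigned_rd)
```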

In 1994, the frame for the noncertainty companies was partitioned into large and small companies. A disproportionate number of small companies were selected into the sample, and they had large weights. The disproportion resulted from a minimum probability rule that increased the probabilities of selection for several hundred thousand small companies; specifying a minimum probability controlled the maximum weight of a company.

The partition into large and small was based on total company payroll, so that the largest companies, representing 90 percent of total payroll for a given industry group, were in the large frame. PPS sampling was used for that frame; simple random sampling was used for the partition containing small companies. Beginning in 1996, total company employment was used to partition the frame: in the manufacturing sector, all companies with 50 or more employees were included in the large partition; in the nonmanufacturing sector, all companies with 15 or more employees.

However, in 1996, the number of strata for the small partition was reduced to two. One stratum was for small companies in manufacturing industries; the second was for small companies in nonmanufacturing industries. Consequently, estimates for industry groups within manufacturing and nonmanufacturing were not possible. Estimates from the small company partition were included in total manufacturing, total nonmanufacturing, and all industries.

Up to 1998, PPS sampling had been carried out using an independent sampling approach: the selection or nonselection of a company was independent of the sampling of any other company. Thus, the sample size could not be fixed, adding sampling variability to the estimates. In 1998, a fixed-sample-size PPS method was used that ensured that the desired sample size was achieved. This was repeated in later surveys.
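One common way to obtain a fixed sample size under PPS is systematic PPS sampling, sketched below with hypothetical measures of size; the source does not specify which fixed-size method the Census Bureau used. A single random start on the cumulative size scale yields exactly n selections.

```python
import random

# Hypothetical measures of size (e.g., prior-year R&D, $M) and sample size.
sizes = [50.0, 30.0, 8.0, 5.0, 4.0, 2.0, 1.0, 0.5, 0.3, 0.2]
n = 4

total = sum(sizes)
step = total / n                 # one selection per interval of length `step`
start = random.uniform(0, step)  # single random start fixes the whole sample

# Walk the cumulative size scale, selecting the unit containing each point.
sample, cum, point = [], 0.0, start
for i, size in enumerate(sizes):
    cum += size
    while point < cum and len(sample) < n:
        sample.append(i)
        point += step

# Units larger than `step` can be hit more than once; in practice they would
# be placed in the certainty stratum before sampling the remainder.
print(sample)
```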

It has been difficult to achieve control over the sampling error in the survey estimates. One component of sampling error depended on the correlation between the R&D values assigned from the sampling frame (either reported or imputed) and the current-year reported data. The majority of firms in the universe had an imputed value, and assignment of sampling probabilities was based on this distribution. The presumption was that the actual variability from the sample design would be less than that estimated, because many of the sampled companies had true R&D values of zero. Generally this held, but exceptions occurred when companies with large sampling weights reported large amounts of R&D spending. Sampling errors for some industries, especially in nonmanufacturing, can be very large.

For the 2002 survey, within the stratum based on known R&D spending of some amount less than $3 million, the probability of selection was based on state and industry contributions and PPS sampling was used. For companies with no reported R&D expenditures, simple random sampling was used.

For companies with unknown R&D expenditures, after the certainty stratum of the largest 50 companies in each state, a probability of selection based on state and industry was developed. Sampling error constraints were also developed, and PPS sampling based on payroll within industry was used.

A measure of size was no longer imputed. For companies reporting in 1997–2001, the most recent nonzero reported R&D value was used. For those reporting zero, a minimum probability of selection was set at .01 for manufacturing codes and .004 for nonmanufacturing codes. For the unknown cases, the measure of size was the payroll from the frame.

Potential Sources of Error in Sampling

A constant concern in the survey has been the variability of estimates. Numerous procedures to reduce variance have been implemented over the years, and the 2002 survey has emphasized a reduction in variances for state estimates. Although sampling errors are reported for survey variables, some of them are very high, particularly for the nonmanufacturing universe. Coupled with high imputation rates for some of these companies, these high sampling errors make the quality of the published data suspect.

The amount of variability in the data may be understated because of the high rate of imputation.

The current frame seems to need updating on a more regular basis. In a Census Bureau document, W. Davis and T. J. DeMaio give examples of a company receiving a form after another firm had purchased it (U.S. Census Bureau, 1995b). Addresses are sometimes incorrect. In Census Bureau studies of nonresponse in 1992 and 1994, companies often said that they had never received the form (U.S. Census Bureau, 1995c, 1997). This may not be true in all cases, but it is likely that some forms were never delivered.

The use of the SSEL to select smaller companies may lead to a coverage problem. Over the years, there has been a common impression that the SSEL has excellent coverage for large companies but that it may be somewhat deficient for smaller companies. For many statistics, the underrepresentation of small companies may not be important, but for R&D expenditures by type or by state, this loss could be important. The lists that the Bureau of Labor Statistics maintains are thought to be more complete for the smaller companies. Now with the ability of the two agencies to compare lists, the basic frame could be improved.

The inclusion of a prescreening question on the Company Organization Survey carried out by the Census Bureau should help identify companies that do no R&D. However, it would be useful to take a small sample from that group.

Data Collection

Basic Data Collection Procedure

NSF staff, along with Census Bureau staff, have met with respondents to determine the sources of data used to complete the questionnaire, to examine how respondents made estimates, and to identify problems respondents encountered. As a result of these “response analysis” studies, questionnaire items have been removed, items have been simplified, and additional instructions have been written for some items. In the 1983 study, the most frequent criticisms related to the definitions of basic, applied, and development R&D expenditures; the criticism still applies. Yet it is noteworthy that the questionnaire has changed in answer to respondents’ suggestions.

The current method of data collection is primarily by means of two forms: Form RD-1 is sent to known, large R&D performers, and Form RD-1A is sent to small R&D performers and companies in the sample for the first time. The items to be filled in are as follows:

  • Sales or receipts

  • Total employment

  • Employment of scientists and engineers

  • Expenditures for R&D performed within the company with federal funds and with company funds and other funds

  • Character of work (basic research, applied research, development)

  • Company-sponsored R&D expenditures in foreign countries

  • R&D performed under contract by others

  • Federally funded R&D by contracting agency

  • R&D costs by type of expense

  • Domestic R&D expenditures by state

  • Energy-related R&D expenditures

  • Foreign R&D expenditures by country

The RD-1A form collects the same information as the RD-1 form, except that it omits the last five items. The RD-1A form also has a screening item for respondents to indicate that they do no R&D.

In 2000, Form RD-1, along with instructions, was mailed to approximately 1,700 companies that reported R&D expenditures of $5 million or more in the 1999 survey; about 23,100 companies received Form RD-1A and its instructions. The instruction package provided definitions and item-by-item instructions, including methods of estimating expenditures if the company did not keep records that gave exact allocations. For example, methods were given to estimate basic, applied, and development expenditures.

On the RD-1 form, companies were informed that they could report on a diskette rather than on paper. In addition, a web version of the form was made available for respondents to use.

Although the primary mode of data collection is by mail, data are also collected by telephone and over the web. Survey forms were mailed in March. Recipients of Form RD-1A were asked to respond in 30 days, while Form RD-1 recipients were asked to respond in 60 days. A follow-up form and letter were mailed to RD-1A recipients every 30 days if their completed survey form was not received. A total of five follow-up mailings could be sent to delinquent Form RD-1A recipients.

Telephone follow-up was used to encourage response from companies among the 300 largest R&D performers, based on total R&D expenditures in the previous survey. The R&D survey analysts at the Census Bureau made the telephone follow-up calls.


If all attempts to obtain a response failed and no current-year information was reported, data for domestic sales, total employment, total R&D, and the number of R&D scientists and engineers were imputed.

The Survey Respondents

The survey forms are mailed to the company. There are believed to be multiple respondents at large companies: some financial personnel, some scientists, and some human resources personnel. The Census Bureau does not mail to a specific person, although a person’s name may be available from a previous cycle. Company spokespersons say that it is often left to mailroom personnel to decide to whom to deliver the form. Sometimes, it is thrown away.

A very important ongoing effort by NSF is to meet with respondents to discuss the questionnaire and respondent reactions to completing it. As a result of these efforts, some questions have been dropped and some instructions have been clarified.

The Census Bureau can field test new questions with up to 500 respondents under the OMB general clearance. Test questions ask about availability of data, data quality, and time needed to respond. Supplements are included almost every year. The Census Bureau collects the responses and makes a recommendation on whether the data are collectable. NSF may accept the recommendation, or it may decide to commission additional research.

In an effort to improve the questionnaire, some cognitive research was done at the Census Bureau by staff in the Center for Survey Methods Research. The report by Davis and DeMaio discusses “global” sources of error pertaining to the selection of a respondent within a company, how the structure of the company affects the data collected, and the perception of what constitutes R&D (U.S. Census Bureau, 1995b). Also, the authors note the questionnaire’s contributions to error as well as specific problems with the core set of questions. This report was issued in 1995, so some changes have undoubtedly taken place.

Two methods of study were used. First, visits were made to selected companies: of 40 companies selected, only 11 agreed to participate, and all but one of the 11 had been survey participants. All were believed to have less than $1 million in R&D expenditures. The second method was a mail-out study, in which 25 companies with known R&D expenditures of $1 million or greater were selected and mailed the 1995 version of the RD-1 form. Also selected were 75 companies believed to have spent less than $1 million; of these, 55 had previously participated and 20 were new to the survey. These firms were solicited by telephone before being mailed the 1995 version of the RD-1 form. A series of cognitively oriented debriefing (COD) questions was added to the questionnaire for the groups mailed the form.

The authors concluded that finding the right contact person within a company and the methodology of surveying complex companies were two large contributors to error. For a complex company, the survey reports are to cover all locations of the company, all products, and all services. The easiest reporting is for a small-scale R&D laboratory whose main purpose is to perform research and development for other organizations in a specific field. The information requested is quite accessible to almost anyone in the company, with the possible exception of support staff. At the other extreme, companies with many subsidiaries or physically dispersed locations have a much more difficult task.


Specifically, all R&D must be identifiable from records at headquarters by type of research. Some corporate-level records do not contain detailed information about subsidiaries. Project labels do not convey the kinds of activities involved in the project. Also, the subsidiary level may provide head counts or payroll counts but not information about scientists and engineers. The authors outline the strategies that a respondent might use to complete the form, some of which would result in an understatement of R&D expenditures, and some in item nonresponse.

The response process is also affected by the position of the contact person in the company. The research showed cases in which the survey contact was a corporate vice president, a legal staff member, a person from the financial department, or a scientist or engineer in a research department. Each of these types approached the response process with a different knowledge base, thus each produced different kinds of errors. The authors illustrate how these errors occurred. They also pointed out that the lack of a specific contact person on the address label is a large contributor to nonresponse, because the questionnaire may be thrown out immediately or simply passed from one employee to the next.

The recommendations of the authors to alleviate these problems are as follows (U.S. Census Bureau, 1995b):

  1. The data collection staff need to collect information about the complexity of each company, including whether it has subsidiaries and whether the subsidiaries are domestic, foreign, or both. A company profile should be built and updated on a regular basis.

  2. Effort must be made to determine the appropriate contact person for the company. The best respondent in a complex company is someone at the corporate level for total receipts and total employment for the company, but someone in the R&D area for the remaining items.

  3. Since at least two people are necessary to complete a questionnaire, the format of the report form should be redesigned to foster a “teamwork” approach.

  4. A form due date should be included in the questionnaire instead of a vague “30 days after receipt of form.”

The authors provide suggestions for improving both the graphic presentation of the instrument and the question wording. Since this study was done in 1995, presumably many of the graphic suggestions have been implemented. However, a recommendation that all survey items be put into question format has not yet been implemented.

Davis and DeMaio provide a wide range of suggestions for possible question wording changes (U.S. Census Bureau, 1995b). One that has arisen many times in discussions with respondents is in the definitions of basic, applied, and development activities. Their findings are as follows:

  1. R&D personnel correctly interpreted these items, whereas financial and other non-R&D personnel were less successful. Overall, however, understanding of these terms was very poor across companies and respondent types.

  2. The definitions of the three terms are not included on the report form. They are in the instructions, which are frequently ignored.

  3. On many occasions, it was obvious that the respondent omitted many R&D activities. One reason was that the respondent did not have enough knowledge of the company to know what to include. A second reason was the inability to interpret the definitions for R&D.

The second most problematic item on the report forms was the number of R&D scientists and engineers. Respondents must determine which scientists and engineers have a four-year degree or the equivalent in the physical and life sciences, mathematics, or engineering. Determining what is “equivalent” will vary depending on who is making the determination and whether or not they have access to personnel records. (Some respondents included all staff, including support staff and administrative personnel in the R&D area.)

Respondents must determine whether each of the people with a four-year degree or equivalent in an appropriate field worked at least some portion of the time in R&D, and then determine what proportion it was. Some respondents reported a head count of scientists and engineers; others made arbitrary decisions about the proportion of time, on average, that scientists and engineers worked on R&D. The concept of full-time equivalent was either not clear or it required more time and work than respondents were willing to spend.

The reference date for this item was changed from the prior year to January of the current year. Respondents frequently did not notice this or simply used convenient end-of-year records. Thus, many different time periods are represented in the responses to this item, depending on what records were used and the fiscal years of the responding companies.

Davis and DeMaio recommend rewriting the question to emphasize that administrative and support staff positions are not equivalent to a college degree (U.S. Census Bureau, 1995b). They also recommended asking for a head count rather than full-time equivalents. Also, the placement of the question caused problems, because it necessitated several shifts in reference period. Finally, there seemed to be a context problem, since this question preceded item 6, cost for wages and salaries of all R&D personnel. The authors gave some suggestions for the rewording and placement of this item.

On April 25, 2003, two experts on questionnaire design, Nora Cate Schaeffer and Don Dillman, met with NSF staff and the NRC staff in connection with the work of the Committee on Research and Development Statistics at the National Science Foundation. Schaeffer suggested that the NSF and the Census Bureau identify cooperative companies and create a methods panel to support an ongoing structured testing program. Also, it was suggested that NSF and Census staff should resume a program of field observation to examine record-keeping practices and to conduct research on how respondents fill out the forms.

Schaeffer also suggested a study to examine the impact of printing prior-year data on the RD-1 questionnaire. The issue of whether or not to provide previous data has been debated for many years and many surveys, but there is little information on which to base a decision.

Nonresponse

As with all surveys, some sample units do not respond at all or omit some items. In 1999 the overall response rate was 83 percent; in 2000 it was 85 percent. This appears to be a decline from earlier response levels of 89 percent in 1988 and 88 percent in 1989. However, over 90 percent of the 300 largest companies report, probably because of the telephone follow-up.

Response to four data items is mandatory: total R&D expenditures, federal R&D funds, net sales, and total employment. The remaining items are voluntary. Many companies have a policy not to report on voluntary surveys. Response to the voluntary items has been a serious problem. To determine whether the combination of mandatory and voluntary reporting influenced response, OMB requested that a test be done on reporting on a completely voluntary basis.

For the 1990 survey, a voluntary panel was asked to report all data items on a voluntary basis. Companies in a mandatory panel were asked to report the four mandatory items, with the other items voluntary. The overall response rate for the 1990 survey was 80 percent: 89 percent for the mandatory panel and 69 percent for the voluntary panel. At NSF, J.R. Gawalt concluded that the response rates for the mandatory panel were higher than those for the voluntary panel for each of the mandatory items (National Science Foundation, 1991). However, the design of the test did not address the problem of response to the voluntary items.

No item nonresponse rates are given for any items. Instead, imputation rates are published, and these can be very large. The notes to Table B-5 on R&D in industry for 1999 (National Science Foundation, no date b) indicate that these rates represent the percentage of the value in a given table cell in the Section A tables that had been imputed. The notes go on to say that cells for which 50 percent or more of the data are imputed are flagged with an “S.” This means that the imputation rates are based on weighted data, so any information on how many companies did not report a given item is lost.

With imputation rates as a poor proxy for item nonresponse rates, one can see that even for items frequently reported in many federal surveys, such as net sales and total employment, the imputation ratios can be very high. They range from 0.000 to 0.834 for net sales and from 0.000 to 0.895 for total employment. It seems strange that over 80 percent of the value for either net sales or total employment would have to be imputed, given that the frame for the survey is the SSEL.
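To illustrate the distinction, here is a small Python sketch, using entirely hypothetical company data, that contrasts a weighted imputation rate of the kind published in the tables with the unweighted item nonresponse rate that this profile argues should also be reported:

```python
# Hypothetical illustration: weighted imputation rate vs. unweighted
# item nonresponse rate for a single survey item.

# Each tuple: (sampling weight, item value, True if the value was imputed).
companies = [
    (1.0, 900_000, False),   # large certainty company, reported
    (50.0, 20_000, True),    # small company, value imputed
    (50.0, 15_000, True),
    (50.0, 30_000, False),
]

# Weighted imputation rate: share of the published (weighted) cell total
# that comes from imputed values -- this is what the tables flag.
total = sum(w * v for w, v, _ in companies)
imputed = sum(w * v for w, v, imp in companies if imp)
weighted_rate = imputed / total

# Unweighted item nonresponse rate: share of companies that did not
# report the item -- the count that is lost in the published rates.
unweighted_rate = sum(1 for _, _, imp in companies if imp) / len(companies)

print(f"weighted imputation rate:    {weighted_rate:.3f}")   # ~0.422
print(f"unweighted nonresponse rate: {unweighted_rate:.3f}")  # 0.500
```

The two rates can diverge sharply whenever imputation is concentrated among small or large companies, which is why neither alone tells the whole story.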

The listing of a company as responding means that the Census Bureau heard back from the company, even if the company reported that it was out of scope, out of business, or had merged with another company. This is a way of accounting for the number of forms mailed out; it does not mean that any data items were completed on these forms. It seems unusual that the two basic items of net sales and total employment would be omitted, leading to concern about the amount of nonresponse for the R&D items.

Davis and DeMaio discuss the difficulties that companies have in reporting the number of R&D scientists and engineers they employ (U.S. Census Bureau, 1995b). The difficulty is exemplified in the imputation rates for this item: 0.322 over all industries in 1999 and 0.375 in 2000. This means that about a third of the data shown in the detailed tables for this item are imputed. Although imputation rates for total R&D are relatively low, the rates for other items can be very high for all industries, for manufacturing, and for nonmanufacturing. In addition, the imputation rates, on the whole, increased from 1999 to 2000. Also worth comment is the absence of imputation for R&D costs by agency: the rates were very high in 1999 but were reported as 0.000 in 2000 for manufacturing and nonmanufacturing as a whole, while the overall industry rates were still quite high.

Another Census Bureau analyst, D. Bond, also documents some of the nonsampling errors arising in this survey (U.S. Census Bureau, 1994b).

Potential Sources of Error in the Collection Procedure

At least three different data collection procedures are used: primarily mail, with some data collected by telephone and some over the web. Although the same questionnaire is used for all three modes, respondents may report differently depending on the mode. It is also likely that different kinds of processing take place and affect the data in different ways. It would be useful to tabulate data by mode if a mode flag were available.

The mail-out version of the questionnaire has simply been put on the web; no research was conducted on designing a web-based questionnaire. This may be a good thing, since web respondents are not subjected to different stimuli than mail respondents, but the current questionnaire does not meet many of the criteria for a well-designed web-based survey.

Since it is important both to apply the best available knowledge on questionnaire design and to keep data provided through different modes comparable, a revision of the questionnaire could improve accuracy. A new questionnaire would accommodate all modes of data collection, with particular emphasis on a web-based version. The current questionnaire contains many items that are difficult to answer correctly. A methods test panel would be useful for researching the inclusion, wording, and formatting of new questions, the rephrasing of existing questions, and other aspects of questionnaire design.

To help with the selection of the best respondent, frequently updated company profiles would detail the complexity of a company before a questionnaire is mailed. The current practice of mailing to a company, with no designation of who should respond, leads to nonresponse and incomplete data. The Davis and DeMaio report gave some indications of who the “best” respondent is (U.S. Census Bureau, 1995b); further research and good company profiles could sharpen that knowledge.

A good definition of unit response would give both data providers and data users helpful information. As it stands, a unit is recorded as a respondent if a form is returned, even if the form has no data on it; this is done only to account for the number of forms mailed out. Perhaps such cases could be labeled “form returned,” and unit response could be defined on the basis of a minimum number of reported items.

Item nonresponse rates also need clear definitions and regular reporting in order to be useful to data users. If part of the difficulty in reporting these rates is that respondents leave blanks for zeros, that problem can be addressed in a forms redesign or in editing rules. In order to monitor the quality of survey items, the number of companies that do not report an item needs to be known; imputation rates are not a substitute. This is not to say that weighted response rates are not useful; they are, but unweighted rates give additional information.


Imputation rates increased from 1999 to 2000. Since data are now available for two additional years, this anomaly can be examined to see whether it was a one-year disturbance. If not, was there less reporting by companies, or was there a change in procedure?

Data Processing

In 1994, a series of memoranda by D. Bond discussing processing errors was issued (U.S. Census Bureau, 1994d). Most of the discussion in this paper relies on those memoranda.

Keying of Data

According to Bond, the Data Processing Division at the Census Bureau keys the data according to well-documented procedures (U.S. Census Bureau, 1994d). Data entry staff verify by rekeying and matching a random sample of questionnaires from a previously keyed batch. For the 1992 survey, the specification was to verify 20 percent of the forms, except when a small batch of questionnaires receives 100 percent verification, as happens when a keypuncher is new or early in the survey processing cycle. All errors found are corrected. If, within the 20 percent group, the error rate is less than a specified level (usually 1.4 to 2.8 percent), the errors are corrected and the batch is accepted. Otherwise the entire batch is verified and all errors are corrected.
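The acceptance logic can be sketched as follows; the threshold value and the sample figures are illustrative assumptions, not the Census Bureau’s actual specification:

```python
def accept_batch(sample_error_rate: float, threshold: float = 0.02) -> bool:
    """Accept a keying batch if the error rate in the 20 percent
    verification sample is below the specified level (the documented
    range is roughly 1.4 to 2.8 percent); otherwise the whole batch
    must be verified and all errors corrected."""
    return sample_error_rate < threshold

# Example: a batch whose verification sample had 1 error in 80 forms.
if accept_batch(1 / 80):
    print("Correct the sampled errors and accept the batch.")
else:
    print("Verify the entire batch and correct all errors.")
```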

Editing of Data

There is no written description of the editing process, including the process by which an analyst supplies data codes. Bond described his study of processing errors, in which he selected 67 RD-1 forms to review. He found three forms on which respondents had reported that they were out of business in 1992, yet data for all three were incorrectly processed. This finding reinforces the finding of Davis and DeMaio that the frame may not be updated frequently (U.S. Census Bureau, 1995b). Bond went on to examine another random sample of 67 forms and found that one company had gone out of business in 1992, but the analyst had entered zero on the form for total payroll costs; in the database, this company had imputed positive values for 1992 R&D costs, scientists and engineers, and projected 1993 R&D costs. One respondent reported that the company was no longer doing R&D but had its number of scientists and engineers imputed as 93 for January 1993. Reports from two respondents who omitted the last three zeros from total sales made it, unedited, into the database; as a result, one company appeared in the database with $500,000 in sales and 3,000 employees, and another with $220,000 in sales and 1,322 employees. Bond reported other examples of errors and of flags being set as imputed where no imputation occurred. He found similar types of errors, although fewer, in reviewing Form RD-1A.

This study is now almost 10 years old, and many procedures in the survey have changed; perhaps the editing has improved. However, Bond recommended that a processing error study be repeated every year. This has not been done, although it could prove useful.

Imputation

Imputation is an important part of the R&D survey, and imputation rates are high. Imputation is handled in different ways, depending on the item.

Bond described the imputation procedure (U.S. Census Bureau, 1994c). For domestic sales, total employment, total R&D, and the number of research scientists and engineers, missing current-year data are imputed by applying rates of change from the prior year, regardless of whether the prior-year data were imputed or reported. This is known as a cold-deck procedure, since it is based on prior-year data. The underlying model is a linear regression through the origin, assuming that the variance of the residuals is proportional to the prior-year values.
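A minimal sketch of this kind of cold-deck ratio imputation, with hypothetical values; in the actual survey the rates of change are computed from weighted data within NAICS categories:

```python
# Cold-deck ratio imputation sketch (hypothetical data).
# Under a regression through the origin with residual variance
# proportional to the prior-year value, the fitted slope reduces to
# the ratio of current-year to prior-year totals among reporters.

reported_both_years = [  # (prior_year_value, current_year_value)
    (100.0, 110.0),
    (200.0, 190.0),
    (50.0, 60.0),
]

rate_of_change = (
    sum(cur for _, cur in reported_both_years)
    / sum(prior for prior, _ in reported_both_years)
)

def impute_current(prior_value: float) -> float:
    """Impute a missing current-year value from the prior-year value,
    whether that prior value was itself reported or imputed."""
    return rate_of_change * prior_value

print(impute_current(80.0))  # 80 * (360 / 350), about 82.3
```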

For basic research, subcontracted R&D, and foreign R&D, missing data are imputed only if the company reported the item in either of the prior two years. Otherwise, there is implicit imputation of zero.

If detail data do not sum to the total, for example, federal R&D by agency, and if prior-year data are not imputed, then current-year data are distributed based on the previous distribution of the reporting unit. Otherwise, an industry-average distribution is applied to the total to derive a value for each detail item. Rates of change are calculated by item within each NAICS category or industry. The calculations are based on weighted data for all companies that reported both variables. In the case of inter-item ratios (R&D to sales), calculations are based on data for all companies that reported both items in the current year. For current-to-prior-year ratios (employment), calculations are based on data for all companies that reported that item in both years.
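A sketch of the proration step with hypothetical agency figures; the fallback to an industry-average distribution would use the same function with a different distribution argument:

```python
def prorate(total: float, distribution: dict[str, float]) -> dict[str, float]:
    """Distribute a reported total across detail items in proportion
    to a prior distribution (the unit's own prior-year shares, or an
    industry-average distribution as a fallback)."""
    base = sum(distribution.values())
    return {item: total * share / base for item, share in distribution.items()}

# Prior-year federal R&D by agency for one reporting unit (hypothetical).
prior_detail = {"DOD": 600.0, "NIH": 300.0, "NASA": 100.0}

# Current year: the unit reported a total of 1,200 but no agency detail.
print(prorate(1_200.0, prior_detail))
# {'DOD': 720.0, 'NIH': 360.0, 'NASA': 120.0}
```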

The imputation program considers an item reported if the following codes are in the database:

A—analyst-entered value when there was no previously reported data

K—usually 1991 data reported on a 1992 form

L—late reported

R—on time reported

T—analyst-interactive correction, such as to correct an edit failure

F—analyst-originated value for total within-company R&D costs, based on information from a reliable outside source

Thus, the computer imputes data only for companies with the following codes in the item value:

Blank

B—no value reported

M—already computer imputed

Thus analyst imputations and corrections are treated as reported data.

Some items are imputed from the values of other items or by means of check boxes. The distribution of R&D costs over basic, applied, and development costs is not imputed. Also, if an imputation depends on an auxiliary reported value, from another year or within the same report, and that value is not available, no imputation is done. The result is that the item is processed as if a zero were present.
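A small sketch of this auxiliary-dependent logic, with a hypothetical item-to-auxiliary ratio:

```python
from typing import Optional

def impute_from_auxiliary(aux_value: Optional[float], ratio: float) -> float:
    """Impute an item as ratio * auxiliary value. When the auxiliary
    value is unavailable, no imputation is done and the item flows
    through later processing as if a zero were present."""
    if aux_value is None:
        return 0.0  # the implicit zero described in the text
    return ratio * aux_value

print(impute_from_auxiliary(1_000.0, 0.05))  # 50.0
print(impute_from_auxiliary(None, 0.05))     # 0.0 -- implicit zero
```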

Bond analyzed the number of times that auxiliary data were not available to impute missing values for R&D performers. Altogether, imputation could not proceed for about 8 percent of the cases, but the rates for certain items were as high as 77 percent. Bond also examined mean deviations and root mean square deviations to determine whether the imputation procedure was biased; he found no consistent upward or downward bias in the current methods. Correlation coefficients between the numerator and denominator values of the imputation ratios were very high for most items, indicating that one variable was a good estimator of the other.

For publicly owned companies, filings with the Securities and Exchange Commission (SEC) can be matched to verify domestic sales, domestic employment, total or company-funded R&D, and, in some cases, federally funded R&D, and then to impute missing data. The Census Bureau’s SSEL can also be used for verifying and imputing domestic employment and domestic sales data.

Possible Sources of Error from Processing

The editing process seems to be rather informal, with little written down. It is clear that analysts have the opportunity to change the data, add information, and get data from other sources. Before the web version of the survey is designed, it would be useful to develop a formal editing procedure that could be built into the instrument.

When imputation rates are given, it would be useful to include all changes to the data, including those coming from analysts.

The editing procedure is letting several types of errors slip through. A general redesign of the form could also include a general redesign of the editing. It may be useful to have more online editing while respondents are filling out the form.

Some items are being imputed in cases in which they should not be. This may be more a function of the editing than of the imputation.

Weighting and Estimation

Weighting

Weights were applied to each company record to produce national estimates. In 2000, within the PPS part of the sample, companies classified into the “other manufacturing companies” category were given weights up to a maximum of 75; in the remaining NAICS categories, maximum weights of 50 were assigned. Within the partition using simple random sampling, companies in the “small nonmanufacturing companies” category were assigned weights up to a maximum of 250; companies in the other NAICS categories were given maximum weights of 100.

In 2002, with the changes in the survey design, the maximum weight for all industry groups was 20 for companies that had reported positive R&D at least once in the past four years and whose reported R&D was less than $3 million. For firms that had reported positive R&D at least once in the last four years but no R&D in 2002, the maximum weight assigned was 100 for manufacturing companies and 250 for nonmanufacturing companies.

For those companies for which no information was available on prior R&D, the maximum weights were also 100 for manufacturing and 250 for nonmanufacturing.
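A compact sketch of the 2000 weight caps described above; the partition and category labels are simplified stand-ins, not the survey’s actual codes:

```python
# Maximum weights for the 2000 survey, keyed by sample partition
# (PPS or simple random sampling) and company category (simplified).
MAX_WEIGHT = {
    ("pps", "other manufacturing"): 75,
    ("pps", "other"): 50,
    ("srs", "small nonmanufacturing"): 250,
    ("srs", "other"): 100,
}

def capped_weight(weight: float, partition: str, category: str) -> float:
    """Trim a design weight to the maximum allowed for its sample
    partition and NAICS-based category."""
    cap = MAX_WEIGHT.get((partition, category), MAX_WEIGHT[(partition, "other")])
    return min(weight, cap)

print(capped_weight(320.0, "srs", "small nonmanufacturing"))  # 250
print(capped_weight(320.0, "pps", "chemicals"))               # 50
```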

Estimation

The estimator used for most national items is a stratified Horvitz-Thompson estimator. Since an entry is available for all cases of interest after imputation, there is no nonresponse adjustment.
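For reference, the stratified Horvitz-Thompson estimator of a total takes the standard textbook form (this display is added for clarity and is not taken from the survey documentation):

\[
\hat{Y}_{HT} = \sum_{h}\sum_{i \in s_h} \frac{y_{hi}}{\pi_{hi}} = \sum_{h}\sum_{i \in s_h} w_{hi}\, y_{hi},
\]

where \(s_h\) is the sample in stratum \(h\), \(\pi_{hi}\) is the probability of selecting company \(i\) in stratum \(h\), and \(w_{hi} = 1/\pi_{hi}\) is its sampling weight.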

A serious exception to this estimator has been in the production of state estimates. Prior to the 1999 survey, the Directory of American Research and Development, published by the Data Base Publishing Group of the R.R. Bowker Company, was used together with previous survey results to estimate R&D expenditures by state for companies that did not provide this information. The information on scientists and engineers published in the directory served as a proxy indicator for the proportion of R&D expenditures in each state: R&D expenditures in each state were estimated by applying the directory’s distribution of scientists and engineers by state to total R&D expenditures. These estimates were combined with reported survey data to arrive at published estimates of R&D expenditures for each state. Reports on the accuracy of this estimation were not available.

The directory was last published in 1997, and no outside information has been used since to estimate R&D expenditures by state. The state estimates varied widely from year to year, primarily because the sample of companies in each state varied from year to year. A company might report on Form RD-1A that it had substantial amounts of R&D in a given state one year, and the next year that company might not be in the sample.

Two strategies have been developed to cope with this situation. First, the top 50 companies in a state, as measured by payroll, are included in a certainty stratum; these larger firms thus remain in the sample from one year to the next. Second, a new composite estimator has been developed in which the first term is the unweighted contribution to R&D, coming primarily from the certainty stratum cases, and the second term is a ratio estimate of the contribution from the noncertainty companies. This type of estimator is used in small-area estimation and has some good properties, including a smaller variance than either of the two terms separately. The estimator is as follows:

\[
\hat{Y}_S = \sum_{i} y_{Si} + \sum_{I} R_{IS} \sum_{i \in I} (w_i - 1)\, y_{Ii},
\qquad
R_{IS} = \frac{\sum_{i \in I} (1 - \pi_i)\, X_{ISi}}{\sum_{i \in I} (1 - \pi_i)\, X_{Ii}},
\]

where

y_Si = reported R&D in state S of the ith company

y_Ii = reported R&D in industry I of the ith company

w_i = weight of the ith company

π_i = probability of selecting the ith company = 1/w_i

X_ISi = payroll in industry I and state S of the ith company

X_Ii = payroll in industry I of the ith company

The (w_i - 1) factor in the second term eliminates all certainty companies. Similarly, the (1 - π_i) factor in R_IS eliminates certainty companies. The R_IS factor provides the ratio of the payroll in the given industry and state to the payroll of the given industry over all states. For example, State A may have 2 percent of the payroll in a given industry. This ratio is then applied to the weighted (weights reduced by 1) R&D of the given industry, summed over all companies in the industry. If the correlation between payroll and R&D is high, the multiplication yields a weighted estimate of the state’s R&D in that industry. This is then summed over all industries.
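To make the mechanics concrete, here is a minimal Python sketch of the two-term computation for a single state, assuming the form reconstructed above; all company records are hypothetical, and all companies are placed in a single industry so the sum over industries has one term:

```python
# Composite state estimate sketch (hypothetical data) for state S,
# all companies in one industry I. Record layout: (weight w_i,
# R&D in state S, R&D in industry I, payroll in I and S, payroll in I).
companies = [
    (1.0,  50.0, 200.0, 10.0, 40.0),   # certainty company (w_i = 1)
    (20.0,  0.0,  30.0,  0.5, 25.0),   # noncertainty companies
    (20.0,  5.0,  40.0,  1.0, 30.0),
]

# First term: unweighted reported state R&D, dominated by certainty cases.
term1 = sum(y_s for _, y_s, _, _, _ in companies)

# R_IS: the (1 - pi_i) factor, with pi_i = 1/w_i, zeroes out the
# certainty companies in both numerator and denominator.
num = sum((1 - 1 / w) * x_is for w, _, _, x_is, _ in companies)
den = sum((1 - 1 / w) * x_i for w, _, _, _, x_i in companies)
r_is = num / den

# Second term: R_IS applied to the (w_i - 1)-weighted industry R&D,
# which likewise excludes the certainty companies.
term2 = r_is * sum((w - 1) * y_i for w, _, y_i, _, _ in companies)

print(term1 + term2)  # composite estimate of state S R&D
```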

Although the variance of the resulting estimator should be smaller than the variance of either of its terms, it is not clear how the variance is to be estimated. There were no details on how the variance estimates are made for the simple expansion estimates, either.

Possible Sources of Error in Weighting and Estimation

The weighting procedure and the variance estimation need to be described in more detail.

The variances are underestimates for several reasons. One is the way in which analyst edits are made, and the other is imputation; neither of these ways of filling in missing data is reflected in the variance estimation. In addition, there are cases in which items cannot be imputed and are treated as zero. This underestimates not only the variable but also its variance.

The new method of estimation makes use of research on small-area estimates. Many such estimators are available. It would be useful to document how this particular estimator was selected and what one can expect, both good and bad, from it. It would also be useful to watch the results of this estimator carefully to see how well it performs.


References

National Science Foundation 1991 Research and Development in Industry: 1990. J.R. Gawalt, author. Washington, DC: National Science Foundation.

2001 Methodology Report for the National Science Foundation’s Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions, Fiscal Year 2000. Washington, DC: National Science Foundation.

2002a Methodology Report for the NSF-NIH Survey of Scientific and Research Facilities, Fiscal Year 2001. Washington, DC: National Science Foundation.

2002b Methodology Report for the NSF Survey of Research and Development Expenditures at Universities and Colleges, Fiscal Year 2001. Washington, DC: National Science Foundation.

2002c Academic R&D Expenditures Survey Procedural and Editing Manual. Washington, DC: National Science Foundation.

no date a Methodology Report for the National Science Foundation’s Survey of Federal Funds for Research and Development, Vol. 50. Unpublished document, National Science Foundation, Washington, DC.

no date b Technical Notes from Research and Development in Industry: 1999, Section B. Unpublished document, National Science Foundation, Washington, DC.

no date c Technical Notes from Research and Development in Industry: 2000, Section B. Unpublished document, National Science Foundation, Washington, DC.

no date Revised Imputation Methodology for Basic Academic Research Data. Unpublished memorandum by B. Shackelford. National Science Foundation, Washington, DC.


U.S. Census Bureau 1994a Comparison of Company Coding between 1992 and 1993 for the Survey of Industrial Research and Development (R&D). G.L. Kusch and W. Ricciardi, authors. Washington, DC: U.S. Census Bureau.

1994b Documentation of Nonsampling Error Issues in the Survey of Industrial Research and Development. Unpublished memorandum by D. Bond. U.S. Census Bureau, Washington, DC.

1994c An Evaluation of Imputation Methods for the Survey of Industrial Research and Development. Unpublished memorandum by D. Bond. ESMD Report Series ESMD-9404. U.S. Census Bureau, Washington, DC.

1994d A Survey of Processing Errors in the Survey of Industrial Research and Development. Unpublished memorandum by D. Bond. ESMD Report Series ESMD-9403. U.S. Census Bureau, Washington, DC.

1995a Design of the Survey of Industrial Research and Development: A Historical Perspective. Manufacturing and Construction Division Report Series, Working Paper No. Census/MCD/WP-95/01. G.L. Kusch and W. Ricciardi, authors. Washington, DC: U.S. Census Bureau.


1995b Results of Cognitive Research on the Survey of Industrial Research and Development. Unpublished paper by W. Davis and T.J. DeMaio. U.S. Census Bureau, Washington, DC.

1995c Survey of Nonrespondents in the 1992 Survey of Industrial Research and Development. Washington, DC: U.S. Census Bureau.

1996 Comparison of Activity-Based R-Factors with Those Derived for the 1994 Survey of Industrial Research and Development (R&D). W. Ricciardi, author. Washington, DC: U.S. Census Bureau.

1997 Survey of Nonrespondents in the 1994 Survey of Industrial Research and Development. Washington, DC: U.S. Census Bureau.
