APPENDIX A
A Quality Profile of the Federal Research and Development Surveys

Barbara A. Bailar

This quality profile discusses five of the federal research and development (R&D) surveys. Although there are other R&D surveys, many of them are not active. The five discussed are:

  1. Survey of Federal Funds for Research and Development

  2. Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions

  3. Survey of Research and Development Expenditures at Universities and Colleges

  4. Survey of Science and Engineering Research Facilities

  5. Survey of Industrial Research and Development

GENERAL BACKGROUND

The first two surveys are of federal agencies and use a frame constructed from information in the president’s budget as submitted to Congress and as included in budget documents of the Office of Management and Budget (OMB). The frame for the Survey of Research and Development Expenditures at Universities and Colleges is put together by comparing lists of institutions from different surveys, using the results of the previous year’s survey, and making many telephone calls. The Survey of Science and Engineering Research Facilities has two frames: the R&D expenditures survey and a list of biomedical institutions maintained by the National Institutes of Health (NIH).

These four surveys have several links and similar procedures. They all use ORC Macro as a contractor and a similar data collection and editing methodology. The industrial survey is quite different and much more complex: it uses as its frame the Census Bureau’s Business Register, formerly known as the Standard Statistical Establishment List (SSEL), and it is conducted by the Census Bureau. The latter three surveys trace the amount of funds spent on R&D, some of which come from federal agencies, while the first two trace the outlay of federal funds to industry and academia. There should be agreement between the amounts reported by the federal agencies and the amounts reported by those who received the funds; the gap between these amounts is a cause for concern.

Similarities in Data Collection and Processing

Because ORC Macro conducts four of the five surveys, a common approach to data collection and processing has been taken. ORC Macro has developed a web-based system that provides:

  • A data collection system

  • A data monitoring system including receipt control

  • A data editing system



Each of the four surveys uses this system, either exclusively or for the majority of its cases. In recent years the surveys have been delivered to respondents electronically rather than on paper through the mail. The respondent uses the electronic survey form, and ORC Macro follows up by e-mail or telephone. It keeps logs, generated by the system, of the progress of the agencies or institutions in reporting.

The web-based system contains embedded edits that question the entry of erroneous data. Respondents are asked to check the data, repair arithmetical errors, and explain trend differences. In some of the surveys, data cannot be submitted to the National Science Foundation (NSF) until all error messages have been dealt with. Not all respondents submit data on the web-based form, so some of the surveys have to maintain a manual editing system. The web-based system is also used to tabulate data.

There are many advantages to a web-based system. It should reduce the amount of error caused by carelessness or by not having prior-year data to check against. It allows NSF to know which questions are giving respondents difficulty, since edit failures can be tracked. It speeds up processing. Also, there is complete knowledge of the status of every survey participant throughout the delivery, edit, and imputation stages.

There are a few disadvantages to the system. In some cases, it forces respondents to provide data that they are unsure about; in this sense, it forces imputation by the respondent rather than by NSF. When NSF publications state that there is no item nonresponse, they convey an impression of accuracy that may well be misleading. Another disadvantage seems to be an inability to deal with weighted data, although that could probably be overcome. However, these four surveys each have fewer than 1,000 respondents, use no sampling, and apply no nonresponse adjustment or other weighting.

The industry survey is primarily a mail survey, although the forms have been put on the web. It would benefit this survey to have a well-researched form designed for the web, with appropriate embedded edits. Such a form would facilitate the calculation of response rates, indicate which items were causing difficulty for respondents, and speed up processing. Since this survey is many times larger than the other four, a more streamlined procedure would save considerable time and money.

Some Concerns About the Surveys

Before each of the surveys is reviewed separately in this appendix, I raise some concerns that are common when these five surveys are looked at as a package.

One concern is the practice of providing prior-year data to respondents. On the industry survey, prior-year data are printed on the form; for the other surveys, prior-year data are available to respondents. Although many survey researchers believe that having prior-year data improves the quality of response by prompting respondents, others believe that it gives an easy way out. Research is needed on this issue.

Cutoff amounts are often used to define the survey universe. For example, academic institutions with less than $150,000 in R&D expenditures in the previous year may be excluded from the universe. However, the cutoffs are not the same from one survey to another, which may make comparisons difficult. There are also differences in which types of universities or colleges are included and which fields of science are included. Some surveys include the service academies, but not all. Some include the social sciences, but not all.

The role of the data provider is handled very differently in the five surveys. In each survey, NSF is dependent on a person to provide the data. In many cases, no single person can do so alone; others must be involved. In the Survey of Science and Engineering Research Facilities, a person designated as an institutional coordinator is recognized to be crucial to the success of the survey. It is the institutional coordinator who decides what other people need to provide and what sources of data to use, who coordinates the entire effort, and who then reviews the data. At the other extreme, the Survey of Industrial Research and Development is mailed to a company, with no contact person designated and no overt recognition of the possible difficulty of responding. Since the role of the data provider is so important, research is needed to identify the best respondent and ways to recognize the role of the respondent.

The surveys have very different approaches to nonresponse and imputation. Some try to eliminate it completely; others do some imputation. The Survey of Industrial Research and Development does not do a good job of reporting nonresponse, especially item nonresponse, but it handles imputation as a straightforward statistical procedure that occurs in surveys. A thorough study of nonresponse reporting, both item and unit, and of imputation would improve all the surveys and enhance comparability.

Finally, the methodological reports from the different surveys vary in their completeness. Some give plenty of detail; in others, it is difficult to find out who the respondents are and what the frame is. An outline of what is to be included and methods for ensuring adherence would be very helpful to persons wanting to learn about the surveys and to compare them.

SURVEY OF FEDERAL FUNDS FOR RESEARCH AND DEVELOPMENT

Introduction

Objectives

The Survey of Federal Funds for Research and Development is the primary source of information about federal funding for R&D in the United States. Government, academia, and the science community use the results of the survey. The survey is also used for budget purposes.

Specifications

The survey focuses on federal support of national scientific activities in terms of budget obligations and outlays. For each year, survey data are to be provided for three fiscal years: the fiscal year just passed, the current fiscal year, and the president’s budget year. Actual data are collected for the year just completed, while estimates are obtained for the current year and the budget year.

Agencies are asked to submit the same survey data as the budget figures submitted to OMB in January. The reporting unit for the survey is the subagency or agency to which the survey materials are sent. The reporting forms are provided to agencies by a web-based data collection system developed by ORC Macro. The agencies are asked to send back the completed form by April 15. All processing and editing is done electronically. The survey is sponsored and funded by NSF and is carried out under contract by ORC Macro.

Survey Design and Implementation

Scope of Survey and Frame

The scope of the survey has changed over the years, reflecting the number of subagencies that now fund R&D, the kinds of places in which R&D is conducted, and the kinds of questions agencies have the resources to answer. Federal obligations for research to universities and colleges by agency and detailed science and engineering field were added to the survey in 1973. Federally Funded Research and Development Centers (FFRDCs) are also included. The Central Intelligence Agency and other security-related agencies are not included.

Not all agencies are asked to respond to all questions. Only the 10 largest R&D funding agencies are asked about the geographical distribution of obligations for R&D and R&D plant. (R&D plant refers to the equipment and facilities in which the R&D takes place.) These 10 agencies account for about 97 percent of total R&D and R&D plant obligations each year. Only six agencies are asked to report on the distribution of R&D to universities by field of science and engineering and character of research.

NSF held several workshops to learn from respondents about any difficulties in reporting. As a result, NSF considered removing certain items from the survey instrument. A flier was distributed notifying data users that NSF was considering eliminating several items from future publications, and data users were asked for comments. NSF removed 54 tables that showed data on two items: the special foreign currency program and detailed field of science and engineering data for estimated out-years. NSF also eliminated two tables showing data on foreign performers by region, country, and agency, but these were later reinstated.

The frame for the survey is the list of federal agencies that fund R&D, obtained from information in the president’s budget submitted to Congress. Agencies or subagencies that have reported R&D data in OMB budget documents are included. For volume 50, in which actual data were collected for FY 2000 and estimates were collected for FY 2001 and FY 2002, the frame consisted of 29 federal agencies and 73 subagencies. All of these are included, so there is no sampling.

Potential Sources of Error in the Scope and Frame

Since there is no sampling, there are no sources of error from a sampling operation. Coverage would be deficient if the frame did not include all the relevant federal agencies and subagencies that fund R&D activities. However, agencies are identified using such sources as the president’s annual report, OMB budget documents, and respondent agencies. Coverage does not seem likely to be a problem.

Data Collection

Basic Data Collection Procedure

The first step in the collection process is to identify the agencies, subagencies, and respondents within the agencies in order to prepare and deliver the mail-out packets for the new survey cycle. ORC Macro keeps a master list of all respondents by agency and subagency. Changes to respondent information are obtained throughout the survey period, either from NSF or from the respondents. The inclusion of agencies in the survey may change in response to changes in their R&D status. As changes are received, the master list of respondents is updated. NSF suggests names of additional agencies or subagencies that have reported R&D activities in OMB budget documents or in the media. The U.S. Government Manual, available from the National Archives and Records Administration’s web site at http://www.access.gpo.gov/nara/browse-gm-oo.html, is used to verify agency names and to obtain phone numbers, if necessary.

At least two weeks before the mail-out packets are to be sent, ORC Macro staff call each of the respondent agencies to verify names, addresses, and phone numbers. They also collect the name and address of the respondent’s supervisor.

Since the survey is done entirely on the web, there is no actual mail-out of materials. ORC Macro sends a letter to respondents giving the URL and the agency’s respondent ID and password, and it e-mails all materials to survey respondents. Follow-up procedures begin immediately after the e-mailing of the survey materials. ORC Macro calls or e-mails the respondents to ensure that they have received their packets. These contacts continue until ORC Macro is certain that all packets have been received by the respondents or are in the hands of the agency coordinator.

The web-based data collection system is used for all data collection, data imports, data editing, and trend checks. The system consists of a data collection component, which allows survey respondents to enter their data online, and a monitoring component, which allows ORC Macro to monitor support requests, data entry, and data issues. The two components are password-protected, so that only authorized respondents and ORC Macro staff can access them. Agency respondents are given their user IDs and passwords in their mail-out packets. Respondents are able to access the system to begin data entry around the first week of March.

The Respondents

The respondents to this survey are agency personnel assigned to complete the survey. Since the survey deals with budget obligations and outlays, the respondents tend to be primarily budget analysts. When an agency also has subagencies in the survey, the agency respondent is to review the data for all subagencies to ensure consistency.

The Survey Instrument

The survey instrument takes full advantage of being electronic. All definitions and help screens are readily available to the respondent.

The key variables reported on are:

  • Outlays for R&D and R&D plant

  • Obligations for R&D and R&D plant

  • Obligations for basic, applied, and total research, by field of science and engineering

  • Obligations for basic research, applied research, and development, by performer

  • Obligations to individual FFRDCs, FY 2000 only

  • Obligations for R&D to foreign performers, FY 2000 only

  • Obligations for R&D plant, by performer

  • Obligations for R&D by state, FY 2000 only

  • Obligations for R&D plant, by state, FY 2000 only

  • Obligations for R&D to universities and colleges, by field of science and engineering

Not all agencies are asked to complete all items. The first seven items are asked of all agencies, the next two items of 10 agencies, and the final item of 6 agencies. The Department of Defense (DoD) breaks out state obligations for research and for development separately.

The question to be answered appears at the top of the screen. Directions immediately follow the question, followed by definitions and then a table in which to report the data. Often there is a note explaining that certain amounts should total to other amounts reported elsewhere. An attachment lists examples of disciplines included in the broad NSF fields of science and engineering. Unlike some other R&D surveys, the social sciences are included.

Definitions of obligations and outlays are the same as those in the U.S. budget. Obligations represent the amounts for orders placed, contracts awarded, services received, and similar transactions during a given period, regardless of when the funds were appropriated and when future payment of money is required. Outlays represent the amounts of checks issued and cash payments made during a given period, regardless of when the funds were appropriated or when the obligations were incurred.

In reporting obligations and outlays, each agency includes the amounts transferred to other agencies for R&D support. The receiving agencies do not report funds transferred to them.

Data are requested about the location, by state or outlying area, where the research takes place. If this information is not available, respondents are asked to assign the obligations to the state, outlying area, or office abroad where the headquarters of the U.S. primary contractor, grantee, or intramural agency is located. This could lead to some distortion of state estimates.

Nonresponse

The response rate for the survey has always been 100 percent. There is no known item nonresponse, since agency respondents must answer the questions before the data can be submitted.

Several steps are taken to achieve on-time response. ORC Macro keeps a folder for each agency with any information about the response or respondents. Specific steps include:

  • Calling or e-mailing respondents immediately after mail-out to ensure that they have received their packets, reminding them of the due date, and encouraging them to call if there are any questions.

  • Calling respondents again as the due date approaches.

  • Asking respondents whether provision of any data obtained during the previous cycle would be of assistance.

  • Conducting issues workshops to answer questions about definitions and reporting practices.

  • Providing respondents with the name, address, and phone number of the agency respondent for the Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions, because many agencies cannot report until those data are available.

  • Talking to, or sending a letter under NSF signature to, the respondent’s supervisor to obtain assistance in making the reporting of the data a higher priority.

  • Asking NSF to intervene when a respondent continues to show reluctance to cooperate.

Potential Sources of Error in the Collection Procedure

Only limited measurement problems are discussed in the survey reports. Some agencies find it difficult to report certain items. Some agencies, such as DoD, do not include headquarters planning and administration of R&D programs in the full cost of R&D; these are not large costs, however. R&D plant data are also underreported because of difficulties encountered by some agencies, particularly DoD and the National Aeronautics and Space Administration (NASA), in identifying and reporting these data.

Large deviations in reported obligations from year to year are questioned in the online editing, and agencies must supply explanations. Thus, if an agency is misreporting, it is easier to continue misreporting. However, there have been no studies of reporting, so no data are available.

Data Processing

Editing

There is no need for keying or weighting of data, so the only processing steps are the editing and the acceptance of the data by ORC Macro. The progress of each agency in completing the survey is available online and can easily be reviewed by ORC Macro or NSF staff.

One key edit is a comparison with data reported in an earlier cycle. The initial report lists the agency code, agency name, and number of differences for agencies that have reported a variance of $100 million in any data obligation category between the previous and current data collection cycles. The variance is then broken down by the table that is the source of the variance, the specific data cell, a table code, and the current and previous cycle obligation amounts. The agency must provide a narrative explanation, and this explanation is used to verify the difference. NSF reviews these cases.
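
As an illustration of this kind of trend check, the following sketch flags obligation categories whose change between cycles reaches $100 million. It is a minimal illustration only; the field names and flat data layout are hypothetical and are not drawn from the actual ORC Macro system.

```python
# Illustrative sketch of a $100 million trend check between survey cycles.
# Field names and the flat dictionary layout are hypothetical.

THRESHOLD = 100_000_000  # $100 million, as described in the methodology


def trend_check(previous: dict, current: dict, threshold: int = THRESHOLD) -> list:
    """Return (category, previous value, current value) for every obligation
    category whose absolute change between cycles meets the threshold."""
    flagged = []
    for category, prev_value in previous.items():
        curr_value = current.get(category, 0)
        if abs(curr_value - prev_value) >= threshold:
            flagged.append((category, prev_value, curr_value))
    return flagged


# Example: an agency's obligations (dollars) in two consecutive cycles.
fy2000 = {"basic_research": 250_000_000, "applied_research": 410_000_000}
fy2001 = {"basic_research": 260_000_000, "applied_research": 530_000_000}

for category, prev, curr in trend_check(fy2000, fy2001):
    print(f"{category}: {prev:,} -> {curr:,}; narrative explanation required")
```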

Detectable data errors are flagged automatically by the web-based system. Some flags ask respondents to enter an explanation. The system also flags possible errors and asks respondents to review the material. Errors marked with a red sign will block submission to NSF. Until respondents fill all required fields, the agency cannot submit data.

Data aggregated at the department level are checked to ensure accuracy, and a copy is delivered to NSF. ORC Macro developed a spreadsheet to be used as a guide for each person responsible for checking tables. It serves as a map to identify groups of tables to cross-check and to compare with similar totals generated from the database. The methodology report (National Science Foundation, Volume 50, fiscal years 2000–2002) contains a list of edit checks that are done online as the respondent completes the survey form.

Agency coordinators review subagency data and can make changes. Once all the program offices in an agency have completed final submission, the agency coordinator aggregates the data into one record and submits it to NSF.

Potential for Errors in Processing

The extensive online editing is beneficial in speeding up the processing and ensuring that all items are completed. There is no evidence that any errors come into the process from the edits.

SURVEY OF FEDERAL SCIENCE AND ENGINEERING SUPPORT TO UNIVERSITIES, COLLEGES, AND NONPROFIT INSTITUTIONS

Introduction

Objectives

The Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions is congressionally mandated and is the only source of comprehensive data on federal science and engineering support to individual academic and nonprofit institutions. Federal policy makers, state and local government officials, university policy analysts, R&D managers, and nonprofit institution administrators use it. NSF and other federal agencies also use it for internal administrative purposes.

Specifications

The survey focuses on federal support to universities, colleges, and nonprofit institutions. Obligations to each institution are to be reported. The request is for the same information that was submitted to OMB in January of the survey year. A survey instrument is to be filled out for each university and college for which an agency obligated R&D funding during the previous fiscal year. Separate forms are to be completed for each nonprofit institution for which there were obligations.

The obligations to universities and colleges are to be listed by type of science and engineering activity, while the nonprofit obligations are to be listed by R&D and R&D plant (R&D facilities and fixed equipment). The reporting forms are provided to agencies by a web-based system. Respondents are asked to return the forms by April 15. All processing, editing, and tabulation are handled electronically. The survey is sponsored and funded by NSF and is carried out by ORC Macro.

Survey Design

Scope of Survey and Frame

The data collection system for reporting federal obligations to universities and colleges was established in 1965, and since 1968 these data have been collected annually. Beginning with the FY 1993 annual report, NSF ceased publication of data collected for non-science and engineering support to universities and colleges. Since 1990, DoD has reported research obligations separately from development obligations; beginning in 1994, development obligations are separated into two categories, advanced technology development and major systems development. Since 1990 NSF has not published data on detailed field of science and engineering obligations in R&D or in fellowships, traineeships, and training grants to academic institutions. DoD reports all detailed science and engineering data as “not elsewhere classified.” Issues workshops held by NSF confirmed suspicions that the detailed science and engineering data reported by other agencies were not reliable. The U.S. service academies became respondents in FY 1998. NSF no longer collects data on FFRDCs. NSF now allows agencies to report deobligations to the survey, but no negative obligations are published; all negative obligations are changed to zero.

The frame for the survey is the list of federal agencies that fund R&D, obtained from information in the president’s budget as submitted to Congress. The agencies that are the respondents to the Survey of Federal Funds for Research and Development are the respondents to this survey. For FY 2000, the target population was 18 federal agencies that incurred almost all of the obligations for federal academic R&D. The number of agencies included in the survey can vary from year to year. ORC Macro examines the list of agencies reporting academic and nonprofit data to the federal funds survey; such an agency is then asked to report those data on the Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions. In the next fiscal year, however, the agency may have nothing to report.

Potential Sources of Error in the Scope and Frame

Since there is no sampling, there is no error from a sampling operation. Coverage is considered excellent. However, NSF cautions users that not all federal agencies are surveyed, so some funding could be missed. The omissions are believed to have little impact on total funding, but they could be significant for understanding the funding for some institutions.

There may be some loss of coverage, particularly among nonprofit institutions, when institutions do not get added to the list of “new” institutions in the codebook. However, any such coverage loss is unknown.

Data Collection

Basic Data Collection Procedure

Information is collected for the federal fiscal year (October 1 through September 30). Data collection starts in February, with a due date in May. The first step in the collection process is to identify the agencies and respondents. Changes to respondent information are obtained throughout the survey cycle. The names of the agencies may change, or their inclusion in the survey may change depending on whether they continue to provide science and engineering support to universities, colleges, or nonprofit institutions. ORC Macro keeps an updated list of agencies and respondents.

At least two weeks before the start of the survey, ORC Macro staff members call each of the responding agencies to verify the name, address, phone number, fax number, and e-mail address. In addition, the name and address of the respondent’s supervisor are requested. After all respondents have been contacted, a respondent list is generated and a copy sent to NSF. At the beginning of each data cycle, respondents are asked to confirm or update the respondent information online; they cannot submit their data until they do so.

Beginning with the 2001 survey, ORC Macro e-mailed all materials to survey respondents. ORC Macro calls or e-mails respondents to ensure that they have received their packets. Follow-up begins about two weeks after the respondents receive the packets if they have not yet logged on to the web-based system.

The web-based data collection system is used for all data collection, data imports, data editing, and trend checks. The system consists of a data collection component, which allows survey respondents to enter their data online, and a monitoring component, which allows ORC Macro to monitor support requests, data entry, and data issues. The two components are password-protected, so that only authorized respondents and ORC Macro staff can access them.

The Respondents

The respondents for this survey are personnel in the federal agencies. Most of them work in budget offices.

The Survey Instrument

The survey instrument takes full advantage of being electronic. All definitions and help screens are readily available to the respondent. The key variables are:

  • Academic institution

  • Geographic location (within the United States)

  • Highest degree granted

  • Historically black colleges and universities

  • Obligations

  • Performer (type of organization doing the work)

  • R&D plant

  • Type of academic institution (historically black colleges and universities and others)

  • Type of activity (e.g., R&D; science and engineering instructional facilities)

  • Type of institutional control (public or private)

Data are collected at the level of the funding agency and provided in aggregate form for over 1,000 individual academic institutions in the following categories:

  • Research and development

  • Fellowships, traineeships, and training grants

  • R&D plant

  • Facilities and equipment for instruction in science and engineering

  • General support of science and engineering

  • Other activities related to science and engineering

  • All other activities

Data on over 1,000 nonprofit institutions are also available.

The variables in this survey use definitions comparable to those used by OMB and the Survey of Federal Funds for Research and Development. Respondents are told that the totals reported in this survey and in the federal funds survey for R&D and R&D plant obligations should be in close agreement; if differences exist, respondents should include an explanation. Totals could differ because methods differ for reporting funds that are transferred to another agency before being distributed to institutions. In this survey, the agency that distributes the funds directly to the institution is responsible for reporting those obligations. Thus, agencies included in this survey would report funds received from other agencies but would exclude funds transferred to another agency. For the federal funds survey, the obligations are reported by the original source of funds.

Nonresponse

There is no nonresponse from agencies. There is no known item nonresponse, since agencies must answer the questions before they can submit data to NSF. The survey instrument provides many definitions if agency respondents care to use them.

Agencies may not be able to provide the exact information desired. For example, federal agencies may not be able to identify the branch of a university system to which funds go. Sometimes there are problems in determining the correct institutional code to be assigned to a university. Such coding errors lead to errors in the estimates of funding for the institutions affected. No data are available on the extent to which federal agency respondents check their codes with universities. Other problems discussed in the methodology report (National Science Foundation, 2001) include agency difficulties in matching program descriptions to the proper funding category. NASA has placed increased emphasis on including educational

Specifically, all R&D must be identifiable from records at headquarters by type of research. Some corporate-level records do not contain detailed information about subsidiaries. Project labels do not convey the kinds of activities involved in a project. Also, the subsidiary level may provide head counts or payroll counts but not information about scientists and engineers. The authors outline the strategies that a respondent might use to complete the form, some of which would result in an understatement of R&D expenditures, and some in item nonresponse.

The response process is also affected by the position of the contact person in the company. The research showed cases in which the survey contact was a corporate vice president, a legal staff member, a person from the financial department, or a scientist or engineer in a research department. Each of these types approached the response process with a different knowledge base, and thus each produced different kinds of errors. The authors illustrate how these errors occurred. They also pointed out that the lack of a specific contact person on the address label is a large contributor to nonresponse, because the questionnaire may be thrown out immediately or simply passed from one employee to the next.

The recommendations of the authors to alleviate these problems are as follows (U.S. Census Bureau, 1995b):

  • The data collection staff need to collect information about the complexity of the company, including whether companies have subsidiaries and whether the subsidiaries are domestic, foreign, or both. A company profile should be built and updated on a regular basis.

  • Effort must be made to determine the appropriate contact person for the company. The best respondent in a complex company is someone at the corporate level for total receipts and total employment for the company, but someone in the R&D area for the remaining items.

  • Since at least two people are necessary to complete a questionnaire, the format of the report form should be redesigned to foster a “teamwork” approach.

  • A form due date should be included in the questionnaire instead of a vague “30 days after receipt of form.”

The authors provide suggestions for improving both the graphic presentation of the instrument and the question wording. Since this study was done in 1995, presumably many of the graphic suggestions have been implemented. However, a recommendation that all survey items be put into question format has not yet been implemented.

Davis and DeMaio provide a wide range of suggestions for possible changes in question wording (U.S. Census Bureau, 1995b). One issue that has arisen many times in discussions with respondents is the definitions of basic, applied, and development activities. Their findings are as follows: R&D personnel correctly interpret these items, whereas financial and other non-R&D personnel were less successful. However, understanding of these terms was very poor across companies and respondent types. The definitions of the three terms are not included on the report form; they are in the instructions, which are frequently ignored.

On many occasions, it was obvious that the respondent omitted many R&D activities. One reason was that the respondent did not have enough knowledge of the company to know what to include. A second reason was an inability to interpret the definitions of R&D.

The second most problematic item on the report forms was the number of R&D scientists and engineers. Respondents must determine which scientists and engineers have a four-year degree or the equivalent in the physical or life sciences, mathematics, or engineering. Determining what is “equivalent” will vary depending on who is making the determination and whether or not they have access to personnel records. (Some respondents included all staff, including support staff and administrative personnel in the R&D area.) Respondents must then determine whether each of the people with a four-year degree or equivalent in an appropriate field worked at least some portion of the time in R&D, and then determine what that proportion was. Some respondents reported a head count of scientists and engineers; others made arbitrary decisions about the proportion of time, on average, that scientists and engineers worked on R&D. The concept of full-time equivalent was either not clear or required more time and work than respondents were willing to spend. There was also a change in the reference date for this item, from prior-year data to January of the current year; respondents frequently did not notice this or simply used convenient end-of-year records. Thus, many different time periods are represented in the responses to this item, depending on what records were used and the fiscal years of the responding companies.

Davis and DeMaio recommend rewriting the question to emphasize that administrative and support staff positions are not equivalent to a college degree (U.S. Census Bureau, 1995b). They also recommended asking for a head count rather than full-time equivalents. The placement of the question also caused problems, because it necessitated several shifts in reference period. Finally, there seemed to be a context problem, since this question preceded item 6, cost for wages and salaries of all R&D personnel. The authors gave some suggestions for the rewording and placement of this item.

On April 25, 2003, two experts on questionnaire design, Nora Cate Schaeffer and Don Dillman, met with NSF staff and NRC staff in connection with the work of the Committee on Research and Development Statistics at the National Science Foundation. Schaeffer suggested that NSF and the Census Bureau identify cooperative companies and create a methods panel to support an ongoing structured testing program. It was also suggested that NSF and Census Bureau staff resume a program of field observation to examine record-keeping practices and to conduct research on how respondents fill out the forms. Schaeffer also suggested a study to examine the impact of printing prior-year data on the RD-1 questionnaire. The issue of whether or not to provide previous data has been debated for many years and across many surveys, but there is little information on which to base a decision.

Nonresponse

As with all surveys, some sample units do not respond at all or omit some items. In 1999, the overall response rate was 83 percent; in 2000, it was 85 percent. This appears to be a decline from earlier levels of 88 percent in 1989 and 89 percent in 1988. However, over 90 percent of the 300 largest companies report, probably because of the telephone follow-up.

Response to four data items is mandatory: total R&D expenditures, federal R&D funds, net sales, and total employment. The remaining items are voluntary. Many companies have a policy of not reporting on voluntary surveys, and response to the voluntary items has been a serious problem. To determine whether the combination of mandatory and voluntary reporting influenced response, OMB requested that a test be done of reporting on a completely voluntary basis. For the 1990 survey, a voluntary panel was asked to report all data items on a voluntary basis; companies in a mandatory panel were asked to report the four mandatory items, with the other items voluntary. The overall response rate for the 1990 survey was 80 percent, including 89 percent for the mandatory panel and 69 percent for the voluntary panel. At NSF, J.R. Gawalt concluded that the response rates for the mandatory panel were higher than for the voluntary panel for each of the mandatory items (National Science Foundation, 1991). However, the design of the test did not address the problem of response to voluntary items.

No item nonresponse rates are given for any items. Instead, imputation rates are published, which can be very large. The notes to Table B-5 (National Science Foundation, no date, b) on R&D in industry for 1999 indicate that these rates represent the percentage of the value in a given table cell in the Section A tables that had been imputed; cells for which 50 percent or more of the data are imputed are flagged with an “S.” This means that the imputation rates are based on weighted data, so any information on how many companies did not report a given item is lost.

With the imputation rates as a poor proxy for item nonresponse rates, one can see that even for items that are frequently reported in many federal surveys, net sales and total employment, the imputed ratios can be very high. They range from a low of 0.000 to 0.834 for net sales and from 0.000 to 0.895 for total employment. It seems strange that over 80 percent of the value for either net sales or total employment would have to be imputed, given that the frame for the survey is the SSEL. The listing of a company as responding means only that the Census Bureau heard back from the company, even if the company reported that it was out of scope, out of business, or had merged with another company. This is a way of accounting for the number of forms mailed out; it does not mean that any data items were completed on these forms. It seems unusual that the two basic items of net sales and total employment would be omitted, leading to concern about the amount of nonresponse for the R&D items.

Davis and DeMaio discuss the difficulties that companies have in reporting the number of R&D scientists and engineers they employ (U.S. Census Bureau, 1995b). This difficulty is exemplified in the imputation rates for this item: 0.322 over all industries in 1999 and 0.375 in 2000. This means that about a third of the data shown in the detailed tables for this item are imputed.

Although imputation rates for total R&D are relatively low, the rates for other items can be very high for all industries, for manufacturing, and for nonmanufacturing. In addition, the imputation rates, on the whole, increased from 1999 to 2000.

Noteworthy is the absence of imputation rates for R&D costs by agency, which were very high in 1999 but were reported as 0.000 in 2000 for manufacturing and nonmanufacturing as a whole; the overall industry rates were still quite high. Another Census Bureau analyst, D. Bond, also documents some of the nonsampling errors arising in this survey (U.S. Census Bureau, 1994b).

Potential Sources of Error in the Collection Procedure

At least three different data collection procedures are used: primarily mail, with some data collected by telephone and some over the web. Although the same questionnaire is used for all three modes, respondents may report differently depending on the mode. It is also likely that different kinds of processing take place and affect the data in different ways. It would be useful to tabulate data by mode, if a flag were available to note the mode.

The mail-out version of the questionnaire has been put on the web. No research was conducted on designing a web-based questionnaire. This may be a good thing, in that web-based respondents are not being subjected to different stimuli than the mail-out respondents; however, the current questionnaire does not meet many of the criteria for a web-based survey. Since it is important to take into account the best knowledge available on questionnaire design, and since it is important to keep data provided through different modes comparable, a revision of the questionnaire could improve accuracy. A new questionnaire would accommodate all modes of data collection, with particular emphasis on a web-based version.

The current questionnaire contains many items that are difficult to answer correctly. In order to research ways of adding new questions, formatting and rephrasing questions, formatting the questionnaire, and other aspects of questionnaire design, it would be useful to set up a methods test panel.

To help with the selection of the best respondent, frequently updated company profiles would detail the complexity of a company before a questionnaire is mailed. The current practice of mailing to a company, with no designation of who should respond, leads to nonresponse and incomplete data. The Davis and DeMaio report gave some indications of the “best” respondent (U.S. Census Bureau, 1995b); further research and good company profiles could sharpen that knowledge.

A good definition of unit response would give both data providers and data users helpful information. As it stands, a unit is recorded as a respondent if a form is returned, even if the form has no data on it; this is done only to account for the number of forms mailed out. Perhaps this could be labeled “form returned,” and a nonresponse category could be defined on the basis of a minimum number of items being reported.

Item nonresponse rates also need clear definitions and regular reporting in order to be useful to data users. If some of the difficulty in reporting these rates is that respondents leave blanks for zeros, that problem can be addressed in a forms redesign or in editing rules. In order to monitor the quality of survey items, the number of companies that do not report an item needs to be known; imputation rates are not a substitute. This is not to say that weighted response rates are not useful; they are, but unweighted rates give additional information.
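
To illustrate the distinction drawn here, the following sketch contrasts an unweighted item nonresponse rate (the share of companies that did not report an item) with a value-based imputation rate of the kind NSF publishes (the share of a cell's dollar value that was imputed). The figures are invented for illustration only.

```python
# Hypothetical illustration: a few large nonreporters can drive a value-based
# imputation rate far above the unweighted item nonresponse rate.

companies = [
    # (net_sales_value, reported?)  -- invented figures
    (5_000_000_000, False),   # one very large company fails to report; value imputed
    (400_000_000, True),
    (300_000_000, True),
    (200_000_000, True),
    (100_000_000, True),
]

n_missing = sum(1 for _, reported in companies if not reported)
unweighted_nonresponse_rate = n_missing / len(companies)

imputed_value = sum(v for v, reported in companies if not reported)
total_value = sum(v for v, _ in companies)
value_based_imputation_rate = imputed_value / total_value

print(f"Unweighted item nonresponse rate: {unweighted_nonresponse_rate:.2f}")  # 0.20
print(f"Value-based imputation rate:      {value_based_imputation_rate:.2f}")  # 0.83
```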

Imputation rates increased from 1999 to 2000. Since data are now available for two additional years, this anomaly can be examined to see whether it was just a one-year disturbance. If not, was there less reporting by companies, or was there a change in procedure?

Data Processing

In 1994, a series of memoranda by D. Bond discussing processing errors was issued (U.S. Census Bureau, 1994d). Most of the discussion in this section relies on those memoranda.

Keying of Data

According to Bond, the Data Processing Division at the Census Bureau keys the data according to well-documented procedures (U.S. Census Bureau, 1994d). Data entry staff verify by rekeying and matching a random sample of questionnaires from a previously keyed batch. For the 1992 survey, the specifications were to verify 20 percent of the forms, except when there is 100 percent verification of a small batch of questionnaires; this happens when a keypuncher is new or it is early in the survey processing cycle. All errors found are corrected. If, within the 20 percent group, the error rate is less than a specified level (usually 1.4 to 2.8 percent), the errors are corrected and the batch is accepted. Otherwise the entire batch is verified and all errors are corrected.

Editing of Data

There is no written description of the editing process, including the process by which an analyst supplies data codes. Bond described his study of processing errors, in which he selected 67 RD-1 forms to review. He found three forms on which respondents had reported that they were out of business in 1992, but data for all three were incorrectly processed. This finding reinforces the finding of Davis and DeMaio that the frame may not be updated on a frequent basis (U.S. Census Bureau, 1995b). Bond went on to examine another random sample of 67 forms and found that one company went out of business in 1992, but the analyst had entered zero on the form for total payroll costs; in the database, this company had imputed positive values for 1992 R&D costs, scientists and engineers, and projected 1993 R&D costs. One respondent reported that the company was no longer doing R&D but had its number of scientists and engineers imputed as 93 for January 1993. Two respondents who reported total sales without the last three zeroes made it, unedited, into the database; as a result, there was one company in the database with $500,000 in sales and 3,000 employees, and another with $220,000 in sales and 1,322 employees. Bond reports other examples of errors and of flags being set as imputed where no imputation occurred. He found similar types of errors, although fewer, in reviewing Form RD-1A.

This study is now almost 10 years old, and many procedures in the survey have changed; perhaps the editing has improved. However, Bond recommended that a processing error study be repeated every year. This has not been done, although it could prove useful.
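
The sales-without-the-last-three-zeroes cases suggest a simple ratio edit that the editing process apparently lacked. The sketch below flags records whose sales per employee fall below a plausibility floor; the floor and field names are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical ratio edit: flag records whose sales-per-employee ratio is
# implausibly low, as when sales are reported without the last three zeroes.

MIN_SALES_PER_EMPLOYEE = 10_000  # illustrative floor, in dollars


def ratio_edit(records):
    """Yield records that fail the sales-per-employee plausibility check."""
    for rec in records:
        employees = rec["employees"]
        if employees > 0 and rec["sales"] / employees < MIN_SALES_PER_EMPLOYEE:
            yield rec


# The two cases Bond found would both be flagged for analyst review.
suspect = list(ratio_edit([
    {"company": "A", "sales": 500_000, "employees": 3_000},
    {"company": "B", "sales": 220_000, "employees": 1_322},
    {"company": "C", "sales": 50_000_000, "employees": 120},
]))
print(suspect)  # companies A and B
```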

Imputation

Imputation is an important part of the R&D survey, and imputation rates are high. Imputation is handled in different ways, depending on the item. Bond described the imputation procedure (U.S. Census Bureau, 1994c). For domestic sales, total employment, total R&D, and number of research scientists and engineers, missing current-year data are imputed by applying rates of change from the prior year, regardless of whether the prior-year data were imputed or reported. This is known as a cold-deck procedure, since it is based on prior-year data. The underlying model is a linear regression through the origin, assuming that the variance of the residuals is proportional to the prior-year values. For basic research, subcontracted R&D, and foreign R&D, missing data are imputed only if the company reported the item in either of the prior two years; otherwise, there is implicit imputation of zero. If detail data do not sum to the total (for example, federal R&D by agency) and prior-year data are not imputed, then current-year data are distributed based on the previous distribution of the reporting unit; otherwise, an industry-average distribution is applied to the total to derive a value for each detail item.

Rates of change are calculated by item within each NAICS category or industry. The calculations are based on weighted data for all companies that reported both variables. In the case of inter-item ratios (e.g., R&D to sales), calculations are based on data for all companies that reported both items in the current year. For current-to-prior-year ratios (e.g., employment), calculations are based on data for all companies that reported that item in both years.

The imputation program considers an item reported if one of the following codes is in the database:

  • A—analyst-entered value when there was no previously reported data

  • K—usually 1991 data reported on a 1992 form

  • L—late reported

  • R—reported on time

  • T—analyst-interactive correction, such as to correct an edit failure

  • F—analyst-originated value for total within-company R&D costs, based on information from a reliable outside source

The computer imputes data only for companies with one of the following codes in the item value:

  • Blank

  • B—no value reported

  • M—already computer imputed

Thus analyst imputations and corrections are treated as reported data. Some items are imputed from the values of other items or by means of check boxes. The distribution of R&D costs over basic, applied, and development costs is not imputed. Also, if an imputation depends on an auxiliary reported value for another year or within the same report and that value is not available, no imputation is done; the result is that the item is processed as if a zero were present. Bond analyzed the number of times that auxiliary data were not available to impute for missing values for R&D performers. Altogether, imputation could not proceed for about 8 percent of the cases, but rates for certain items could be as high as 77 percent.

Bond also looked at mean deviations and the root mean square deviation to determine whether or not the imputation procedure was biased. He found that current methods were not always biased up or down. Also, correlation coefficients between values in the numerator and denominator of imputation ratios were very high for most items, indicating that one variable was a good estimator of the other.

For publicly owned companies, outside sources such as reports to the Securities and Exchange Commission (SEC) can be used to match domestic sales, domestic employment, total or company-funded R&D, and, in some cases, federally funded R&D, and then to impute data. The Census Bureau’s SSEL can also be used for verifying and imputing domestic employment and domestic sales data.
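
A minimal sketch of the rate-of-change (cold-deck) imputation described above follows. It assumes the rate of change is the ratio of weighted current-year to weighted prior-year totals among companies reporting the item in both years, which is consistent with a regression through the origin; the data and field layout are hypothetical.

```python
# Minimal sketch of cold-deck ratio imputation, assuming the industry rate of
# change is a weighted ratio over companies reporting the item in both years.

def industry_rate_of_change(companies, item, weight="weight"):
    """Weighted ratio of current-year to prior-year totals among reporters."""
    reporters = [c for c in companies
                 if c.get(f"{item}_current") is not None
                 and c.get(f"{item}_prior") is not None]
    num = sum(c[weight] * c[f"{item}_current"] for c in reporters)
    den = sum(c[weight] * c[f"{item}_prior"] for c in reporters)
    return num / den


def impute_item(companies, item):
    """Fill missing current-year values by applying the industry rate of change
    to the prior-year value, whether that prior value was reported or imputed."""
    rate = industry_rate_of_change(companies, item)
    for c in companies:
        if c.get(f"{item}_current") is None and c.get(f"{item}_prior") is not None:
            c[f"{item}_current"] = rate * c[f"{item}_prior"]
    return companies


# Example: the third company's missing total R&D is imputed from its prior year.
firms = [
    {"weight": 1.0, "total_rd_prior": 100.0, "total_rd_current": 110.0},
    {"weight": 2.0, "total_rd_prior": 50.0, "total_rd_current": 60.0},
    {"weight": 1.0, "total_rd_prior": 80.0, "total_rd_current": None},
]
impute_item(firms, "total_rd")
print(firms[2]["total_rd_current"])  # 80 * (230/200) = 92.0
```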

Possible Sources of Error from Processing

The editing process seems to be rather informal, with little written down. It is clear that analysts have the opportunity to change the data, add information, and obtain data from other sources. Before a web version of the survey is designed, it would be useful to develop a formal editing procedure that could be built into the instrument. When imputation rates are given, it would be useful to include all changes to the data, including those coming from analysts.

The editing procedure is letting several types of errors slip through. A general redesign of the form could also include a general redesign of the editing; it may be useful to have more editing done online while respondents are filling out the form. Some items are being imputed in cases in which they should not be, although this may be more a function of the editing than of the imputation.

Weighting and Estimation

Weighting

Weights were applied to each company record to produce national estimates. In 2000, within the PPS part of the sample, companies classified into the “other manufacturing companies” category were given weights up to a maximum of 75; in the remaining NAICS categories, maximum weights of 50 were assigned. Within the partition using simple random sampling, companies in the “small nonmanufacturing companies” category were assigned weights up to a maximum of 250; companies in the other NAICS categories were given maximum weights of 100.

In 2002, with the changes in the survey design, for all industry groups the maximum weight was 20 for companies that had reported positive R&D at least once in the past four years and for firms with reported R&D of less than $3 million. For firms that had reported positive R&D at least once in the past four years but no R&D in 2002, the maximum weight assigned was 100 for manufacturing companies and 250 for nonmanufacturing companies. For companies for which no information was available on prior R&D, the maximum weights were also 100 for manufacturing and 250 for nonmanufacturing.
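
These caps can be read as bounds on the usual inverse-probability design weights. The following is a hedged sketch, assuming the cap is simply applied to 1/πi; the survey documentation does not spell out the mechanics, so treat both the mechanism and the stratum labels as assumptions.

```python
# Hedged sketch: a design weight as the inverse selection probability,
# bounded by the stratum's maximum weight. The caps are the 2000 values
# quoted in the text; how the bound is enforced in practice is an assumption.

MAX_WEIGHT_2000 = {
    "pps_other_manufacturing": 75,
    "pps_other_naics": 50,
    "srs_small_nonmanufacturing": 250,
    "srs_other_naics": 100,
}


def design_weight(selection_probability: float, stratum: str) -> float:
    """Inverse-probability weight, capped at the stratum maximum."""
    return min(1.0 / selection_probability, MAX_WEIGHT_2000[stratum])


print(design_weight(0.002, "srs_small_nonmanufacturing"))  # capped at 250
print(design_weight(0.05, "pps_other_naics"))              # 20.0, below the cap of 50
```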

Estimation

The estimator used for most national items is a stratified Horvitz-Thompson estimator. Since an entry is available for all cases of interest after imputation, there is no nonresponse adjustment.

A serious exception to this estimator has been in the production of state estimates. Prior to the 1999 survey, the Directory of American Research and Development, published by the Data Base Publishing Group of the R.R. Bowker Company, was used together with previous survey results to estimate R&D expenditures by state for companies that did not provide this information. The information on scientists and engineers published in the directory served as a proxy indicator for the proportion of R&D expenditures in each state. R&D expenditures in each state were estimated by applying the distribution of scientists and engineers by state from the directory to total R&D expenditures for these states. These estimates were included with reported survey data to arrive at published estimates of R&D expenditures for each state. Reports on the accuracy of the estimation were not available. The directory was last published in 1997, and no outside information has been used since then to estimate R&D expenditures by state.

The state estimates varied widely from year to year, primarily because the sample of companies in the states varied from year to year. A company might report on Form RD-1A that it had substantial amounts of R&D in a given state in one year, and then the next year that company may not be in the sample. Two strategies have been developed to cope with this situation. First, the top 50 companies in a state, as measured by payroll, are included in a certainty stratum, so these larger firms remain in the sample from one year to the next. Second, a new composite estimator has been developed in which the first term is the unweighted contribution to R&D coming primarily from the certainty stratum cases, and the second term is a ratio estimate of the contribution from the noncertainty companies. This type of estimator is used in small-area estimation and has some good properties, including a smaller variance than either of the two terms separately. The estimator is as follows:

  Y_S = Σ_i y_Si + Σ_I R_IS Σ_i (w_i − 1) y_Ii

where

  R_IS = Σ_i (1 − π_i) X_ISi / Σ_i (1 − π_i) X_Ii

and

  Y_S = estimated R&D expenditures in state S
  y_Si = reported R&D in state S of the ith company
  y_Ii = reported R&D in industry I of the ith company
  w_i = weight of the ith company
  π_i = probability of selecting the ith company = 1/w_i
  X_ISi = payroll in industry I and state S of the ith company
  X_Ii = payroll in industry I of the ith company

The (w_i − 1) factor in the second term eliminates all certainty companies, and the (1 − π_i) factor in R_IS does the same, since w_i = 1 and π_i = 1 for certainty companies. The R_IS factor provides the ratio of the payroll in the given industry and state to the payroll of the given industry over all states.
For example, State A may have 2 percent of the payroll in a given industry. This ratio is then applied to the weighted (weights reduced by 1) R&D of the given industry, summed over all companies in the industry. If the correlation between payroll and R&D is high, the multiplication results in a weighted estimate of the state’s R&D in that industry. This is then summed over all industries. Although the variance of the resulting estimator should be smaller than the variance of either of its terms, it is not clear how the variance is to be estimated. There were no details on how the variance estimates are made for the simple expansion estimates, either.

Possible Sources of Error in Weighting and Estimation

The weighting procedure and the variance estimation need to be described in more detail. The variances are underestimates for several reasons: one is the way in which analyst edits are done, and the other is imputation. Neither of these ways of filling in missing values is reflected in the variance estimation. Also, there are cases in which items cannot be imputed and are treated as zero; this understates not only the variable itself but also its variance.

The new method of estimation makes use of research on small-area estimates. Many such estimators are available. It would be useful to document how this particular estimator was selected and what one can expect, both good and bad, from it. It would also be useful to watch the results of this estimator carefully to see how well it performs.
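A minimal sketch of the composite state estimator described above is given below, in Python. The data layout, variable names, and the assumption that each record carries payroll for a single state are illustrative simplifications, not the Census Bureau’s implementation.

# Illustrative sketch of the composite state estimator described above.
# Each record is assumed to carry: state, industry, weight w, reported R&D,
# and payroll. Certainty companies have w = 1 (selection probability 1).

def composite_state_estimate(records, state):
    """First term: unweighted R&D reported in the state (mostly certainty cases).
    Second term: for each industry, the noncertainty payroll share of the state
    times the (w - 1)-weighted R&D of the industry's companies."""
    # Term 1: unweighted contribution from companies reporting R&D in the state.
    term1 = sum(r["rd"] for r in records if r["state"] == state)

    industries = {r["industry"] for r in records}
    term2 = 0.0
    for ind in industries:
        ind_recs = [r for r in records if r["industry"] == ind]
        # R_IS: noncertainty payroll in (industry, state) over noncertainty
        # payroll in the industry across all states; (1 - 1/w) drops certainties.
        num = sum((1 - 1 / r["w"]) * r["payroll"]
                  for r in ind_recs if r["state"] == state)
        den = sum((1 - 1 / r["w"]) * r["payroll"] for r in ind_recs)
        r_is = num / den if den else 0.0
        # (w - 1) weighting likewise removes certainty companies from term 2.
        term2 += r_is * sum((r["w"] - 1) * r["rd"] for r in ind_recs)

    return term1 + term2

Setting w = 1 for a certainty company makes both the (1 − 1/w) and (w − 1) factors vanish, so such companies contribute only through the first, unweighted term.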

References

National Science Foundation
1991 Research and Development in Industry: 1990. J.R. Gawalt, author. Washington, DC: National Science Foundation.
2001 Methodology Report for the National Science Foundation’s Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions, Fiscal Year 2000. Washington, DC: National Science Foundation.
2002a Methodology Report for the NSF-NIH Survey of Scientific and Research Facilities, Fiscal Year 2001. Washington, DC: National Science Foundation.
2002b Methodology Report for the NSF Survey of Research and Development Expenditures at Universities and Colleges, Fiscal Year 2001. Washington, DC: National Science Foundation.
2002c Academic R&D Expenditures Survey Procedural and Editing Manual. Washington, DC: National Science Foundation.
no date a Methodology Report for the National Science Foundation’s Survey of Federal Funds for Research and Development, Vol. 50. Unpublished document, National Science Foundation, Washington, DC.
no date b Technical Notes from Research and Development in Industry: 1999, Section B. Unpublished document, National Science Foundation, Washington, DC.
no date c Technical Notes from Research and Development in Industry: 2000, Section B. Unpublished document, National Science Foundation, Washington, DC.
no date Revised Imputation Methodology for Basic Academic Research Data. Memorandum by B. Shackelford. National Science Foundation, Washington, DC.

U.S. Census Bureau
1994a Comparison of Company Coding between 1992 and 1993 for the Survey of Industrial Research and Development (R&D). G.L. Kusch and W. Ricciardi, authors. Washington, DC: U.S. Census Bureau.
1994b Documentation of Nonsampling Error Issues in the Survey of Industrial Research and Development. Unpublished memorandum by D. Bond. U.S. Census Bureau, Washington, DC.
1994c An Evaluation of Imputation Methods for the Survey of Industrial Research and Development. Unpublished memorandum by D. Bond. ESMD Report Series ESMD-9404. U.S. Census Bureau, Washington, DC.
1994d A Survey of Processing Errors in the Survey of Industrial Research and Development. Unpublished memorandum by D. Bond. ESMD Report Series ESMD-9403. U.S. Census Bureau, Washington, DC.
1995a Design of the Survey of Industrial Research and Development: A Historical Perspective. Manufacturing and Construction Division Report Series, Working Paper No. Census/MCD/WP-95/01. G.L. Kusch and W. Ricciardi, authors. Washington, DC: U.S. Bureau of the Census.

1995b Results of Cognitive Research on the Survey of Industrial Research and Development. Unpublished paper by W. Davis and T.J. DeMaio. U.S. Census Bureau, Washington, DC.
1995c Survey of Nonrespondents in the 1992 Survey of Industrial Research and Development. Washington, DC: U.S. Census Bureau.
1996 Comparison of Activity-Based R-Factors with Those Derived for the 1994 Survey of Industrial Research and Development (R&D). W. Ricciardi, author. Washington, DC: U.S. Census Bureau.
1997 Survey of Nonrespondents in the 1994 Survey of Industrial Research and Development. Washington, DC: U.S. Census Bureau.