Measuring R&D Activity in Academic Institutions
The third component of the nation’s information on R&D expenditures is the data on expenditures in colleges and universities, broadly defined to include Federally Funded Research and Development Centers (FFRDCs). Together with funding information provided by governments and spending information provided by business and industry, the academic surveys contribute to the compilation of the extent and character of R&D spending.
SURVEY OF RESEARCH AND DEVELOPMENT EXPENDITURES AT UNIVERSITIES AND COLLEGES
The Survey of Research and Development Expenditures at Universities and Colleges serves as the primary source of information on separately budgeted R&D expenditures in academia in the United States. The survey encompasses a large part of the R&D enterprise and a very important sector: universities account for about 13 percent of total R&D performed in the United States, and FFRDCs add another 6 percent of total spending.
Importantly, the survey covers a sector with many unique characteristics that both define areas of interest and delimit the nature of the data collected. For example, most of the R&D spending by colleges and universities comes from outside sources, primarily governments and industry, so it is important to follow the funding trails in order to understand the nature of the R&D activity in this sector. Likewise, a good part of R&D is associated with the educational mission of the institutions, so it is important to consider funding in light of educational offerings and their outputs—usually expressed in the form of degrees granted but often in works cited or other indicators of academic output.
It is generally acknowledged that the survey of research and development spending at colleges and universities is a successful data collection program. NSF points with pride to the fact that the response rate from academia has historically been about 95 percent, and it is usually 100 percent for the FFRDCs (see Table 6-1). However, the high unit response rate is tempered by troublesomely high item nonresponse rates for some data items. Both the small unit nonresponse and the higher item nonresponse are masked by the fact that NSF imputes values when there is no response. The validity of the imputation methods is open to question.
The survey has impressive depth of coverage: it covers all institutions with doctoral programs in science and engineering (S&E) fields or at least $150,000 in separately budgeted R&D activity, meaning that it is a virtual census of all research and development spending at colleges and universities. These institutions have traditionally expended more than 95 percent of U.S. academic R&D funds. In addition, the survey population includes all FFRDCs that are academically administered and engaged in basic or applied research, development, or management of R&D activities. Also, all historically black colleges and universities (HBCUs) that perform any separately budgeted R&D in S&E are included.
The development of the frame for the survey of colleges and universities is a very complicated undertaking, with the possibility of generating coverage error (see Box 6-1). The application of the dollar limit to identify universities and colleges that conduct R&D could be a problem: if the frame fails to identify an academic institution with at least $150,000 in separately budgeted R&D, there is a coverage gap. The HBCUs are well identified, so they do not pose any coverage loss. Similarly, the FFRDCs and the doctorate-granting universities are well known. It is therefore the application of the dollar limit that could pose problems. NSF describes the only other gap as the four 2-year degree-granting institutions, which accounted for less than 0.01 percent of total R&D expenditures. The issues raised with this frame also affect the Survey of Science and Engineering Research Facilities, which uses this frame as its universe.
TABLE 6-1 Changes in Sampling and Coverage of the Academic Survey

While there are other sources of information on the extent and direction of R&D activity at colleges and universities, including the NSF sister surveys of science and engineering graduate students and postdoctorate students, there is no survey that provides such a long-term (annually since 1972) view of science and engineering activity at the nation’s academic institutions. The historical continuity of the data is often considered to be a particular strength of this data collection, in view of the fact that the results of this survey are often blended with the results of the federal funds and industry surveys into a time series to portray the totality of and trends in R&D activity. However, although the survey is often presented as a continuous time series since 1972, when the survey became an annual collection after the initial four biennial collections, in reality the series has been subject to several perturbations (see Box 6-2).
In fiscal year (FY) 1978, a different population was sampled and different questions were asked, so data are not comparable for that year.
Survey design has also shifted over time:
After 1993, the sample was truncated to include all doctorate-granting institutions but only a partial sample of nondoctorate-granting institutions.
Significantly, data for the current and recent years are only provisional: as NSF describes, when the review of data for prior years reveals discrepancies, the prior years’ data are sometimes modified.
Prior to 1997, there was potential double-counting of R&D spending because funds that were passed through universities to subrecipients were not broken out from the funds actually spent at the institution. This problem was addressed in subsequent years, when questions were added to the survey to identify R&D expenditures passed through to subrecipients. This fix, however, poses a new problem for the survey: the series adjusted for double-counting is not comparable in level with the earlier, unadjusted series.
The frame for the survey is developed from a variety of sources. Not all academic institutions are included, so a procedure has been developed to compare the survey population with the populations of the NSF-National Institutes of Health Survey of Graduate Students and Postdoctorates in Science and Engineering (graduate student survey) and the NSF Survey of Federal Science and Engineering Support to Universities, Colleges and Nonprofit Institutions. Institutions with a highest degree-granting status of 1 (doctorate-granting) are compared to make sure all S&E doctoral degree-granting institutions are included in the academic R&D expenditures survey population. This task is done annually. Sources consulted are the Higher Education Directory and direct contact with universities. NSF maintains a list of all FFRDCs. The Department of Education maintains a list of all HBCUs.
Each S&E doctoral degree-granting institution, FFRDC, and HBCU that reported zero expenditures in the previous fiscal year is contacted to find out if it has expenditures data to report for the current year. If there are expenditures to report, a survey packet is sent. If the institution says it has zero expenditures, a zero is entered for that institution. If the institution says zero but the federal R&D obligations data suggest otherwise, this discrepancy is discussed with the institution, and an attempt is made to obtain actual expenditures.
S&E master’s and bachelor’s degree-granting institutions that reported $150,000 or more in S&E R&D expenditures in the previous fiscal year are surveyed again. To determine which other institutions should be included in the current FY survey, ORC Macro creates a list of all other S&E master’s and bachelor’s degree-granting institutions in the academic R&D expenditures survey universe that received federal obligations during any cycle in the previous 5 years. Institutions are included in the current cycle if they either reported cumulative expenditures of $250,000 or more for the last three full population surveys or were reported by federal agencies to have cumulative obligations of more than $400,000 for the fiscal year plus the 4 previous years.
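The inclusion rules above amount to a simple decision procedure. The following Python sketch illustrates them; the dollar thresholds follow the text, but the record fields and data structure are hypothetical, not the actual ORC Macro implementation:

```python
# Illustrative sketch of the inclusion rules for S&E master's and
# bachelor's degree-granting institutions (field names are hypothetical).

def include_in_current_cycle(inst):
    """An institution is surveyed in the current cycle if it reported
    $150,000+ in S&E R&D expenditures last year, OR it meets either
    cumulative threshold over recent cycles."""
    if inst["prior_year_expenditures"] >= 150_000:
        return True
    # Cumulative expenditures over the last three full population surveys
    if sum(inst["last_three_survey_expenditures"]) >= 250_000:
        return True
    # Cumulative federal obligations for the fiscal year plus 4 prior years
    if sum(inst["five_year_federal_obligations"]) > 400_000:
        return True
    return False

# Example: below the annual threshold, but cumulative obligations qualify.
inst = {
    "prior_year_expenditures": 90_000,
    "last_three_survey_expenditures": [80_000, 70_000, 60_000],
    "five_year_federal_obligations": [100_000, 90_000, 85_000, 80_000, 75_000],
}
print(include_in_current_cycle(inst))  # → True
```

Note that the two cumulative tests are alternatives, so an institution with modest annual spending can still enter the survey on the strength of its federal obligations history.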
A total of 36 academically administered FFRDCs were included in the survey; 4 FFRDCs administered by industrial organizations and 16 administered by nonprofit organizations were surveyed but not included in publications.
In contrast to some of the other NSF data collections, there is evidence that the data are extensively used by the respondent institutions that provide the data. The data are published at the institutional level of detail, a feature of this survey that is fairly unusual among government surveys, in which the confidentiality of responses is closely protected. As a result, universities and colleges are able to use the published data from the survey to benchmark their programs against other institutions. They are also able to assess the mix and direction of their programs in the context of the
general population because of the rich variety of indicators collected and published. There has been an attempt to update the data items collected to accommodate analysis of trends, primarily through refinement of definitions, inclusions, and collection of optional data items (see Box 6-3).
Together with the NSF-NIH Survey of Graduate Students and Post-doctorates in Science and Engineering and the Survey of Science and Engineering Research Facilities, the survey gives a picture of scope and depth of an institution’s science and engineering program. In order to accomplish this, the survey collects the following data items on a recurring basis:
Character of work (basic research, applied research, or development)
Expenditures for S&E R&D
Federally funded R&D centers
Field of S&E
Geographic location (within the United States)
Highest degree granted
Source of funds (federal, state or local, industry, institutional, or other)
Type of academic institution (doctorate-granting or other)
These concepts and definitions are generally consistent with those used in the data collected from the federal government and industry, so the results can be combined with those of the other surveys to obtain an aggregated view of the size, scope, and direction of R&D in the U.S. economy.
The interest in trends in college and university spending by fields of science arises largely in response to the need to trace the shifting emphasis in public funding for R&D by field. The most significant trend in funding by fields of science has been the reduction in funding for most fields of engineering and the physical sciences, on one hand, and the accelerating growth in funding for biomedical research, on the other (National Research Council, 2001c). The recent report of the President’s Council of Advisors on Science and Technology focused on the rather large changes in disciplinary areas supported by R&D funding over the past 25 years (President’s Council of Advisors on Science and Technology, 2002). The report pointed out the rather large shifts in funding among science and engineering disciplines, such as in the physical and life sciences. As a base point: in FY 1970, support for the three major areas of research—physical and environmental sciences, life sciences, and engineering—was equally balanced. Today, the life sciences receive 48 percent of federal R&D funding, compared with 11 percent for the physical sciences and 15 percent for engineering.
In order to assess these trends over time, it is necessary to get the data right; in order to get the data right, it is necessary to get the taxonomy of disciplinary classifications right and to apply that taxonomy consistently over time. This is not a simple matter. The taxonomy of fields of science has enlarged its purpose over the years, much as have all classification systems. In this case, a system that was designed to assist academia in sorting out doctoral programs in order to obtain reputational rankings of similar programs and to independently describe offerings has been pressed into service to support data collection in government and ancillary organizations. For the latter purpose the taxonomy is not so directly useful, so mechanisms for tracing activity by the classification system have not naturally evolved (Kuh, 2003). This evolution of the field-of-science taxonomy has also been the case with classifications of industry and occupation, which likewise have been pressed into uses for which they were not designed.
Like other classification systems, the classification by fields of science has its official sanction in a directive issued by the U.S. Office of Management and Budget (OMB). The most recent codification that delineates the “official” list of fields of science is contained in a directive entitled Standard Classification of Fields of Science and Engineering. This directive was last
issued in 1978. In actuality, the standing list of fields has been around even longer, since the 1978 directive superseded without change a directive that had been published as OMB Circular No. A-46, dated May 3, 1974. It is uncertain how long that circular had been in gestation. In any event, the federal government is relying on a view of science and engineering that pertained some three decades ago.
In the ensuing years, much has changed in the science and engineering enterprise, as well as in the manner in which educational institutions organize their faculty and curriculum. Some of these changes were pointed out in a presentation at a workshop held by the Board on Science, Technology, and Engineering Policy by Charlotte Kuh of the National Research Council (NRC). In the dynamic biological sciences, for example, molecular biologists have been joined by biophysicists and structural biologists; genetics has been subdivided into molecular and general genetics; neurosciences now includes neurobiology; pharmacology now is pharmacology and toxicology; and so on. As science advances, new fields with new names spring up. Genomics morphs into proteomics, which may be regrouping into structural biology. These trends can only be identified in a system that permits field-specific variation and fineness of disaggregation, both of which tend to invalidate the cross-field and cross-data source comparisons to which the taxonomy has been addressed in recent years.
These organizational realities and the static nature of the taxonomy, it was reported in a previous National Research Council study, obscure some critically important information about the character of the science and engineering enterprise at colleges and universities:
The diversity of some research fields, such as physics, which encompasses nuclear, particle, and solid state physics among the many subdisciplines.
The growing importance of interdisciplinary and multidisciplinary research (discussed below).
The changing emphasis within the fields, such as the shift to biologically based chemistry relative to physical chemistry within this field.
The integration of some related fields, an example being the integration of electrical engineering, computer science, molecular biology, and biochemistry.
The emergence of new fields and subfields, such as materials science, computational biology, and biophysics, and the near disappearance of others (National Research Council, 2001c).
There has been criticism over the years that the taxonomies are not standard from survey to survey. It is critically important to have comparable classifications among surveys so that the data can be compared and
linked. Yet, as an example, the life sciences in the survey of college and university spending are disaggregated into agricultural, biological, medical, and other life sciences, while in the survey of federal support to universities, colleges, and nonprofit institutions, the life sciences disaggregation includes a fifth field—environmental biology (National Research Council, 2000).
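Comparisons across surveys with different disaggregations are possible only by collapsing categories to the coarser common scheme. The Python sketch below illustrates such a crosswalk; the assignment of environmental biology to “other life sciences” is an assumption made for illustration, not an NSF-sanctioned mapping:

```python
# Illustrative crosswalk: collapse the five-field life sciences breakdown
# used in the federal support survey to the four-field breakdown used in
# the academic expenditures survey. The target for "environmental
# biology" is an assumed assignment, chosen only for illustration.

TO_FOUR_FIELD = {
    "agricultural sciences": "agricultural sciences",
    "biological sciences": "biological sciences",
    "medical sciences": "medical sciences",
    "environmental biology": "other life sciences",  # assumed assignment
    "other life sciences": "other life sciences",
}

def harmonize(five_field_totals):
    """Re-total expenditures under the coarser four-field scheme."""
    out = {}
    for field, amount in five_field_totals.items():
        target = TO_FOUR_FIELD[field]
        out[target] = out.get(target, 0) + amount
    return out

support = {"biological sciences": 400, "environmental biology": 50,
           "other life sciences": 30}
print(harmonize(support))
# {'biological sciences': 400, 'other life sciences': 80}
```

The exercise also shows what is lost: once collapsed, the environmental biology detail cannot be recovered, which is why a common taxonomy across surveys is preferable to after-the-fact crosswalks.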
The panel recommends that the U.S. Office of Management and Budget now initiate a review of the Classification of Fields of Science and Engineering, last published as Directive 16 in 1978. The panel suggests that OMB appoint the Science Resources Statistics office of NSF to serve as the lead agency for an effort that must be conducted on a government-wide basis, since the field classifications impinge on the programs of many government agencies. The fields of science should be revised after this review in a process that is mindful of the need to maintain continuity of key data series to the extent possible (Recommendation 6.1).
Non-S&E R&D Data
Many researchers in the fields of education, law, humanities, business, music, the arts, library science, and physical education would be surprised to find that the U.S. government does not consider their work to be part of science and engineering, particularly when so many of these fields have adopted investigative techniques and standards of evidence that replicate those of the “official sciences.” Yet, by definition, these fields are excluded from the science and engineering totals.
There appears to be a widespread demand for collection and publication of data for non-S&E categories. This demand emanates from several sources. Survey respondents at institutions with a heavy concentration of research in the social sciences, education, humanities, the arts, and business were concerned that the survey did not give a complete and accurate picture of their research effort. In some schools, the data furnished to NSF did not jibe with the data in financial statements, which included non-S&E categories. The schools indicated that this would not be a large new burden, since many already collected the data for internal purposes and actually had to delete the non-S&E disciplines for the survey. NSF had its own reasons for initiating collection of these data; most countries followed the recommendation in the Frascati Manual to include social sciences (including educational sciences) and the humanities in the higher education R&D expenditure surveys. By excluding these data, the U.S. data were increasingly noncomparable to data from other countries.
To its credit, NSF has collected separate data on nonscience R&D spending since FY 1997. These data were collected on an optional basis, estimated, and used for internal purposes.
In recognition of the widespread demand for these data, NSF sought
approval from OMB to collect these data on the surveys from FY 2003 through FY 2005, and with that approval, added these data to the regular questioning beginning with the collection of data for FY 2003. NSF plans to publish these estimates on a continuing basis.
Interdisciplinary Research

A major trend in the conduct of research and development in recent years has been an increase in the interdisciplinary nature of the enterprise. Collaborations between the fields of science, particularly between engineering and the other sciences, have characterized many innovations in R&D.
The current National Nanotechnology Initiative is an example. This multiagency project is pushing the frontier in nanoscale science and engineering that may have implications for medical diagnosis and treatment, efficient manufacturing processes, energy conversion and storage, and electronic devices. The applications in nanotechnology draw on advances in multiple disciplines, such as chemistry, physics, biology, and materials. The Office of Science and Technology Policy, which is coordinating this effort, points out that the initiative is blurring the distinctions of traditional scientific domains and creating a new culture of interdisciplinary science and engineering (National Science and Technology Council, 2003).
Thus, this is an example of a major initiative in which interdisciplinary research is being actively promoted by the federal government. Yet the NSF survey of R&D expenditures at colleges and universities not only fails to facilitate the collection of information to measure interdisciplinary research but, in fact, actively discourages respondents from reporting it. The survey directs reporters to categorize interdisciplinary projects individually according to the nature of the research performed and, when multiple fields of S&E are encountered, to prorate expenditures in proportion to each discipline involved. It is not even possible for reporters to be creative and report interdisciplinary research as a subset of “other” fields, since respondents are directed not to use this category to report interdisciplinary or multidisciplinary research unless they cannot identify a primary field.1
The justification for and the basis of collecting information on the character and extent of interdisciplinary research was laid out by a National Research Council committee some years ago (National Research Council, 1987). In the realm of science education, the report defined interdisciplinary in terms of types of collaborations:
Collaboration between different specialties of a discipline—for example, among cognitive, developmental, and social psychologists to understand the role of early childhood experiences in the comprehension of scientific concepts.
Collaboration among disciplines—for example, among chemists, educational psychologists, and cognitive scientists to structure chemical knowledge for effective instruction.
Collaboration among basic research, applied research, and development and application—for example, to improve the effectiveness of computer technology in the teaching of basic arithmetic operations.
Collaboration among practitioners, policy makers, and researchers—for example, to develop and adopt curricula and teaching materials that communicate scientific knowledge in a way that is both scientifically rigorous and educationally manageable.
The first two of these types of collaboration define interdisciplinary research of interest to the panel. The panel recommends that NSF engage in a program of outreach to the disciplines to begin to develop a standard concept of interdisciplinary and multidisciplinary research and, on an experimental basis, initiate a program to collect this information from a subset of academic and research institutions (Recommendation 6.2). A forthcoming NRC report on facilitating interdisciplinary research will also provide guidance on this subject (National Research Council, 2005).
Research Parks

University-related research parks have been identified as an important mechanism for the transfer of academic research findings, as a source of knowledge spillovers, and as a catalyst for national and regional economic growth.
In a November 2002 workshop convened to discuss policy needs for indicators related to the formation, activities, and economic consequences of research parks, the participants recommended that NSF learn more about U.S. research parks: their history and evolving relationship with university research, the roles of parks in the innovation system, the definition of park success, and the conditions for its achievement (Link, 2003). In his 2003 study for NSF, Albert Link inventoried and traced the development of 80 currently active university-related research parks across the country. These parks are considered to have contributed significantly to the advancement of science and engineering and the generation of innovation, and many governments and institutions are moving strongly to invest in these expensive assets. Yet the place of research parks in the U.S. innovation system is not clearly understood.
Collaboration with Industry
The growing propensity of universities to enter into collaborative R&D arrangements with business and government laboratories has been a major trend in the R&D environment over the past two decades. There is evidence that universities are increasing funding links, technology transfer, and collaborations with industry and government. It has been postulated that these arrangements are expanding, in part, because of the decreasing role of the federal government as a funding source for R&D, encouraging universities to rely increasingly on nonfederal funding sources (Jankowski, 1999).
As discussed in Chapter 2, these collaborations have also been aided and abetted by government policies that encourage joint research activities, technology codevelopment, contract research, and technology exchange through licensing and cross-licensing arrangements (Vonortas, 2003). Some of the arrangements are underscored by equity arrangements and could be identified in equity transaction disclosures (such as reports to the Securities and Exchange Commission), but most are based on nonequity agreements.
It is critical that these arrangements be identified in order to come to a complete understanding of the nature and extent of R&D activity in the economy. However, due to their nature, many slip under the radar screen of the dedicated databases that are designed to track the formal collaborations that have the blessing of law. The registry of research joint ventures under the National Cooperative Research and Production Act identifies only 9 percent of the 3,000 U.S.-based collaborations involving nonprofit institutions, including universities, over the period 1985-2000.
Some idea of the bounds of collaborations can be divined by reference to the total amounts that universities obtain from industry. But given the difficulties in obtaining a comprehensive and current view of these increasingly important collaborations in their rich variety of forms, should the collection of R&D data from academia be extended to collect detail on the types and amounts of collaborations?2 Can data of good quality be obtained through this instrument without creating undue burden on the respondents? The panel recommends that NSF consider the addition of periodic collection of information on industry-government-university collaborations as a supplemental inquiry to the survey of college and university R&D spending. A decision on the viability of this collection should be
preceded by a program of research and testing of the collection of these data (Recommendation 6.3).
Timeliness of Data

In order to be fully relevant, the data must be available in a timely manner, so those who use them can compare them with other sources, aggregate them with other data to form a picture of the whole enterprise, and appropriately react to the information provided. The timeliness of the release of the academic survey data is slipping, not improving, over time. In the view of the panel, this renders the data less useful than they should be.
Some of the delays have to do with processing and other internal issues. The reporting forms are sent by mail and electronic means in November each year and are to be returned in January. A number of respondents need more time by virtue of their reporting processes and the timing of the fiscal year; they are offered an extension. This collection phase goes on until midsummer, when respondents who have not yet reported are asked to provide data only for items 1 and 2 (current R&D expenditures in S&E, by source of funds and by field of S&E). Total figures are requested as a last resort. Respondents who still refuse are thanked for their time and informed that NSF personnel may contact them in the future.
By the closing date of the FY 2001 survey in October 2002—some 9 months after the initial deadline for submittal of data—completed questionnaires had been received from 581 of the 610 academic institutions. These responses are tallied, data are edited, imputations are made, and estimates are prepared. Then the data are reviewed and prepared for release. These latter steps consume about half a year, so the data are scheduled for release in April of the following year. In the natural order of things, then, the data would not appear until 18 months after the period of reference. NSF rarely meets even that extended deadline.
Survey Methodology Issues
Several issues of survey methodology affect survey quality. In the survey design, all institutions in coverage with $150,000 or more in S&E expenditures reported in the previous survey are included with certainty. The frame is updated each year by comparing the previous frame for this survey with the list of institutions in coverage for the NSF-NIH Survey of Graduate Students and Postdoctorates in Science and Engineering and the list of academic institutions that receive federal S&E R&D funding as reported by federal agencies in the federal funds survey. The list of FFRDCs is maintained by NSF, and the list of historically black colleges and universities is maintained by the U.S. Department of Education. Thus, there is no
sampling, as such, but errors may arise from the multisource frame itself and from the application of the criteria for inclusion. For example, the frame could easily fail to identify an academic institution with at least $150,000 in R&D if it was not identified in the previous collection or found in the comparison with the graduate student survey. The exclusion from coverage of institutions offering less than a 4-year degree is presumed to result in only a very small underestimate.
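The frame update described above is essentially a union of source lists, and the coverage gap is a direct consequence of that design: an eligible institution that appears on none of the lists is invisible to the survey. A minimal Python sketch (all list names and example institutions are hypothetical):

```python
# Illustrative sketch of the annual frame update: the new frame is the
# union of several source lists. Coverage error arises when an eligible
# institution appears on none of them.

def build_frame(previous_frame, grad_student_survey_institutions,
                federal_funds_recipients, ffrdcs, hbcus):
    frame = set()
    frame |= previous_frame            # institutions in the prior collection
    frame |= grad_student_survey_institutions
    frame |= federal_funds_recipients  # as reported in the federal funds survey
    frame |= ffrdcs                    # list maintained by NSF
    frame |= hbcus                     # list maintained by the Dept. of Education
    return frame

# An institution with $150,000+ in R&D that is on no source list is
# simply missing from the frame -- the coverage gap described above.
frame = build_frame({"U1", "U2"}, {"U2", "U3"}, {"U4"}, {"F1"}, {"H1"})
print(sorted(frame))  # ['F1', 'H1', 'U1', 'U2', 'U3', 'U4']
```

Because the union can only add institutions found on some list, the quality of the frame is bounded by the quality of its weakest source list.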
For the past two collection rounds, the survey has been conducted in electronic format only. The questionnaire is usually distributed in November, and follow-up activities take place from January to July, so the period of intensively soliciting responses consumes over 6 months.
Little is published or analyzed about the characteristics of the respondents in the institutions, although it is known that there is some annual turnover. The experience and knowledge of the respondents are critically important in this survey, since the appropriate completion of this fairly complex questionnaire usually requires obtaining data from multiple sources within an institution, as well as full understanding of such concepts and definitions as field of science and engineering. To maintain a check on the quality of reporting, for several years NSF has engaged the services of an expert in the field to visit institutions to report on the methodology used in completing the questionnaires, as well as issues relating to concepts and definitions (National Science Foundation, 2001, 2002). Likewise, veteran respondents are assembled by NSF to participate in periodic workshops to identify items that are troublesome. Despite these selective efforts to better understand the reporting, the panel is concerned about the lack of profile information about the respondents and the limited training or retraining of these respondents as part of ongoing survey operations. On an ongoing basis, NSF should continue to contact a sample of responding institutions to check their records in order to improve understanding of the best means of gathering the data, the sophistication of reporting sources within an institution, and the interpretation of questions and definitions (Recommendation 6.4).
It was noted that, by the closing date of October 2002 for the 2001 survey, completed questionnaires had been received from 95 percent of the academic institutions, including 100 percent of the top 100 institutions and FFRDCs. As was the practice with the ORC Macro surveys, all missing data items, including those for nonrespondents, were imputed. No item nonresponse rates were reported, so it was not possible to assess the quality of the individual items in the report. The printed collection form appears to be quite busy and suffers from a lack of good graphic design. This has been less of a problem in the electronic format, in which questions are interspersed with directions, definitions, and reminders. Concerned about the lack of knowledge about the response patterns, the panel recommends study of the cognitive aspects of collection instruments and reporting procedures (Recommendation 6.5).
Imputation is a significant issue for this survey. In 2001, imputation was used to provide information for a small proportion of the survey population (4 percent) that did not provide information at all, as well as for item nonresponse. The imputation factors were generated by class of institution and derived from responding institutions for three key variables: total R&D expenditures, federally financed R&D expenditures, and total research equipment expenditures. These factors were used along with the previous year’s data. This methodology has led to large misspecification, especially when imputation is needed for a number of years. The estimates for basic academic research have been especially troublesome, since the response rate for this item has been in the range of 83 percent. In FY 2001, NSF needed to correct the “federal basic research” and the “total research” estimates by substantial amounts because of revisions in a large university’s basic research spending—a number that had been imputed for 15 consecutive years.
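The factor-based procedure described above can be sketched as simple ratio imputation within a class of institutions. This is a simplified illustration, not NSF's actual methodology, and the figures are invented:

```python
# Simplified sketch of class-based ratio imputation: a growth factor
# computed from institutions that responded in both years is applied to a
# nonrespondent's prior-year value. (Not NSF's actual procedure.)

def imputation_factor(respondents):
    """Ratio of current-year to prior-year totals among responding
    institutions in the same class."""
    cur = sum(r["current"] for r in respondents)
    prior = sum(r["prior"] for r in respondents)
    return cur / prior

def impute(nonrespondent_prior_value, respondents):
    """Carry a nonrespondent forward by the class growth factor."""
    return nonrespondent_prior_value * imputation_factor(respondents)

respondents = [{"current": 110.0, "prior": 100.0},
               {"current": 220.0, "prior": 200.0}]
print(impute(50.0, respondents))  # → 55.0

# Chaining: if the same institution is imputed year after year, each new
# value applies a factor to an already-imputed value, so any initial
# error compounds -- the failure mode described in the text.
```

The chaining comment is the crux of the 15-year example in the text: a single bad starting value, carried forward by factors for many cycles, can drift arbitrarily far from the truth.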
NSF has conducted research to improve the imputation procedures. A memorandum by Brandon Shackelford of NSF outlined a promising approach for utilizing a regression model for imputation of the basic research totals, which has been subjected to initial tests (ORC Macro, no date). Although the panel welcomes this research into a model-based approach to imputation, we are concerned that the tests were not sufficient to judge the soundness of the regression approach. The research should be redone utilizing a more standard procedure of withholding a set of independent data in order to test the model (Recommendation 6.6). The past practice of using imputed data for long periods as a basis for new imputations is particularly dangerous.
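The validation procedure the panel recommends can be sketched as follows. The data, the predictors, and the coefficients here are all hypothetical; the point is only the method: fit the regression on one set of institutions, then judge its imputations against a withheld set it never saw, with the current factor-based method as a benchmark.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical institution-level data ($ millions): current basic-research
# spending depends on total R&D and on the prior year's basic research.
n = 200
total_rd = rng.uniform(10, 500, n)
prior_basic = 0.4 * total_rd * rng.uniform(0.8, 1.2, n)
basic = 0.45 * total_rd + 0.5 * prior_basic + rng.normal(0.0, 5.0, n)

# Withhold an independent set of institutions, as the panel recommends,
# rather than judging the model on the same data used to fit it.
train = np.arange(n) < 150
test = ~train

# Fit the regression model on the training institutions only.
X = np.column_stack([np.ones(n), total_rd, prior_basic])
coef, *_ = np.linalg.lstsq(X[train], basic[train], rcond=None)

# Impute (predict) for the withheld institutions and measure the error.
pred = X[test] @ coef
rmse_model = float(np.sqrt(np.mean((pred - basic[test]) ** 2)))

# Benchmark: the factor method carries the prior year's value forward,
# scaled by the aggregate growth ratio among responding institutions.
factor = basic[train].sum() / prior_basic[train].sum()
rmse_factor = float(
    np.sqrt(np.mean((factor * prior_basic[test] - basic[test]) ** 2))
)
print(rmse_model, rmse_factor)
```

An out-of-sample comparison of this kind is what allows an analyst to say whether the regression approach is actually sounder than the factor method, rather than merely fitting the responding institutions well.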
SURVEY OF SCIENCE AND ENGINEERING RESEARCH FACILITIES
The objective of the biennial Survey of Science and Engineering Research Facilities is to provide data on the status of research facilities at research-performing colleges and universities, nonprofit biomedical research organizations, and independent research hospitals in the United States that receive funding from NIH. The survey results have many uses, although NSF has determined that the predominant users are planners in the federal government, state governments, universities, and private-sector firms. Importantly, it has congressional interest, having been mandated by Congress in 1985.
This survey is one of the most burdensome surveys in the government’s portfolio of surveys. Sometimes it is useful to put oneself into the shoes of the survey respondent when considering the appropriateness of a survey
undertaking. Imagine you are an academic administrator with responsibility for completing federal government inquiries, and on your desk arrives a survey from NSF asking for the following information:
The amount of space used for S&E research and how much of the space was for laboratories or offices.
Condition of the research facilities.
Costs of repairs and renovations in the last 2 years.
New construction in the previous 2 years with a project worksheet to be completed for each individual project.
Source of project funding for repairs and renovations and new construction.
Planned repairs, renovations, and new construction for the next 2 years.
Deferred repairs, renovations, and new construction.
Beginning in 2003, the survey includes a part 2 of the questionnaire, which contains new questions on computing and networking capacity. The questions ask about:
The physical infrastructure used for network communications.
Plans for future upgrading and uses of information technology.
Capacity for high-speed computations.
Infrastructure for wireless communication.
These exhaustive questions are put to a broad respondent base. All academic institutions granting master’s or doctoral degrees in S&E, other institutions that reported R&D expenditures of $1 million or more, and all HBCUs reporting any R&D expenditures are included. The six service academies are not included. Nonprofit biomedical research organizations and independent hospitals that received at least $1 million in extramural research funding from NIH in the previous fiscal year are also covered in the population of interest. (The frame for the survey is defined by the academic R&D expenditures for the academic institutions and an NIH list of nonprofit research institutions and independent hospitals that received funding from NIH.)
Although the amount of information sought in this survey has expanded and contracted over the years, the direction has been to add more rather than less burden over time (see Box 6-4).
To its credit, the NSF does a great deal to mitigate the burden and surprise factor accompanying this data collection. Data collection proceeds in three distinct steps. First, the president or equivalent of each institution is sent a letter signed by the director of NSF or the director of the National
Center for Research Resources at NIH. This letter explains the survey and asks the president to name an appropriate person as the institutional coordinator (IC). A letter is also sent to the IC who had participated in the previous survey cycle, to notify the IC of the upcoming survey and the letter to the president.
The second phase is a mail-out of packages to the ICs, which contain a cover letter, an acknowledgment postcard to be mailed back, an overview of the survey, a paper copy of the questionnaire with a prepaid envelope, a list of frequently asked questions, and instructions for accessing the survey web site, with a user ID and a password. The NSF contractor, ORC Macro, maintains contact with the ICs about every other week.
The third phase consists of following up with the ICs. This means reminding them of due dates, answering questions, explaining how to use the web system, and so forth. About 75 percent of the ICs responded via the web.
NSF has also made several significant changes to sharpen the questionnaire. A cognitive study resulted in a rewording of the instructions and questions for greater clarity and some reformatting of the survey form. For example, some recent modifications include:
The description of what was included in research space was expanded to make it more complete.
Redundant information from the instruction and information pages was removed.
A choice of “not applicable (NA)” was added to the item asking for the amount of net assignable square footage used for instruction and research space, by S&E field (1a) and amount of the total research space for all S&E fields that was leased (1b). If a respondent answered “zero” to 1a, it was necessary to make “NA” a possible response for 1b.
The new survey questionnaire also allows respondents to reuse data from their previous survey cycle if the data are still accurate for the current year. If so, processing staff entered these data for them on the paper questionnaire, and a “preload” button for each item accomplishes the same end for those who report on the web. These measures simplify reporting, but they could introduce errors to the extent that respondents accept the preloaded values rather than compiling fresh answers each year (National Research Council, 2002a).
A simple but useful measure of the burden of a data collection is the willingness of respondents to respond in a timely manner. Response rates for this survey have hovered around 90 percent for the academic institutions and 88 percent for the biomedical institutions. The top 100 (in terms of R&D expenditures) among the academic institutions had a 96 percent response rate; the private colleges had a lower but still acceptable 86 percent response rate.
Several reasons for these differences in nonresponse are postulated:
Public institutions are accustomed to having their data public. Private schools do not have that tradition.
Biomedical institutions are less likely than universities to have already collected these data in some form for a purpose other than responding to the survey. Also, these institutions are generally smaller than academic institutions. They may have received only one small grant from NIH. Thus, it is more of a burden to respond. Hospitals may not perceive the research grant to be as directly related to their main mission.
While it is tempting to suggest far-reaching changes to this admittedly burdensome survey, the panel notes that this survey has undergone extensive renovation in the recent past and has acceptable response rates from most survey respondent audiences. Since so many of the innovations in the questionnaire and in process automation are so recent, the panel recommends that the experience in the fielding of the revised questionnaire in 2003 be carefully evaluated by outside cognitive survey design experts, and that the results of those cognitive evaluations serve as the foundation for subsequent improvements to this mandated survey (Recommendation 6.7).
Survey Methodology Issues
The Survey of Science and Engineering Research Facilities is a biennial survey of academic institutions that includes a group of respondents not included in other NSF expenditure surveys—independent biomedical research facilities with NIH funding.
The eligible population has shifted somewhat over time. For example, the threshold for inclusion was raised from $150,000 to $1 million in 2003, and institutions granting master’s degrees were no longer automatically included in 2001. The accuracy of the identification of the universe depends on the accuracy of the Survey of Research and Development Expenditures at Universities and Colleges for the academic sector. Previously discussed errors in reporting and imputation may affect the quality of the list. There is a possibility of double-counting in the case of overlapping coverage between the NSF and the NIH lists.
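The double-counting risk from combining the two frame sources can be sketched simply. The institution names below are invented for illustration; the point is that the frames must be checked for overlap before being merged.

```python
# A minimal sketch, with invented institution names, of checking the two
# frame sources for overlap before combining them into one survey frame.
nsf_frame = {"University A", "University B", "Medical Center C"}
nih_frame = {"Medical Center C", "Research Institute D"}

overlap = nsf_frame & nih_frame   # institutions at risk of double-counting
combined = nsf_frame | nih_frame  # the union counts each institution once
print(sorted(overlap), len(combined))
```

In practice the matching is harder than a set intersection, since the two lists may record the same institution under different names or identifiers, which is why unresolved overlap can silently inflate the universe.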
The key to success of this survey is the institutional contact at each surveyed institution. For the most part, the input data are not maintained centrally within the institutions, so each institutional contact must determine the most effective data collection approach, work with multiple internal sources of information, and review the data before submission. The institutional contacts are identified in an intensive campaign preceding survey mail-out. They are not necessarily located in one type of office, and many of them change from year to year.
The questionnaire is evolving, with major changes introduced in the past two survey cycles. Many of the changes, the direct result of a cognitive review of the questions, were introduced to provide greater clarity and to
remove redundancy. Our review indicates that this evolution process needs to continue.
The new design introduced with the web-based collection has increased the amount of data sought, introducing such questions as the identification of new construction in the previous 2 years, with a project worksheet to be completed for each individual project. Some of the concepts are new and possibly vague. For example, the new questions on computer technology and cyber infrastructure introduce new collection challenges, given the wide variety of institutional practices for computer and software procurement and inventory. The panel recommends that NSF continue to conduct a response analysis survey to determine the base quality of these new and difficult items on computer technology and cyber infrastructure, study nonresponse patterns, and make a commitment to a sustained program of research and development on these conceptual matters (Recommendation 6.8).
Even with these burdensome data inquiries, the overall response rate in 2001 was about 90 percent for academic institutions and 88 percent for the biomedical institutions. This is a testament to the institutional contact program and the determination of the data collectors. The differences in response rates between public and private institutions are of concern; the lower rates for the private institutions are perhaps the result of tradition and may cause larger errors in their estimates. Some of these issues may be resolved in the current collection of data for 2003, when NSF will publicize data by institution, with only a few sensitive data items suppressed. The data should become more useful as benchmarks, and this procedure should also drive up institutional participation.
The survey employs web-based procedures, which require that all data items be completed prior to allowing submission of the form. This may force the respondent to enter doubtful data or to impute answers that are not obtainable from organizational records.
Imputation procedures vary for paper-based responses by whether or not the institution previously reported. If the unit previously reported, prior responses were used in the imputation procedure; if not, other methods were employed. The exact procedure used by NSF for imputation is not well documented, but it appears that imputation is used for unit nonresponse—a practice that is highly unusual in surveys. In most surveys, unit nonresponse is handled by weighting, as it was in this survey in 1999. At a minimum, NSF is urged to compare the results of imputation and weighting procedures (Recommendation 6.9).
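The comparison the panel urges can be sketched as follows. All figures here are simulated for illustration: a weighting adjustment scales up respondent totals by the inverse response rate, while imputation fills in each nonrespondent from its prior-year value, and the two estimates of the population total can then be set side by side.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical frame of 100 institutions with prior- and current-year
# research space (thousands of sq. ft.); roughly 10% do not respond.
n = 100
prior = rng.uniform(50, 500, n)
current = prior * rng.uniform(0.95, 1.15, n)
responded = rng.random(n) > 0.10

# (a) Weighting: scale up respondent totals by the inverse response rate.
weight = n / responded.sum()
total_weighted = weight * current[responded].sum()

# (b) Imputation: fill each nonrespondent with its prior-year value,
# scaled by the aggregate growth among responding institutions.
growth = current[responded].sum() / prior[responded].sum()
imputed = np.where(responded, current, growth * prior)
total_imputed = imputed.sum()

true_total = current.sum()  # known here only because the data are simulated
print(total_weighted, total_imputed, true_total)
```

A simulation study of this kind, run against data where the truth is known, would let NSF document how far the two estimators diverge and under what nonresponse patterns each is preferable.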