Appendix C
Modernizing the Infrastructure of the National Science Foundation Federal Funds Survey: Summary of a Workshop

As a key data-gathering activity, the Panel on Modernizing the Infrastructure of the National Science Foundation Federal Funds Survey hosted a workshop in Washington, DC, in September 2008. The first day of the workshop included presentations from four perspectives: (1) users of data from the federal funds and federal support surveys for research and development (R&D); (2) agencies that provide the data to the National Science Foundation (NSF), which has responsibility for the surveys and other activities; (3) representatives of the U.S. Office of Management and Budget (OMB) with responsibility for overseeing government-wide implementation of the E-Government Act and other laws that are designed to improve administrative data; and (4) users of administrative data on grants and contracts, who focused on long-term opportunities to use federal government administrative data and other sources for measuring federal R&D spending. On the second day of the workshop, attention was directed toward issues associated with the classification of fields of science and engineering used in these and other NSF surveys. The workshop included presentations on emerging classification systems and an NSF staff presentation on the interface of the classification system used by NSF with other systems. (See the end of this appendix for the workshop agenda.)

This appendix summarizes the presentations and discussion at the workshop. Several of the presenters made suggestions and recommendations during the workshop; the panel considered them in the course of its work, but they are not included in this summary.



USER NEEDS

Office of Science and Technology Policy

The Office of Science and Technology Policy (OSTP), which was represented at the workshop by Diane DiEuliis, is a major consumer of data from the federal funds survey. OSTP's mission is to advise the President and others in the Executive Office of the President on the effects of science and technology on domestic and international affairs. An important aspect of that mission is to lead interagency efforts to develop and implement sound science and technology policies and budgets. Data on federal expenditures for research and development play a central role for OSTP in its oversight and program coordination functions.

In 2005, the OSTP director, John H. Marburger, III, explained the specific interest of OSTP in the data from the federal funds and other NSF surveys of research and development spending at the 30th Annual AAAS Forum on Science and Technology Policy in Washington, DC. He pointed out that assessments of the health of U.S. science rest on "indicators that are based on a data taxonomy that is nearly three decades old" and that "methods for defining data in both public and private sectors are not well adapted to how R&D is actually conducted today." He referred to the previous National Research Council (2005b) study, which concluded that NSF R&D expenditure data are often ill-suited for the purposes for which they have been used and urged that the report's recommendations for improving various components of the data and enhancing their usefulness "should receive high priority in future planning within NSF."[1]

Jointly with OMB, OSTP prepares the administration's statement of R&D priorities, reflecting input from the President's Council of Advisors on Science and Technology (PCAST) and the National Science and Technology Council (NSTC). That statement provides general guidance for setting priorities for agency R&D programs: significantly, it defines a set of initiatives for which budget and expenditure data should be collected. The most recent guidance defined six areas of highest priority for R&D investments: (1) homeland security and national defense, (2) energy and climate change technology, (3) advanced networking and information technology, (4) a national nanotechnology initiative, (5) complex biological systems, and (6) the environment.[2]

The identification of priority areas leads to a demand for data that will measure the status of implementation of the investment priorities. In terms of specific requirements for data from the federal funds survey, OSTP made a strong case for the maintenance of historical data in order to preserve information on spending trends in detail.

[1] Available: http://www.aaas.org/news/releases/2005/0421marburgerText.shtml [accessed February 2009].
[2] Available: http://www.whitehouse.gov/omb/memoranda/fy2007/m07-22.pdf [accessed March 2009].

Congress

Congress plays an active role in generating a demand for information on federal funds and support, and, in its oversight role, it pays close attention to the management and direction of the NSF data collection efforts. For example, the federal support survey has been mandated by Congress since 1950.[3] James Wilson, then majority staff director of the Research and Science Education Subcommittee of the House Committee on Science and Technology, spoke on the interests of that committee, which has key authorization jurisdiction over the NSF portfolio of R&D surveys. Although there is an interest in R&D spending data by field of science and engineering, with consistency over time, in order to understand trends in funding, the interest of Congress also often has to do with program categories, such as those defined by the administration's investment priorities, rather than specific fields. For example, Wilson said the committee would like to be able to see data on the projects that support the nanotechnology initiatives but cannot do so because the field-level information is not sufficiently granular. In addition, much of the R&D activity of interest is hidden in the "not elsewhere classified" category, and the combination of mathematics with the computer sciences is also too broad for policy makers. Wilson said the committee would like more information on collaborative research and on the nature of the performers of the research.

Wilson also expressed a need for data that are compiled and published in a timely manner. Due to the legislative calendar and the budget cycle, congressional committees need fiscal year expenditure information within 6 months of the end of the fiscal year. Currently, the relevant data are not available in time to have meaningful input into the authorization and appropriation processes except in retrospect.

As an arm of the Congress, the Congressional Research Service (CRS) responds to members of Congress and the congressional committees. In meeting the requirements of Congress for objective and impartial analysis, CRS publishes periodic reports on trends in federal support for R&D, as well as reports on special topics in R&D funding. Both types of studies rely heavily on data from NSF, both as originally published and as summarized in publications such as Science and Engineering Indicators.

[3] The National Science Foundation Act of 1950, as amended, requires that the National Science Foundation "initiate and maintain a program for the determination of the total amount of money for scientific and engineering research, including money allocated for the construction of the facilities wherein such research is conducted, received by each educational institution and appropriate nonprofit organization in the United States, by grant, contract, or other arrangement from agencies of the Federal Government, and to report annually thereon to the President and the Congress."

John Sargent of the CRS, who has written several studies with emphasis on categories of R&D expenditures, talked about his recent study of nanotechnology as an example of how the data are used. By definition, nanotechnology crosses several fields of science and engineering and represents considerable complexity. The multidisciplinary nature of nanotechnology spending is not depicted in the regular NSF data, so CRS relies on special data calls for its information. CRS analysis is also limited by the large "not elsewhere classified" category, which is believed to include many of the growing and emerging research areas of considerable interest. To obtain a full picture of R&D spending, CRS needs data from the support survey on facilities and infrastructure maintenance spending in addition to the investment data.

Professional Societies, Associations, and Public Interest Groups

Users in the community of professional societies, associations, and public interest groups were represented by Robert E. Gropp, senior public policy representative of the American Institute of Biological Sciences (AIBS). AIBS is an umbrella society for 87 professional biological science societies whose 240,000 members study every sub-discipline of the biological sciences, including botany, ecology, taxonomy, evolution, and the agricultural sciences. Gropp expressed concern with the treatment of the biological sciences in the NSF taxonomy of fields of science and engineering. There is also a tendency to lump basic and applied sciences together, making it difficult to identify the evolution of R&D from basic research to applied research to development.

Gropp also discussed the need for data to shed light on the growing multidisciplinary category of service science (also known as service science management and engineering). Service science is an interdisciplinary approach to the study, design, and implementation of service sector systems. It is a growing academic discipline and research area characterized by the application of computer science, cognitive science, economics, organizational behavior, human resources management, marketing, and operations research in support of understanding aspects of the service sector. Like nanotechnology and other categories that lump together various disciplines, the emerging service science field is very difficult to measure with current NSF data.

National Science Board

The National Science Board is the body that provides oversight for and establishes the policies of the National Science Foundation, within the framework set by the President and Congress. The board also serves as an independent body of advisers to both the President and Congress on broad national policy issues related to science and engineering research and education. The board is responsible for preparing the biennial Science and Engineering Indicators report, which provides a broad base of quantitative information about U.S. science, engineering, and technology for use by public and private policy makers and makes extensive use of the information from the federal funds and federal support surveys.

Louis Lanzerotti, the chair of the board's Subcommittee on Science and Engineering Indicators, sent a letter to the panel in connection with its work; the letter was made available to the participants in the workshop and is summarized below:

As Chairman of the National Science Board's Subcommittee on Science and Engineering Indicators, I am an enthusiastic supporter of improvements in R&D data resources, as are other members of the Board. Although I cannot participate in person, I applaud efforts to improve the quality and utility of the Federal Funds for Research and Development survey, an important data resource for the Board's Science and Engineering Indicators report in Chapter 4, "Research and Development: National Trends and International Linkages," and in Chapter 5, "Academic Research and Development."

In order to contribute to the Workshop's discussions, I would like to refer you to the Board's conclusions on data resources for Federal R&D budget allocation decisions as stated in its 2001 study and report, Federal Research Resources: A Process for Setting Priorities (NSB 01-156), which may be helpful in this current study. Although the Board's 2001 report addresses data resources for Federal R&D budget allocation decisions, I believe some of the conclusions are relevant to your more focused examination of the Federal Funds survey. These conclusions are paraphrased as follows:

• Improving Federal budget data and data systems requires a long-term commitment and appropriate support from OMB and Congress.
• Input from potential users and contributors is needed.
• Data must be made easily accessible to users.
• Definitions of research activities must be consistently applied across Federal departments, agencies, and programs and measured to capture the changing character of research and research needs.
• Flexibility in defining categories of research for tracking purposes is especially important for monitoring emerging research areas and addressing the range of modes for research, from individual investigator to major center or facility.

ISSUES OF DATA PROVIDERS

Data providers in the federal government were represented by the National Institutes of Health (NIH) and the U.S. Air Force. In addition, the panel heard from a representative of the Centers for Disease Control and Prevention (CDC), a major unit of NIH. The NIH reports the agency's R&D expenditures directly to NSF, while the Air Force reports through the Department of Defense (DoD) and the CDC reports through NIH. Thus, the panel was able to gain an appreciation of the concerns of both direct reporters and those who report through other agencies.

Israel Lederhendler, the director of the Office of Extramural Research at NIH, detailed several concerns with the NSF data collection program. The NIH manages its R&D portfolio at the project level and, for its own management and reporting purposes, aggregates projects into categories centered around research areas, diseases, and conditions that are not easily described in the NSF taxonomy of science and engineering fields. Furthermore, as a matter of policy, NIH classifies its program as "medical science," given the output of the R&D expenditures, and aggregates its reporting to that level even though other fields are clearly represented in the agency's R&D program. The agency is focused on outcomes, but the taxonomy is organized around inputs.

Lederhendler said there is also a problem caused by having to force-fit projects with multiple disciplines into a single category. The "fitting" is highly subjective, so the numbers can change over time because of reporting changes rather than real changes in disciplines. A solution would be to add metadata (information about the data) to project descriptions. The metadata would describe all aspects of a project and serve as the basis for coding to the various reporting requirements. Such content-rich project descriptions could be maintained on a system like research.gov, which could serve as a portal for a federated system.

Large R&D agencies like NIH are further challenged by the requirement to report on many surveys with different definitions. The federal funds and federal support surveys, for example, have different definitions of fields. Lederhendler suggested the need for a federated system of information and ontology. NIH is now developing the Research, Condition, and Disease Categorization (RCDC) system, a prototype for common categorization and reporting of both intramural and extramural research that could be a step toward the needed federated system.

Lederhendler said that several other issues raise questions about the quality of the data that agencies provide to NSF. For example, different grant and contracting practices, including the lag between the award and the payment, affect expenditure data. There is also no good information about international R&D expenditures, so these data in the NSF report may be questionable. As a remedy, he suggested that NSF convene a standing advisory group composed of reporting agencies to provide input to NSF on reporting issues.
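Lederhendler's concern about force-fitting multidisciplinary projects, and his suggestion of content-rich project metadata, can be illustrated with a small hypothetical sketch: if each project record carries a percentage allocation across several fields, totals for any reporting taxonomy can be derived from those allocations rather than from a single forced category. The field names, allocations, and dollar figures below are invented for illustration and do not represent any actual NIH or NSF data.

```python
from collections import defaultdict

# Hypothetical project metadata: obligations plus a percentage allocation by field.
projects = [
    {"id": "P-001", "obligations": 2_000_000,
     "fields": {"medical science": 0.6, "computer science": 0.4}},
    {"id": "P-002", "obligations": 1_500_000,
     "fields": {"medical science": 0.5, "chemistry": 0.5}},
]

def totals_by_field(project_list):
    """Roll project obligations up to field totals using the stored allocations."""
    totals = defaultdict(float)
    for proj in project_list:
        for field_name, share in proj["fields"].items():
            totals[field_name] += proj["obligations"] * share
    return dict(totals)

print(totals_by_field(projects))
# {'medical science': 1950000.0, 'computer science': 800000.0, 'chemistry': 750000.0}
```

Because the allocation shares travel with the project record, the same records could in principle be re-aggregated under a revised taxonomy, or under agency-specific program categories, without issuing a new data call.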

His colleague, Robin Wagner, formerly of CDC, raised additional concerns. She stated that there is no agency information on performers by location. Since the data are not automatically available, CDC must issue an internal data call for the information, which delays submission of the reports and possibly adds error to the information. As a practical matter, the coding in CDC is done mainly by budget specialists, not scientists, so the coding may not be informed by the scientific purpose of the work.

The Department of the Air Force was represented by Tom Russell, director of Aerospace, Chemistry, and Materials for the Air Force Office of Scientific Research. He reflected the view of a manager of R&D programs in a large, decentralized system and the consequent difficulties of reporting in the manner prescribed by NSF. Within DoD, the maintenance of information about R&D projects and outcomes is the responsibility of the Defense Technical Information Center (DTIC), which provides centralized information on DoD scientific, technical, engineering, and business-related work. DoD research agencies, both the policy and program agencies and the procurement system and laboratories that support them, are geared to reporting in the DoD system, while the annual NSF requirement is an outside requirement that relates only marginally to the internal reporting system.

Russell is encouraged by several initiatives toward integrating definitions, classifications, and reporting requirements across the government. The Chief Financial Officers Act is steadily improving the integrity of government financial data, while the Federal Funding Accountability and Transparency Act (FFATA) and the performance reporting and evaluation system that has evolved in response to the Government Performance and Results Act promise to provide a common language and a common basis for reporting in all federal agencies and should lead to an ability to integrate reporting across the government.

SCIENCE OF SCIENCE POLICY: METRICS

The science of science policy, first proposed by OSTP, has been institutionalized in NSF with a Science of Science Policy Program that is expected to use the data provided by the federal funds and support surveys. This information will be needed in response to new requirements for metrics to assess the progress of explanatory models, analytic tools, and datasets designed to inform the nation's public and private sectors about the processes through which investments in science and engineering research are transformed into social and economic outcomes.

Julia Lane of NSF's Directorate for Social, Behavioral, and Economic Sciences (SBE) described this new initiative. A key aspect of the initiative is to develop new and improved metrics, datasets, and analytical tools. NSF has solicited input on several questions: how fields of science and engineering are defined and whether they are changing, whether the critical input measures (basic research, applied research, and development) are appropriate, and what the critical output measures should be. Her presentation stressed that the way to deal with the lack of metrics is tied to more extensive and intensive use of administrative record data, which have provided answers in other research areas, such as understanding business dynamics and the nature of work.

QUALITY AND CONTENT ISSUES

Against this backdrop of unfulfilled user needs and producer concerns, NSF collects, processes, and publishes the only source of information on federal expenditures for R&D, based on the federal funds and support surveys. John Jankowski, the program manager for the surveys in NSF's Division of Science Resources Statistics, talked about the administrative and technical aspects of the surveys, underscoring their current strengths and practical limitations as a basis for considering possible survey changes.

The use of the term "survey" may be a bit of a misnomer, Jankowski said. The surveys are essentially censuses of federal R&D spending: the federal funds survey covers all known federal agencies that fund R&D (both in-house and external), and the federal support survey covers the federal agencies that account for almost all federal R&D support to academic institutions. Thus, coverage is clearly a strength of the surveys.

Content is also very robust. The federal funds survey collects aggregate totals for performer sectors (e.g., R&D obligations to all universities and colleges combined) and has both obligations and outlays, the only source of such information. The federal support survey collects totals on all science and engineering obligations, including R&D, by federal agencies to individual academic and nonprofit institutions.

There is also great depth to the published detail. The federal funds survey publishes outlays for total R&D and for R&D facilities for 3 years (the past year, actual; the current year, preliminary; and the next or budget year, projected) and by funding agency and performing sector: federal intramural, industry, universities, nonprofit institutions, individual federally funded research and development centers (FFRDCs), nonfederal governments, and foreign performers. These data are further classified by R&D work category (basic research, applied research, development, or R&D plant) and by detailed science and engineering fields.

The location of the performer is also published at the state and foreign-country level.

The federal support survey publishes data for the immediate past year for 19 departments and agencies on obligations to nearly 1,200 individual universities and colleges (as of fiscal 2006), broken down by R&D; R&D plant; fellowships, traineeships, and training grants; facilities and equipment for instruction in science and engineering; general support for science and engineering; and other activities related to science and engineering. Within the academic sector, totals can be derived for historically black colleges and universities, high-Hispanic-enrollment institutions, minority-serving institutions, and tribal colleges, as well as by public or private academic institutions and for 1,323 individual independent nonprofit institutions (as of fiscal 2006).

Jankowski noted that the surveys are neither large nor particularly expensive (as federal government recurring surveys go). NSF has collected data from about 60 reporting entities in recent years, yielding published data for about 90 agencies for the federal funds survey and for 19 agencies for the federal support survey. The survey costs for fiscal 2007, the most recent year available, were $450,000 for the federal funds survey and $420,000 for the federal support survey. Collection is a relatively straightforward operation, with the majority of the agencies using the FEDWeb reporting tool and most of the others providing electronic data files.

Jankowski said that timeliness is an issue. The survey is introduced to the field in February of each year, covering spending in the prior year. The due date is usually mid-April, but, in the past several years, some agencies did not submit their data until November or December. Since NSF has a policy of not publishing the totals until all agencies have reported, data were not released until February of the following year, one year after the surveys were sent out and more than a year after the end of the reporting period.

Due to these delays in publication, the data for the most recent 2 years are preliminary and projected, which tends to create a false sense of timeliness. The data that were released in February 2008, for example, had preliminary data for fiscal 2006 and projected data for fiscal 2007. The delays also introduce a type of error, since there are sometimes significant differences between the preliminary, projected, and final estimates. In 8 of the last 9 years, budget year projections were higher than the final obligations. In some years, the differences between the first published (projected) and final estimates have varied by 5 percent or more.

Jankowski's presentation listed several shortcomings in the current data. The key data gaps are for federal R&D laboratories, for which there is undercounting of internal versus external R&D, and for international science and technology activities, for which the data identify R&D only by foreign location, not by performer.

IMPROVEMENT OPPORTUNITIES

A three-person panel explored several recent initiatives that could yield long-term opportunities for more extensive use of federal government administrative data and ancillary data sources for measuring federal R&D spending. The panel comprised Andrew Reamer, a fellow with the Metropolitan Policy Program at the Brookings Institution, representing a nongovernmental public policy research perspective; Mark Bussow of the Office of Management and Budget; and Jeffrey Alexander, representing a private-sector firm that uses federal data on R&D spending in supporting economic development strategies.

Reamer provided a summary of the development of congressional mandates to provide information on federal spending. He summarized and critiqued eight interrelated mandates for collecting, organizing, and providing information on federal spending. Several of the mandates place OMB in a central role.

There are four primary information and data repositories: the Catalog of Federal Domestic Assistance (CFDA), the Federal Assistance Award Data System (FAADS), the Federal Audit Clearinghouse (FAC), and the Federal Procurement Data System (FPDS). Two mandated reports require gathering accurate and timely program information: the Information on Federal Assistance to State and Local Governments System (known as the 31 USC 1112(f) system after its legislative mandate) and the Consolidated Federal Funds Report (CFFR). Under the designation of secondary data repositories, Reamer discussed the now-defunct RaDiUS repository and the FFATA mandate. He suggested that the RaDiUS system could serve as a prototype for the FFATA effort.

Mark Bussow reported on OMB activities with regard to improving the quality of and access to administrative data that would be useful for measuring federal R&D spending. Under the Federal Funding Accountability and Transparency Act of 2006 (Public Law 109-282), OMB was given responsibility for establishing a publicly available online database containing information about entities that are awarded federal grants, loans, and contracts. The act was to be implemented in two phases. The first phase called for a new database, by January 1, 2008, to provide information on entities (corporations, associations, partnerships, sole proprietorships, limited liability companies, limited liability partnerships, states, and localities) that are awarded funds directly from the federal government. The second phase called for information, by January 1, 2009, on subgrantees and subcontractors that receive funds from a primary recipient. The database would provide the following information:

• name of entity receiving award
• amount of award
• type of award (e.g., grant, loan, contract)
• agency funding the award
• a North American Industry Classification System (NAICS) code for the recipient or a Catalog of Federal Domestic Assistance (CFDA) number (if applicable)
• program source
• award title that describes the purpose of the funding
• location of recipient
• city, state, congressional district, and country in which award performance primarily takes place
• unique identifier for the entity receiving the award and for the parent entity of the recipient, if one exists
• any other information specified by OMB
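To make the mandated content concrete, the sketch below shows one hypothetical way such an award record might be represented in software. The field names, types, and example values are illustrative assumptions, not part of the act or of any OMB system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AwardRecord:
    """Hypothetical record mirroring the FFATA-mandated data elements."""
    recipient_name: str                          # name of entity receiving award
    amount: float                                # amount of award, in dollars
    award_type: str                              # e.g., "grant", "loan", "contract"
    funding_agency: str                          # agency funding the award
    naics_code: Optional[str] = None             # NAICS code of recipient, if applicable
    cfda_number: Optional[str] = None            # CFDA number, if applicable
    program_source: Optional[str] = None
    award_title: Optional[str] = None            # purpose of the funding
    recipient_location: Optional[str] = None
    place_of_performance: Optional[str] = None   # city, state, district, country
    recipient_id: Optional[str] = None           # unique identifier for the recipient
    parent_entity_id: Optional[str] = None       # parent entity, if one exists

# Fictitious example, for illustration only.
example = AwardRecord(
    recipient_name="Example University",
    amount=1_250_000.0,
    award_type="grant",
    funding_agency="Example Federal Agency",
    award_title="Nanoscale materials research",
)
```

A uniform record of this kind is one way the act's goal of a single public database could be met while still drawing the underlying data from multiple source systems.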

OMB has elected to leverage existing systems, functionality, and available data to the fullest extent and has selected three major financial assistance databases as sources of information for the new website: the Federal Procurement Data System-Next Generation (FPDS-NG), the Federal Assistance Award Data System (FAADS), and Grants.gov. FPDS and FAADS are known to have serious data problems, being incomplete and untimely and having inaccurate entries, so a high priority initially has been to clean up these databases at the source (U.S. Office of Management and Budget, 2007). For access, OMB selected a private "watchdog" organization, OMB Watch, to participate in launching the website, Fedspending.org, which provides public access to information on federal grants and contracts as mandated under the act.

Bussow said that although OMB is making good progress toward meeting the goals and objectives of the FFATA, this is seen as a long-range process. A full, timely, and accurate database that meets the needs of multiple users and can substitute for data calls and other data collections is still perhaps a decade away, but the process now has a sense of direction.

Jeffrey Alexander made the case for timely, accessible, and model-based data for understanding regional innovation. The data are needed for internal analysis, benchmarking, and decisions on investing public funds. The federal R&D funding data are used for correlation analysis (to understand the relationship between R&D funding and patents and to determine whether there is a clustering effect) and to facilitate innovation and commercialization by identifying, recruiting, and retaining researchers and connecting collaborators. This analysis is hindered by several factors: a mismatch between the need for aggregated budget reports for program administration and the need for data suited to local purposes; data quality issues, such as data entry errors, incorrect categorization, missing or incomplete records, and lack of timeliness; lack of data integration; and outdated, incoherent, irrelevant, and inconsistent taxonomies.

Alexander reported on the kind of analysis that was and could be done with RaDiUS-type data. Keyword searches and other techniques were able to identify a large number of R&D contracts, grants, and other activities associated with very specific technologies. However, the keyword searches relied on very brief project descriptions that were not at all standard, nor did they cover all potential uses of the information. The same is true of the FAADS and FPDS files.

He suggested some policy changes that could improve the quality and usability of the R&D spending data: the use of triangulation to identify and correct errors, better enforcement of consistent reporting policies, unified format standards and data architectures, and increased use of machine analysis. However, he did not support a unified taxonomy of fields of science and engineering, because advances in information technology have created an environment in which multiple taxonomies are supported by such new technologies as text analysis, concept inference, and evolving semantic web programs. The taxonomies most useful for economic development analysis would be self-organizing and self-correcting, which would require computing power, intensive design effort, and a commitment of resources.

TAXONOMIES

The second day of the workshop was devoted to discussing issues of the taxonomy of fields of science and engineering. Beginning this discussion was Gretchen Gano, librarian for public administration and government information at New York University, who elaborated on the state of classification science and suggested a way of considering the science and engineering classification of the future.

The assumption underlying classification is that there is a hierarchy based on origins, which can be natural or human invented. The classification structure for science and engineering is now standardized, comparable over time, and descriptive, though it is difficult to translate at the boundaries (such as with multidisciplinary fields). In reference to a recent study (Cheney and Park, 2005), she pointed out that interdisciplinary, multidisciplinary, and transdisciplinary fields have emerged that are not well represented in the taxonomy. This has led, for example, to "not elsewhere classified" fields being larger than their peer disaggregated categories.

Gano's definition of interdisciplinarity was taken from a report of the National Research Council (2005a, p. 26): "a mode of research by teams of individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or area of research practice." There is a process that has emerged as fields go from overlapping, to becoming interdisciplinary, to forming a new field supported by an infrastructure of academic departments, defined grants, journals, and subject headings.

Gano concluded that NSF should move away from hierarchical fields and subfields toward discipline-spanning classifications of the key elements of scientific practice. As an example, she cited the New York University PRIsm MUlti-object Survey (PRIMUS), a wide-field survey to advance the study of the structure of the universe, which brings together phenomena, data, theory, method, and practice in an integrated system. She suggested that science and engineering fields, on a pilot basis, could be viewed as clusters of attributes that could be mapped to standard disciplinary taxonomies. The information retrieval would be aided by semantic web technology, using the resource description framework (RDF) structure that describes and interchanges metadata on the web. An example of the application of the RDF structure is DBpedia, which extracts structured information from Wikipedia and links it to other datasets.
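As a rough illustration of the RDF approach Gano described, the sketch below (using the rdflib Python library) encodes a hypothetical research project as a cluster of attributes, each of which could be mapped to one or more standard disciplinary taxonomies. All names, URIs, and field assignments are invented for the example and are not part of any actual federal vocabulary.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Hypothetical namespace for R&D metadata; not an actual federal vocabulary.
EX = Namespace("http://example.org/rd/")

g = Graph()
g.bind("ex", EX)

project = URIRef("http://example.org/rd/project/0001")

# Describe the project as a cluster of attributes rather than a single field.
g.add((project, RDF.type, EX.ResearchProject))
g.add((project, EX.title, Literal("Nanoscale sensors for environmental monitoring")))
g.add((project, EX.fundingAgency, Literal("Example Federal Agency")))
g.add((project, EX.workCategory, Literal("applied research")))

# Multiple field attributes, each of which can be mapped to a taxonomy entry.
g.add((project, EX.field, EX.MaterialsEngineering))
g.add((project, EX.field, EX.EnvironmentalScience))
g.add((project, EX.field, EX.ComputerScience))

# Serialize as Turtle, a common RDF text format, for exchange on the web.
print(g.serialize(format="turtle"))
```

Queried with SPARQL or mapped through crosswalk tables, such attribute clusters could be rolled up to the existing NSF fields or to alternative classifications without forcing each project into a single category.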

Reporting on the results of an internal review of taxonomy issues, Jeri Mulrow of NSF discussed general issues in the collection of taxonomy data and the interface of the NSF taxonomy with other systems. She has been responsible for an internal staff study of taxonomy issues that has reviewed prior Division of Science Resources Statistics (SRS) reports, interviewed NSF division directors and program officers, and interviewed outside researchers.

The SRS Division has published several studies on the taxonomy since 2000. It has also conducted a workshop on OMB Directive No. 16 (Cheney and Park, 2005), in addition to sponsoring the present workshop. The 2004 workshop concluded that classifications describing the dynamic science and engineering fields need to be revised periodically and that criteria and procedures are needed for the classification scheme. The updates should be based on input from the disciplines, respondents, and data users, with the understanding that different disciplines view the same topic from different perspectives. However, users also want consistent data and categories over time, so there is a tension between updating the classification structure and continuity.

The main alternative schemes were the NSF classification embodied in OMB Directive No. 16, the Classification of Instructional Programs (CIP), the Standard Occupational Classification (SOC) system, the Frascati Manual, and the National Research Council taxonomy. The studies found general agreement at the major field level but inconsistencies at the subfield level. All reported issues with inter- and multidisciplinary fields. Mulrow's work found that the principles for classification underlying each of the schemes were based on the uses of the classification, that all had guidelines for determining how to code units, and that all were generally hierarchical in structure.

Her discussions with NSF program divisions led to her conclusion that research is becoming increasingly inter- and multidisciplinary in nature and will continue so in the future. The new and growing areas of research are generally identifiable at the boundaries of disciplines. Professional associations are a good source of information on emerging trends in the fields; many of them periodically reorganize to accommodate new fields. However, the educational system has generally lagged behind the research community in coming to grips with increasingly multidisciplinary activities.

Another study Mulrow summarized was a 2008 report on the S&E taxonomy, based on interviews with responding federal agencies (Macro International, 2008). The purpose of this study was to gain an understanding of the strengths and weaknesses of the current fields of science and engineering taxonomy used in the federal funds survey, identify the use of the taxonomy across the agencies, and detail the current process used by agencies for allocating and managing their research funds and how they report them to NSF. The findings were instructive and gave cause for concern. The largest R&D funding agencies do not use the OMB/NSF fields for program management and budgeting, and there is little consistency across the agencies in the fields that they use to track their R&D. The agencies do not use the OMB/NSF fields because the fields do not relate to their programs, fail to capture inter- and multidisciplinary research, and are generally not useful management tools. A good deal of staff judgment is used in coding the fields, and sometimes coding is done by formula (percent distribution) or by computer techniques. Agencies tend to manage more by program categories (energy, environment, disease) than by field and can report the categories more readily than the fields.

Mulrow's internal study led her to conclude that it would be useful to build on the current report, gain a deeper understanding of the actual R&D programs in the agencies, and start with a few of the largest agencies to maximize the return on investment. The review of agency program management should include consideration of existing agency administrative record systems and linkage mechanisms. As for modernizing the taxonomy, Mulrow suggested starting with the development of principles and guidelines for a classification system, reviewing the need for multiple classifications of the data, considering network representations of the information, and, in preparation for updating the taxonomy, developing ways to bridge past, current, and future classification systems.

WORKSHOP AGENDA

Workshop on Modernizing the Infrastructure of the National Science Foundation Federal Funds Survey
September 5-6, 2008
Room 101, Keck Center
500 Fifth Street, NW
Washington, DC 20001

Objectives of Workshop:
1. To explore issues involved with the NSF federal funds and federal support surveys.
2. To learn about user needs for federal R&D expenditure information.
3. To understand federal agency data sources for federal R&D expenditures.
4. To consider short- and long-term changes in the federal funds and federal support surveys.
5. To consider the use of administrative data under the E-Government and Transparency Acts to provide information on federal R&D spending.
6. To consider issues with the taxonomy of fields of science and engineering.
7. To initiate preparation of the final report with recommendations.

Friday, September 5, 2008
Open Session

8:30-9:00 a.m.
Welcome (Continental breakfast served)
Christopher Hill, Chair, George Mason University

9:00-10:30 a.m.
Overview of User Requirements
Diane DiEuliis, Senior Policy Analyst, Office of Science and Technology Policy, Executive Office of the President
James Wilson, Majority Staff Director, Research and Science Education Subcommittee, Committee on Science and Technology, U.S. House of Representatives
John Sargent, Congressional Research Service

10:30 a.m.-12:00 p.m.
Challenges Facing Data Providers
Israel Lederhendler, Director, DIS, Office of Extramural Research, National Institutes of Health
Tom Russell, Director of Aerospace, Chemistry, and Materials, Air Force Office of Scientific Research

12:00-1:00 p.m.
Working Lunch

1:00-2:30 p.m.
Strengths and Limitations of Federal Funds/Support Surveys
John Jankowski, National Science Foundation
Using Administrative Data to Estimate Federal R&D Expenditures
Julia Lane, National Science Foundation

2:30-2:45 p.m.
Break

2:45-4:00 p.m.
Focus on Long-Term Opportunities to Use Federal Government Administrative Data and Other Data Sources for Measuring Federal R&D Spending
Mark Bussow, Office of Management and Budget
Andrew Reamer, Brookings Institution
Jeff Alexander, New Economy Strategies

4:00-5:00 p.m.
Open Discussion

Saturday, September 6, 2008
Open Session

8:00-9:30 a.m.
Issues with the Taxonomy of Fields of Science and Engineering (Continental breakfast served)
Gretchen Gano, Librarian for Public Administration and Government Information, New York University

9:30-9:45 a.m.
Break

9:45-10:30 a.m.
Issues with the Collection of Taxonomy Data; Interface of S&E Taxonomy with Other Systems (CIP, SOC, etc.)
Jeri Mulrow, National Science Foundation