Data on Federal Research and Development Investments: A Pathway to Modernization
4 Toward a New Data Reporting System Based on Administrative Records
As this report describes, the federal funds and federal support surveys conducted by the Division of Science Resources Statistics (SRS) of the National Science Foundation (NSF) have several weaknesses, including special challenges with regard to timeliness and quality.1 These difficulties stem, in part, from the complexity of the surveys. The federal funds survey, for example, asks the responding federal agencies to enter data on outlays for research and development for three years (actual for two years prior, prior year preliminary, and current year preliminary) and obligations for the same three years by categories of research and development (R&D), selected fields, types of research performers, specific federally funded research and development center, country of foreign performer, and state by type of performer. Although SRS has done much over the past few years to modernize the entry and transmission of the data, the surveys are burdensome for reporting agencies and, for many, do not reflect the reality of their R&D spending.
1 As defined by the Office of Management and Budget, quality is an encompassing term, incorporating utility (usefulness), objectivity (relating to the accuracy, reliability and lack of bias of the data), and integrity (making sure the data are protected from unauthorized access or manipulation) (U.S. Office of Management and Budget, 2002).
There may be a better way to obtain data in the future. Statistical agencies across the federal government are building the capability to use data from administrative records maintained by program administration agencies. The individual records are typically applications or reports completed by individuals and institutions to meet mandated requirements to compete for awards or receive benefits, or for compliance, credit, tax, or other reporting. The use of administrative records on financial transactions, including grants, contracts, and other awards, is becoming a possible source of information on federal R&D spending, mainly as a result of new initiatives to improve data on federal spending across the government. These new initiatives promise to make administrative data much more accessible and to improve their quality. As these data sources are improved, they offer a way for SRS to improve the collection and dissemination of comprehensive and timely data on federal R&D spending.
This chapter discusses the potential and limitations of the use of administrative data for collecting and compiling federal R&D spending data. It discusses past and current efforts to use the data and summarizes some initiatives that could change the way that agencies account for and report R&D spending. The requirements for a successful database and the challenges facing SRS in developing this new system are then highlighted. Finally, the chapter outlines a general plan for implementing a new vision for the federal funds and federal support data. The plan specifically recommends planning for the transition from an all-survey to an integrated survey–administrative record approach, using demonstration projects to test various aspects of a possible transition to a system at least partly based on administrative data.
THE ROLE OF ADMINISTRATIVE RECORDS AS A SOURCE OF STATISTICAL DATA
The use of administrative records to substitute for or enhance surveys has been a goal for the federal statistical system for several decades. Particularly with regard to micro-level data from business entities, we can point to a number of highly successful examples of the development of information from administrative record sources (National Research Council, 2007b).
Indeed, the increased use of administrative records has been recognized and documented since the early 1980s (Federal Committee on Statistical Methodology, 1980). However, the same reports that describe in glowing terms the potential of administrative records to supplement or replace surveys, providing detailed information at minimal cost with an associated reduction in response burden, usually take pains to provide cautionary discussion as well. That is because problems of quality, consistency, and access have often plagued attempts to use administrative records for statistical purposes. Still, the practice of using administrative records appears to be advancing, significantly aided by advances in information technology. With increasing use of administrative records, there has been greater attention to timeliness and other aspects of the quality of the input data. Examples can be seen in recent initiatives to improve the federal government’s administrative records on grants and contracts, reflected in upgrades to the Federal Procurement Data System—Next Generation and the Federal Assistance Award Data System, as well as emerging cross-agency compilations of records that are not yet complete but could become fully fledged databases, such as www.grants.gov. (Each of these government-wide administrative data sources is detailed in Appendix B.)
The E-Government Act of 2002 and the Federal Funding Accountability and Transparency Act of 2006 (FFATA, Public Law 109-282) have the objective of systematically improving the contract and grant databases maintained by the U.S. General Services Administration and the U.S. Census Bureau, as well as standardizing, enhancing, and validating the R&D spending data that reside in those databases.
The Federal Funding Accountability and Transparency Act, in particular, is a wide-ranging federal law requiring the full disclosure of all organizations receiving federal funds. It provides legal backing for gaining more information about extramural federal funding, including R&D. The act requires the establishment of a single searchable website providing comprehensive information on all federal awards, to be populated by the Federal Procurement Data System, the Federal Assistance Award Data System, and www.grants.gov. The act also includes, in Section 2(b)1, a provision that the searchable website shall include for each federal award, “any other relevant information specified by the Office of Management and Budget.” These improvements have the potential for reporting on federal R&D spending at the project level and associating fiscal year obligations with such attributes as performer, performing institution, and geographic location.
These and other attributes could serve as the foundation for SRS data collection efforts, which are also tied, in part, to fiscal year obligations and implicitly require the aggregation of project-level data into agency-wide data. Under these new laws, a supporting infrastructure is being developed across the government under the leadership of the Office of Management and Budget (OMB). It has the potential to improve the quality and timeliness of administrative records on government expenditures and, by doing so, to provide at least part of the data on R&D expenditures that are now extracted by means of surveys.
Before these new initiatives, administrative records did not have the capacity to provide current or reliable information on federal R&D expenditures. For R&D spending data, the use of administrative records was first tested in the mid-1990s in a project called RaDiUS (Research and Development in the United States), which was developed by the Critical Technologies Institute at RAND for the Office of Science and Technology Policy. Although this project was discontinued in 2006, it illustrates the possibility of developing a data system to collect, store, and disseminate information on R&D expenditures from agency source documents, supplemented by independent, expert judgment. The RaDiUS database captured detailed data on federal R&D from agency records for the 24 agencies with the largest R&D expenditures. It accumulated not only records from the agency systems but also information from the Federal Assistance Award Data System (FAADS) (Fossum et al., 2000). Although discontinued, RaDiUS taught valuable lessons about the quality of the contract and grant databases and how to approach development of a comprehensive system of information based on administrative data (National Research Council, 2005b, pp. 112-113).
OUTLINE OF A NEW REPORTING SYSTEM BASED ON ADMINISTRATIVE DATA
If SRS could ensure that public administrative records contain the required information fields, are recorded at the project level, are accurate, and are relatively easy to access, it could be confident in developing programs to collect and process administrative data instead of relying on surveys of agencies. This is a tall order, and it does not describe the state of the various agencies’ administrative records at this time. The challenge for SRS is to ensure that the administrative databases include all relevant research spending (including intramural spending); have records that are accurate at the source, perhaps as entered by a knowledgeable person, such as a principal investigator; and include all relevant classification variables, particularly field of science and engineering (S&E) and character of work.
Moving from the current situation to one in which administrative data can be fully used for purposes of understanding federal R&D spending will not be simple, nor can it occur soon. It will require the development of means for ensuring accuracy, completeness, consistency, and compatibility with the analytical needs now fulfilled by the federal funds and federal support surveys.
Accurate and Complete Data
An administrative record–based data collection system will be of use to SRS only if those records include information in the categories SRS needs to collect. To fully portray R&D spending, the data should be suitable for collection and aggregation, if needed, to recreate the current data for R&D versus R&D plant, the character of work (basic research, applied research, and development), and field of S&E. In addition, information on area of application and the identities of recipients of funds is needed for a full understanding of the nature of the R&D investment.
Generally, administrative data today fail to meet these requirements. As recently as 2005, for example, the Government Accountability Office reported that users of the Federal Procurement Data System—Next Generation “lacked confidence in the system’s ability to deliver timely and accurate data on contracts” (U.S. Government Accountability Office, 2005).
An important step toward improving the quality of administrative data on contracts and grants was taken with the publication and initial implementation of OMB guidance on data submission under the Federal Funding Accountability and Transparency Act. The guidance has the effect of increasing oversight of the data by establishing standards for a centralized system. This new administrative data system would retrieve data from selected systems in specified file formats, add data elements, specify requirements for timely reporting, and define quality assurance controls (U.S. Office of Management and Budget, 2006).
Consistency Over Time
Although the new system may ultimately offer more detailed and accurate data, it is critical for these new data to be comparable with the data from the current survey-based system. A major purpose of federal funds and federal support data is to enable analysis of trends in R&D spending, so that this spending can be connected with societal goals to help shape future patterns of spending in socially desirable ways. The ability to portray trends must be a feature of an administrative data-driven system for collecting the data necessary to report on R&D funding and expenditures. This ability would be especially critical during the shift from the survey-based system to an administrative records-based system, and it could be enhanced by a period of dual publication of old and new data, widespread publicity, and full discussion of any apparent discrepancies.
Buy-in and Support
A successful administrative database depends on the support of those who are required to input, edit, manage, and evaluate the data.
Although many of the current agency and government-wide databases are mandated by law, they rarely gain their success solely by virtue of such mandates. In addition, successful database systems are well understood by the stakeholders who can benefit from the data. For example, budgeting and fiscal accounting data systems are generally accurate and current because agencies have both incentives and requirements to keep them accurate and updated, and they perceive those data as being necessary for the success of their missions. The R&D contracts and grants in agency administrative record systems are beginning to receive that kind of attention, with a growing recognition that transparency is an important agency objective. The current administration has strongly supported this transparency in regard to R&D spending, mandating in the guidance to the federal agencies that, in preparing their 2011 budgets, agencies “have a responsibility to explain how Federal science and technology investments contribute to increased economic productivity and progress, new energy technologies, improved health outcomes, and other national goals. In order to facilitate these efforts, Federal agencies, in cooperation with the Office of Science and Technology Policy and the Office of Management and Budget, should develop datasets to better document Federal science and technology investments and to make these data open to the public in accessible, useful formats” (Office of Management and Budget and Office of Science and Technology Policy, 2009, pp. 2-3).
ADVANTAGES AND DISADVANTAGES
Advantages of a System Based on Administrative Data
Administrative databases, if designed, managed, and implemented properly, would have some characteristics that can make them preferable to surveys as the source of information on R&D investments, although there would be challenges as well. This section discusses the accuracy, detail and flexibility, currency, and accessibility that are needed if administrative databases are to be useful for this purpose.
Accuracy. The way in which an administrative database is constructed should serve to enhance the accuracy of the data obtained from it. In most cases, the data are entered by people who are in a position to know the topic. Principal investigators, budget officers, accountants, and project managers are more likely to know such information as the project’s field of science than a person in an agency budget office who has been assigned responsibility to complete the SRS federal funds or federal support survey forms and who may not know or understand the intellectual and scientific nature of a project.
However, some effort will be needed to ensure that the database is free of errors and misreporting that can be caused by careless or quick entries by individuals and institutions.
Detail and Flexibility. A potential benefit of administrative databases is that they typically are composed of the original, raw form of information, such as data related to a single research project. In the current system, budget officers and others who complete the federal funds survey do so by aggregating a variety of inputs to represent the agency’s R&D portfolio. Errors in compilation and aggregation can be made when the reports are being prepared. In contrast, project-level administrative information generally permits data users, including potentially the SRS staff, to produce tailored aggregations or link data from other administrative databases. Access to databases that offer data at the project level, with geographic and performer-level detail, enables ready aggregation of data to help answer the performer-type and performing-institution sections of the current surveys. Relying on administrative data instead of survey responses could enhance the ability to classify science in new and interesting ways.
Currency. Administrative databases can be continuously updated. Under legislative guidance, agencies are now working to make the administrative data on grants and contracts more current, offering the possibility of dramatically shortening the time needed to obtain data on R&D spending and thus improving the timeliness of data. For example, the goals of the Federal Funding Accountability and Transparency Act legislation and guidelines are to ensure that grant and contract data are submitted within 3 days after the award and that the public database is updated no later than 30 days after any federal award requiring posting.
Accessibility. The databases that are being developed under FFATA rules are being designed to ensure full transparency of all award actions by federal agencies in standard formats and thus would be more accessible to all users, including SRS, than the current survey databases. Moreover, because these administrative databases will be composed of an assortment of project-level data, rather than aggregations, users will be able to drill down or query according to their needs.
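The tailored aggregation and drill-down described above can be pictured with a small sketch. It is illustrative only: the record layout, the field names (agency, performer_type, state, fiscal_year, obligation), and the sample figures are assumptions made for this example and are not drawn from any FFATA database or SRS survey.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AwardRecord:
    """One project-level award record (all field names are hypothetical)."""
    agency: str
    performer_type: str  # e.g. "university" or "industry"
    state: str
    fiscal_year: int
    obligation: float    # dollars obligated in that fiscal year


def aggregate(records, *keys):
    """Sum obligations over any combination of record attributes."""
    totals = defaultdict(float)
    for rec in records:
        group = tuple(getattr(rec, key) for key in keys)
        totals[group] += rec.obligation
    return dict(totals)


# Made-up sample records, for illustration only.
records = [
    AwardRecord("Agency A", "university", "CA", 2009, 1_200_000.0),
    AwardRecord("Agency A", "industry", "CA", 2009, 800_000.0),
    AwardRecord("Agency B", "university", "NY", 2009, 500_000.0),
    AwardRecord("Agency B", "university", "CA", 2009, 250_000.0),
]

# The same project-level records support several different roll-ups.
by_performer = aggregate(records, "performer_type")
by_state_and_performer = aggregate(records, "state", "performer_type")
```

Because the totals are computed from project-level records rather than from pre-aggregated survey responses, a new breakdown needs only a different list of keys, not a new data collection.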
These drill-down capabilities, using data mining and other sophisticated techniques, can greatly enhance the ways in which federal employees, budget experts, R&D specialists, and science and technology policy experts use these data for a variety of purposes. Thus, as it develops such R&D reporting systems, NSF should ensure that the public will have full access to them.
If NSF is to pursue a new system based on administrative records, there are a number of hurdles to clear. Foremost are the myriad technical requirements to create a workable database driven by administrative data. This will require buy-in and support from the reporting federal agencies. NSF will need full agency cooperation in order to modernize the gathering of accurate data in a more organized and timely fashion. The reporting agencies will have work to do in order to more adequately integrate the various internal systems that report R&D, so their reports to NSF can provide a complete picture of R&D throughout the federal government.
NSF does not need to face this task alone, however. The modernization and transparency of contract and award databases is a major federal government initiative, largely administered by OMB. The full weight and authority of OMB should establish an environment for improving administrative data and, through the administrative data, enhancing the transparency of federal spending.
Disadvantages of Administrative Databases
Currently, none of the contract and grant administrative databases discussed in this report provides the categories needed for direct reporting of federal funds or federal support data. These categories include R&D, R&D plant, character of work (basic research, applied research, and development), and fields of S&E. In database management language, the administrative data systems do not currently contain the necessary “tags” (record descriptors) to permit extracting these sorts of data items. The current administrative data systems are defined for agency-related administrative purposes, and not for the statistical purposes of the federal funds and federal support surveys. This could lead to a continuation of the current problems that affect the surveys, such as the fact that definitions of data items with the same name vary among agencies and even within agencies.
The problem of lack of coding for fields of S&E presents particular challenges. None of the contract and grant databases is organized in a way that would readily allow for the reporting of fields of S&E.
For SRS to successfully transition from survey-based reporting of R&D activities to reporting based on administrative data, agency and government-wide databases should be able to associate each contract or grant funding record with descriptors of the work done under it, as described above. It may be possible to obtain fields of S&E information without the burden that would be incurred if a relevant data field were to be added to each record. For example, it may be possible to construct cross-walks between agency-relevant keywords (tags) that are used in project descriptions and the fields of S&E taxonomy. These fields of S&E tags could be drawn from the taxonomy, or they could be based on free text in cases in which no existing tag fits, for example, for newly emerging areas. Field of S&E tags could be automatically derived from the name of the funding agency or program, or they could be provided by the funded entity by means of investigator-supplied keywords on project proposals and descriptions. Text mining techniques might be applied to extract key terms or to group semantically similar funding records to speed up manual determination and assignment of fields of S&E.
Recommendation 4-1: The Division of Science Resources Statistics, in cooperation with the Office of Management and Budget and the Office of Science and Technology Policy, should seek to have all federal agencies that fund or conduct research and development (R&D) incorporate R&D descriptors (tags) into administrative databases. Ideally, in order to enable identification of the R&D components of agency or program budgets, tags should identify: the specific field of science and engineering; whether a record applies to R&D or R&D plant; and whether the record activity is basic research, applied research, or development.
Most agency contract and grant databases capture only extramural awards, with the notable exception of the Research, Condition, and Disease Categorization (RCDC) system of the National Institutes of Health (NIH), which explicitly captures intramural R&D (see Box 4-1). For existing databases to be useful for SRS’s purposes, they would need to account for both extramural and intramural R&D. Intramural R&D is of particular importance to SRS reporting, since current data show that nearly one-quarter of federal R&D dollars are spent in intramural laboratories (National Science Board, 2006, p. 4-23). Intramural spending at the project, laboratory, or portfolio level will need to be incorporated into agency databases, perhaps following the approach used by NIH in populating the RCDC system with intramural as well as extramural project information, or it could be extracted directly from the parts of agencies that manage the intramural projects.
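The descriptors named in Recommendation 4-1, together with the intramural/extramural distinction just discussed, can be pictured as fields on a funding record, with field of S&E filled in from a keyword cross-walk. The sketch below is a minimal illustration under stated assumptions: the record layout, the CROSSWALK table, and the field labels are all hypothetical and do not reproduce any agency database or the official fields of S&E taxonomy.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cross-walk from agency keywords to field-of-S&E labels.
CROSSWALK = {
    "genomics": "life sciences",
    "photovoltaics": "engineering",
    "turbulence": "physical sciences",
}


@dataclass
class TaggedRecord:
    """A funding record carrying R&D descriptors (hypothetical layout)."""
    award_id: str
    keywords: List[str]
    is_rd_plant: bool        # True for R&D plant, False for R&D
    character_of_work: str   # "basic", "applied", or "development"
    intramural: bool         # True for intramural, False for extramural
    field_of_se: Optional[str] = None


def assign_field(record, crosswalk=CROSSWALK):
    """Tag the record with the first field whose keyword matches.

    Records with no matching keyword keep field_of_se=None so that they
    can be routed to manual review (e.g. for newly emerging areas).
    """
    for keyword in record.keywords:
        matched = crosswalk.get(keyword.lower())
        if matched is not None:
            record.field_of_se = matched
            break
    return record


solar = assign_field(
    TaggedRecord("X-001", ["Solar", "Photovoltaics"], False, "applied", False)
)
unknown = assign_field(
    TaggedRecord("X-002", ["quantum sensing"], False, "basic", True)
)
```

Taking the first match is the simplest possible rule and is chosen here only for brevity; a real cross-walk would need a way to handle records whose keywords map to more than one field.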
Recommendation 4-2: The Division of Science Resources Statistics should work with the Office of Management and Budget to seek endorsement to work with other research and development funding agencies to incorporate intramural data into existing and future databases or to directly access intramural spending information from performer databases.
Even after taking steps to identify R&D activities with some certainty and include both extramural and intramural projects, the thorny issue of accounting for classified R&D spending will remain. The spending on classified programs is an important part of R&D spending in some agencies, but these projects are not likely to be contained in administrative databases available to SRS or the public. This suggests implementation of a dual system based on administrative records for unclassified R&D supplemented by agency reporting of summary information for classified R&D.
BOX 4-1
The National Institutes of Health Research, Condition, and Disease Categorization System
The Research, Condition, and Disease Categorization (RCDC) system launched by the National Institutes of Health (NIH) is one of the promising administrative databases that could assist SRS in the transition from the current surveys to a new system for collecting federal funds and federal support data. The RCDC uses a computer database to sort NIH-funded projects into categories of research area, disease, or condition and allows these projects to be aggregated into annual reports on funding by category. The RCDC data are primarily used by Congress and the NIH Office of the Director to assess and evaluate NIH R&D spending priorities (Macro International, 2008, p. 58).
The RCDC is noteworthy because it replaces an annual survey of funding by category that was sent to the 27 units that constitute NIH. The annual survey had required respondents in each unit to estimate three years of funding for 360 research (e.g., clinical research, minority health, nanotechnology) and disease categories (e.g., Parkinson’s, diabetes, cancer). NIH took the estimates from the 27 unit surveys to aggregate its total funding and spending. In many ways, the NIH annual survey closely resembles the current federal funds and federal support surveys because both methods asked separate units (or agencies) to report their spending behavior and then aggregated those self-reported data to obtain grand totals for R&D.
According to NIH, the RCDC will allow the same budget data to be extracted automatically from the database without relying on decentralized surveys, which increase the threat of data entry errors and inaccurate estimates. The RCDC will allow NIH to consistently report how its research dollars are spent.
The RCDC has the added advantage of containing data on intramural research, which is absent from most administrative databases. Thus, the RCDC has the potential of capturing all NIH research, extramural and intramural. The potential for RCDC to enhance federal funds data reporting is magnified by the size of the NIH research portfolio; NIH alone is now responsible for supporting half of all federal basic and applied research.
The RCDC process involves creating category definitions for NIH, and currently there are 360 categories of which 215 are publicly reported in “Estimates of Funding for Various Diseases, Conditions, and Research Areas” (http://report.nih.gov/rcdc/categories). A category definition is a series of terms or concepts chosen from an RCDC thesaurus of more than 350,000 terms or concepts derived from various thesauri (from the Congressional Research Service, the National Cancer Institute, the Medical Subject Headings system of the National Library of Medicine, and Jablonski’s Dictionary of Medical Acronyms and Abbreviations) and used in conjunction with the Collexis text mining/matching tool. These terms are then weighted by scientific experts to identify the relative significance of each term or concept to the category. The same scientific experts set a threshold for each category to determine the minimum number of times a term or concept must be mentioned in a project description to make the project eligible for a specific category. Periodically, scientific experts validate these categories.
The RCDC system can then search all funded grants and contracts in the NIH database to create a project summary containing terms and concepts that match the RCDC thesaurus; it then compares each summary with the category definitions to determine how closely they match. If the RCDC summary meets the threshold set by scientific experts for a category, RCDC assigns that project to that category, which makes it possible for RCDC to display not only a list of projects in each category but also funding amounts.
The system creates, for the first time, NIH-wide category definitions that are consistent across all NIH institutes. This solves the problem of different institutes using their own definitions to respond to the current survey, the same problem that affects the SRS surveys. The system allows the category definitions to be applied uniformly to all types of research. The NIH unveiled the new RCDC system in 2009.
For more on RCDC methodology, see http://report.nih.gov/rcdc/category_process/default.aspx and http://report.nih.gov/rcdc/faqs/default.aspx.
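The matching step described in Box 4-1 (weighted terms, a per-category threshold, and assignment when a project meets the threshold) can be sketched in a few lines. This is a deliberately simplified illustration and not NIH's actual RCDC algorithm: the categories, terms, weights, thresholds, and the weighted-count scoring rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical category definitions: expert-assigned term weights plus a
# threshold that a project's score must meet to be assigned.
CATEGORIES = {
    "diabetes": {
        "terms": {"insulin": 2.0, "glucose": 1.0, "diabetes": 3.0},
        "threshold": 4.0,
    },
    "nanotechnology": {
        "terms": {"nanoparticle": 3.0, "nanoscale": 2.0},
        "threshold": 3.0,
    },
}


def term_counts(description):
    """Count lowercase alphabetic tokens in a project description."""
    return Counter(re.findall(r"[a-z]+", description.lower()))


def score(counts, terms):
    """Weighted count of category terms found in the description."""
    return sum(weight * counts[term] for term, weight in terms.items())


def categorize(description, categories=CATEGORIES):
    """Return the sorted names of all categories whose threshold is met."""
    counts = term_counts(description)
    return sorted(
        name
        for name, definition in categories.items()
        if score(counts, definition["terms"]) >= definition["threshold"]
    )


first = categorize("Insulin signaling and glucose uptake in type 2 diabetes")
second = categorize("Nanoparticle delivery of insulin")
```

In this toy setup a project can fall into several categories or none, which is one reason category totals built this way need not sum to an agency's overall budget.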
IMPLEMENTATION OF A SYSTEM BASED ON ADMINISTRATIVE DATA
The augmentation of current federal government-wide initiatives to provide basic information to identify R&D spending is a promising avenue for SRS to consider as it moves toward a mixed survey and administrative database system. In order to develop a flexible, administrative data-driven system for tracking federal funds and federal support data, the panel recognizes that SRS needs adequate authority and resources. The plan we outline in Chapter 6 for implementing the transition from a survey-based system to a mixed system of surveys and administrative databases requires that SRS staff work closely with agencies, OMB, and other relevant stakeholders in ongoing efforts to further develop e-government and federal spending database capabilities.
Cooperative efforts alone may not lead to the inclusion of key variables, such as R&D identifiers and fields of science, in current and future administrative databases. SRS requires adequate budget resources and the full support of NSF management and OMB to participate in ongoing efforts to build in capabilities for collecting R&D survey data from current and future databases.
The panel notes that it would be helpful to have congressional endorsement for a modernization of the federal funds and federal support program, even though no new legislative authority is required. NIH, for example, was assisted in building its RCDC system by an explicit requirement in the National Institutes of Health Reform Act of 2006 to build such a tool to categorize the agency’s research (Section 104 of Public Law 109-482). Outside organizations can play an important role as well. For example, two reports from the National Academies are said to have helped lay the groundwork for the RCDC system.2
OMB is the lead executive branch agency for collecting, organizing, and providing information on federal spending, and recent legislation mandates much of this data collection. In issuing the guidance for data submission under the Federal Funding Accountability and Transparency Act, OMB has shown a willingness to use its authority under the legislation to specify data fields. If the OMB guidance were extended to mandate identification of R&D awards, R&D versus R&D plant, character of work, and field of science, these databases would be much more useful for understanding federal R&D spending.
The FFATA databases already offer some promise in obtaining some of the data relevant to understanding R&D spending, although much work remains to make these data useful for SRS’s purposes.
For example, www.USAspending.gov, the portal for the public to access the FFATA databases, allows users to generate detailed reports on external federal spending by performing institution, performer type, and geographic location. However, the website does not enable users to generate reports of federal spending by character of work or field of S&E, and it lacks information on intramural R&D. Furthermore, it does not distinguish between spending on R&D and on R&D plant. However, these databases appear to be the only existing cross-government databases that can meet both FFATA requirements and, potentially, SRS’s data needs.
2 Available: http://report.nih.gov/rcdc/faqs/Default.aspx.
TRANSITION STRATEGY
The new vision for the federal funds and federal support data outlined in this chapter will not be implemented overnight. Many of the preconditions for a successful conversion of the program from a survey-only to an integrated survey-administrative record approach are not yet in place. For example, the FFATA-enhanced administrative databases on contracts and grants are still maturing, and little work has yet been done with the major reporting agencies to set the basis for direct SRS exploitation of their administrative records.
Several initiatives in the short term, however, would position SRS to effectively seize the moment when the preconditions for conversion of the program are in place. One approach would be to set up a series of demonstration projects to help determine good ways to transition to a system based at least in part on administrative data. The initial demonstration projects could be based on lessons learned by NIH in developing the RCDC system. With selected large reporting agencies, SRS could explore what would be necessary to develop agency-appropriate approaches to a more comprehensive system. One set of demonstrations could use the current agency administrative databases to test mining for terms that could yield field of S&E taxonomic elements; another could test the development of cross-walks between program/projects and fields. Such demonstration projects, conducted by the reporting agencies in conjunction with the implementation of government-wide administrative record improvement programs (and, one hopes, partially funded by those initiatives), could help illuminate the way to identify fields of S&E in data records at the program and project level, using the text-based technologies described in Chapter 5.
Recommendation 4-3: The Division of Science Resources Statistics should initiate work with other federal agencies to develop several demonstration projects to test for the best methods to move to a system based at least partly on administrative records.