
CHAPTER 1

THE CHALLENGE OF EVALUATING RESEARCH

Passage of the Government Performance and Results Act (GPRA) in 1993 reflected a desire on the part of the public and their representatives in Washington for more effective and efficient use of public funds. GPRA requires a heightened degree of accountability in the planning, performance, and review of all federally funded activities.

The fraction of the United States budget invested in scientific and engineering research is relatively small, but it is highly visible, extremely important to the nation's future, and subject to lively debate. Federal funds support a total of some $20.2 billion¹ worth of basic research in 1998; about half that amount goes to the National Institutes of Health (NIH).² About $50 billion more is spent on applied research and development, of which a large portion is devoted to the procurement and testing of weapons systems. In all, the public investment in defense, health care, environment, space exploration, and other research-based endeavors constitutes a substantial public commitment.

In return for that investment, the public rightly expects substantial returns in the form of recognizable and useful outcomes. GPRA, as applied to scientific and engineering research, translates that expectation into a requirement for regular evaluations of


1 National Science Board. 2000. Science and Engineering Indicators. Text Table 2-1.

2 The next-largest recipients are the National Aeronautics and Space Administration (NASA), the Department of Energy (DOE), and the National Science Foundation (NSF), which are each allocated about 12% of federal funding for basic research.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.





Basic Research and Applied Research

As a search for the unknown whose outcomes are virtually unlimited, research defies exact definition. Intellectually, it is apparent that the performance of research takes place across a continuum of thought and action, from the abstract reasoning of a single individual to a multi-billion-dollar program of technological complexity, such as a mission to Mars.

However, to satisfy administrative or intellectual needs, it has often been convenient to separate “basic” research from “applied” research. In that spirit, basic research is often thought of as an unfettered exploration of nature whose only required output is new knowledge and whose outcomes are unknowable in advance. Applied research might be described as an activity whose outputs are also new knowledge, but knowledge whose nature and use are explicitly needed to achieve a specific useful outcome.³

Any research process is complex and has many feedback loops. A question raised during “applied” research might kindle a “basic” question that leads to new fundamental understanding. The knowledge “output” of basic research might—often after years or even decades—find utility as a practical “outcome.” For example, some of Louis Pasteur's most fundamental understandings about microbiology grew out of practical attempts to control spoilage in beer and wine. In contrast, a knowledge-seeking study in basic research can lead to a discovery of great practical value. The atomic phenomenon of stimulated emission identified by Einstein in 1917 led eventually to the laser light that carries our e-mail today along fiberoptic lines.⁴

In managing and funding research, it is important to understand the open-ended possibilities of any research activity, no matter how it is categorized, and to encourage the freedom of inquiry that leads beyond what is already known. Any imagined distinctions between “basic” and “applied” research are less important than this unimpeded freedom to follow one's intuition and evidence in the service of improved understanding. In practice, research managers must have the insight to balance the need for predictable results with the desire for unexpected breakthroughs.

3 For example, a research effort to make an amplifier by using semiconductors did not succeed. It was suggested that something might be happening on the surface of the semiconductor that interfered with the desired result. A basic study of the semiconductor surface began, which led to the discovery of the transistor effect.

4 The National Academies publish Beyond Discovery: The Path from Research to Human Benefit, a series of articles that describe applications of basic research that could not have been anticipated when the original research was conducted. The series, published four to six times per year, is available on the National Academies Web site, www.nationalacademies.org/beyonddiscovery.

federal research program performance and public disclosure of the results. Similarly, the public's representatives in Congress expect from agencies a sufficiently clear explanation of agencies' research activities to allow them to set priorities and manage agency budgets. Congress's desire for simplified and understandable information about research programs is reflected in the act's requirement of planning and reporting mechanisms.

Federal agencies that support research have moved by stages toward full implementation of GPRA over the last 4 years, with the central objective of providing a regular accounting of their research activities. They have spent substantial staff time designing ways to adapt their procedures to the act and have provided extensive plans and reports about their procedures and achievements (see Appendix G). Nonetheless, both the agencies and oversight bodies have wrestled with interpreting, implementing, and communicating about GPRA. This report attempts to examine the agencies' progress toward meeting objectives, discuss some of the problems encountered, and recommend several actions intended to benefit all parties.

Because of the complexity of responding to GPRA and because the methods used by federal agencies are still in early stages of development, the panel decided to focus its effort on creating an accurate picture of the processes being developed rather than on its specific mechanisms. To achieve that, the panel used a series of focus groups in which the agencies shared their experiences in creating their performance plans and reports, and representatives of oversight bodies provided their perspective and interacted with agency representatives. The focus groups were followed by a workshop and supplemented by numerous interviews with agency personnel and oversight groups. Of 11 agencies that support research in science and engineering, five were chosen for in-depth examination: the

Department of Defense (DOD), Department of Energy (DOE), National Aeronautics and Space Administration (NASA), National Institutes of Health (NIH), and National Science Foundation (NSF). Together, these five agencies account for some 94% of the federal government's spending on basic research.

The remainder of this chapter summarizes COSEPUP's first report on the issue of evaluating federal research programs. That report, entitled Evaluating Federal Research Programs: Research and the Government Performance and Results Act, recommends that federal research programs be evaluated using a process called expert review and the criteria of quality, relevance, and leadership.

1.1 Barriers to Evaluating Research and the Solution

The difficulty of using measurements to evaluate research arises because the purpose of research is to provide knowledge and better understanding of the subject under study. For example, research in physics is aimed at a better understanding of the laws of nature that govern the behavior of matter and energy. A specific case is research into those materials that become superconducting at low temperatures. The eventual outcome of such work might be knowledge about synthesis of materials that are superconducting at room temperature. Practical outcomes would be new classes of electronic devices and high-efficiency motors and power-transmission systems. However, those outcomes might not occur for many years. Indeed, research might demonstrate that such materials cannot be made—also a valuable result that would save us from the futile pursuit of such outcomes.

Because we do not know how to measure knowledge while it is being generated and when its practical use cannot be predicted, the best we can do is ask experts in the field—a process called expert review—to evaluate research regularly while it is in progress. These experts, supplemented by quantitative methods, can determine whether the knowledge being generated is of high quality, whether

Terms of the Government Performance and Results Act

GPRA requires agencies to produce three documents: a strategic plan, a performance plan, and a performance report. A strategic plan must cover a period of at least 5 years and be updated every 3 years. The performance plan and performance report must be submitted annually. The performance plan must list specific performance goals for the fiscal year of the budget it accompanies. Agencies are required to relate their performance goals to the broader objectives of the strategic plans and to specific activities described in the annual agency budget. The plans must establish performance goals for each program activity, and these goals must be expressed in an “objective, quantifiable, and measurable form.” The performance report is intended to be included in each agency's “accountability report,” due 6 months after the end of the fiscal year.

For many government activities—such as the provision of benefits to a segment of the population, the construction of a highway, or the implementation of a new service—the setting of performance goals and the annual assessment of progress are conceptually straightforward. That is, the agencies responsible are able to list their performance goals in quantifiable terms and report on their progress toward those goals by using specific metrics and time lines.

For research activities in science and engineering, however, especially those involving basic research, it is difficult or impossible to know the practical outcomes of activities in advance or to measure their progress annually with quantifiable metrics or milestones. Although it is desirable to use traditional measures of scientific excellence—including publications in refereed journals, frequency of citations, patents, honors and awards from professional associations—such measures apply most usefully to individuals rather than groups, and they offer only limited perspective on the likely outcome of entire programs.

The difficulty of predicting outcomes presents challenges both to agencies whose primary mission is research, such as NSF and NIH, and to the research components that are often relatively small parts of mission agencies. The deeper reason for the difficulty is embedded in the nature of research itself, as discussed in the box on “Basic Research and Applied Research.”

Accordingly, the act allows an “alternative form,” as approved by the OMB, for agencies that do not find it feasible to express their performance goals in quantitative form. A number of agencies have experimented with alternative forms, with mixed reviews on achieving GPRA requirements. Agencies are still seeking effective response mechanisms that both they and oversight groups find useful. To a large extent, the primary source of difficulty is the complex nature of research itself.

it is directed to subjects of potential importance to the mission of the sponsoring agency, and whether it is at the forefront of existing knowledge—and therefore likely to advance the understanding of the field.

Expert review is a well-understood and widely applied technique that is used by congressional committees, in various other professions, by industry boards, and throughout the realm of science and engineering to answer complex questions through consultation with expert advisers. Virtually all science and engineering programs in federal agencies, universities, and private laboratories use at least some expert review to assess the quality of programs, projects, and researchers.

Expert review is more than traditional peer review by scholars in the field. It also includes the users of the research, whether they are in industry, nongovernment organizations, or public health organizations or any other members of the public who can evaluate the relevance of the research to agency goals.

This report does examine other mechanisms for analyzing research, including bibliometric analysis, economic rate of return, case studies, and retrospective analysis. All methods were found to have some utility, but the people best qualified to evaluate any form of research are those with the knowledge and experience to understand its quality, relevance, and leadership and, in the case of applied research, its application to public and agency goals.

Furthermore, in many research programs, progress toward outcomes is not reflected in outputs that can be measured in a single year. In such cases, the value of the work might appear as an accumulation of discrete steps or sometimes abrupt insights that require two, three, or even more years to emerge. So a retrospective analysis over a number of years is necessary. For other research programs, progress toward specified practical outcomes can be measured annually with milestones and other quantitative approaches common in industry and some parts of the federal government.

For any long-term research program, results can be described annually—given a clear understanding of the research process. In the example of the search for room-temperature superconductors, one might expect such first-year results as drafting a request for proposals, evaluating responses, and funding the best of them. The research results themselves would begin to emerge in the middle years of such a program, and the interpretation of results and outcomes would emerge in the last years and perhaps be accompanied by planning for more-distant outcomes. The point is that the process is distorted if one expects to evaluate only the research results of a program for any given year of a long-term effort.

1.2 COSEPUP's Evaluation Criteria

COSEPUP proposed three evaluation criteria that should be used during the expert review process: quality, relevance, and leadership. These are described in more depth below.

1.2.1 Quality.

Review of the quality of research via peer review is the most common form of expert review. Peer review is applied throughout the scientific and engineering communities to the work of laboratories and individuals. All the agencies involved in the focus groups said that they use it to evaluate programs. Because one's professional peers are uniquely familiar with the standards, context, history, and trends of a field, they are uniquely qualified to assess the quality of a research endeavor and to recommend improvements.

The sine qua non of quality review is objectivity. Oversight agencies want more evidence that the personal connections or histories of reviewers do not influence their opinions of the institutions or individuals under review. That concern is legitimate and forms the basis of the custom of seeking out panels that are not only expert, but also independent, in a professional sense, of the object of review. Expert review must be carried out by individuals who

have technical expertise in the subject being reviewed but who are professionally independent of the program under review. Although it is true that those who are qualified to do quality reviews have some loyalty to the field, their potential bias is balanced by the strong tradition of honesty in the review process.

1.2.2 Relevance.

Relevance review is conducted by panels of expert peers joined by experts in related fields, potential users of the results of research, or other interested members of the public. Advisory committees are typically asked to answer the question, Does the agency's research address subjects in which new understanding could be important in fulfilling the agency's mission? The goal is to evaluate the relevance of a research program or project to the agency's goals.

User communities are taken to consist of those for whom agency research is intended to be relevant, including members of the academic and private sectors. For example, federally supported health research is assumed to benefit patients, medical practitioners, pharmaceutical companies, and other groups that use the results of research to develop new therapies and new products and to reap the benefits of new cures. It is important that these users help to evaluate the research “product” they hope to use. At the same time, it is essential to choose user groups with care so that they understand the need for the community's broad interests and do not focus too narrowly on single issues.

1.2.3 Leadership.

Review of leadership was proposed in the first COSEPUP report as a potentially effective evaluation criterion to test whether research is being performed at the forefront of scientific and technologic knowledge on an international level. In its Goals report of 1993, COSEPUP wrote that for the sake of the nation's well-being, the United States should be among the leaders in all major fields of science and pre-eminent in selected

fields of national importance. The rationale is that the nation must be performing research at the forefront of a field if it is to understand, appropriate, and capitalize on current advances in the field, no matter where they occur.⁵

Review of leadership is a new but promising means to gauge the place of a nation's research programs. Review can be accomplished by the technique of international benchmarking, an exercise carried out by a panel of non-US and US experts whose technical expertise and international perspective qualify them to assess the standing of a research program or an entire field. They are asked to assess the relative position of US research today, the expected relative position of US research in the future, and the key factors influencing relative US performance. The premise for using the leaders of a research field is that they are in the best position to appraise the quality of researchers in their field, to identify the most promising advances, and to project the status of the field into the future.

As an experiment, COSEPUP panels performed international benchmarking in three fields—mathematics, immunology, and materials science and engineering—and found it to be faster and less expensive than procedures that rely entirely on the assembly of quantitative information, such as numbers of dollars spent, papers cited, plenary lectures delivered at international congresses, and scientists supported.⁶ The panels also found good correlation between the qualitative judgments of experts and the results of quantitative indicators. In addition, panels concluded that quantitative measures by themselves are inadequate indicators of leadership because quantitative information is often difficult to obtain or compare across national borders. Also, quantitative information generally

5 COSEPUP. 1993. Science, Technology, and the Federal Government: National Goals for a New Era.

6 COSEPUP. 2000. Experiments in International Benchmarking of US Research Fields.

illuminates only a portion of the research process. In other words, numbers of papers, patents, or citations should be used as indicators of the generation of innovative technologies, but they do not by themselves necessarily illuminate the most promising or important activities in a field. An experiment in mathematics by NSF that produced results similar to COSEPUP's mathematics study also lends credence to the benchmarking technique despite differences in the makeup and mandates of the two panels.⁷

1.3 Organization of this Report

In Chapter 2, the panel provides its assessment of methods being used by agencies to comply with GPRA. Chapter 3 discusses some difficulties in communication between agencies, oversight bodies, and the public about GPRA. Chapter 4 provides the panel's general conclusions and recommendations.

7 National Science Foundation, Report of the Senior Assessment Panel of the International Assessment of the US Mathematical Sciences, Arlington, VA, March 1998.