APPENDIX D

SUMMARY OF WORKSHOP

On December 18-19, 2000, the Committee on Science, Engineering, and Public Policy (COSEPUP) sponsored a 2-day workshop on the Government Performance and Results Act (GPRA). The purpose of the workshop was to allow participants to summarize the points raised in five agency-specific focus groups1 held over the previous 3 months, to review these points with representatives of the agencies and federal oversight groups, and to formulate their own conclusions and recommendations. This document summarizes the main points discussed at the workshop.

This summary refers several times to the first GPRA report by COSEPUP.2 The executive summary of that report is included as Appendix E. This summary also reiterates the findings of the first report, with emphasis on its first four recommendations: research programs, including basic research, should be evaluated regularly; the methodology of evaluation should match the character of the research; the primary method for evaluating research programs should be expert review; and agencies should describe in their GPRA plans and reports the goal of developing human resources.

1 See Appendix C for summaries of focus groups with the Department of Defense, Department of Energy, National Aeronautics and Space Administration, National Institutes of Health, and National Science Foundation.

2 COSEPUP, Evaluating Federal Research Programs: Research and the Government Performance and Results Act, Washington, D.C.: National Academy Press, 1999.
Evaluating Basic Research

The language of GPRA strongly encourages agencies to evaluate all their activities, including basic research, with quantitative metrics that can be applied annually. Much of the research in the large mission agencies, such as the Department of Energy (DOE) and the Department of Defense (DOD), is applied or developmental research, which is more amenable to quantitative measurement. But the panel heard from agency representatives that they could not find useful quantitative metrics to evaluate the results of basic research.

Limits of Quantitative Metrics

Quantitative measures are indeed used to evaluate researchers, research proposals, and research programs throughout science; among them are the number of publications, the number of times papers are cited by others, the number of invited talks given, and the number of prizes won. There was universal agreement, however, that the usefulness of such quantitative measures by themselves is limited. A citation index, for example, is a relatively crude measure: it does not capture the originality of papers, the quality of the publications in which they appear, the number of co-authors, or other qualitative conditions that are fundamental to understanding their value. Many researchers publish large numbers of papers, each of which represents only a slight variation on the preceding one. Similarly, junior researchers who belong to very large research groups might play almost no role in the design of an experiment whose report they help to write. On the other side of the argument, “routine” papers might have greater value than first supposed; a simple paper on methodology, for example, might contain an original insight into some technique that proves to be of great value. Discriminations of those kinds are best made by experts who are asked specifically to focus on the work of a particular person or laboratory.
The Value of Expert Review

The judgment of experts as a form of “measurement” has true value because of the reviewers' deep knowledge of a field and of the people who work in it. Several basic points were identified:

Because basic research is an open and free inquiry into the workings of nature, the eventual significance or utility (“outcome”) of basic research cannot be predicted.

For purposes of evaluation, one can evaluate basic research on the basis of whether it is producing high-quality knowledge (“output”) that is relevant to the mission of the agency supporting the work.

Panelists offered several illustrations of the difficulty of trying to evaluate basic research annually with quantitative metrics:

Quantitative evaluation can stifle the very inquiry it is trying to measure. For example, a researcher sets a measurable goal at the beginning of the year. In July, the researcher discovers a more promising direction and decides to alter course. The original goal is no longer meaningful. Even though the change of direction benefits the inquiry and the research program in the long run, the researcher would receive a low “GPRA grade” for the year on the basis of the original quantitative goal.

The original language of GPRA encourages agencies to design their budgets in a way that links all expenditures with defined goals for that budget year. If an agency programs all its money at the outset of a budget cycle, it cannot move in a new direction when the promise of that direction is revealed.

Some agency experiences

A representative of the Environmental Protection Agency (EPA) described his agency's struggle to conform to GPRA. Some of the basic research supported by the agency, he said, does not
easily align with goals expressed in the budget, because the results are unknowable or because the agency cannot show results within the budget year.

A representative of the National Aeronautics and Space Administration (NASA) said that neither the Office of Management and Budget nor some of her own agency people seemed to appreciate the differences between NASA's basic research and the other ways that the NASA mission is implemented. “All of what we do is not the same,” she said, “but they expect that our plans and reports are going to look the same.”

A representative of DOE noted that as a manager he was not in a position to judge directly the science supported by the agency. “We are science-managing, not science-performing. There is a tenuous link between what we do in the office and the actual performance of science at a laboratory or university.” He said that science is performed according to its own standards of integrity and that the best science-management practices are those that do not have an adverse effect on science itself.

Several scientists expressed surprise that they had to explain the process of basic research each time new members came to Congress. They suggested that agencies coordinate their presentations to communicate better with oversight bodies about this issue. They also noted that once oversight groups recognized the basic principles of science, they might understand better why micromanagement of agency research programs does not necessarily lead to better science.

Criteria for Evaluation

In its first report on GPRA, COSEPUP recommended the use of the following criteria to evaluate research programs: the quality of the research; the relevance of the research to the agency's mission; and leadership, that is, the level of the work being performed compared with the level of the finest research done in the
same field anywhere in the world. The criteria were discussed frequently throughout the workshop.

Quality

All of the agencies have incorporated excellence into their evaluations. Individual research projects are evaluated for their quality by panels of independent experts in the same field of research, and at a higher level, research programs are usually evaluated by panels of independent experts as well. However, there is considerable variation among agencies in how quality is assessed. The entire mission of the National Science Foundation (NSF) is research; a large, traditional structure of volunteer peer reviewers spends large amounts of time and effort in reviewing grant applications and grant renewals. In DOD, a very small proportion (1.5%) of the overall budget is dedicated to research, and that portion is peer-reviewed in the same manner as NSF-funded research. The outcomes of most of DOD's work, however, which includes extensive programs of weapons testing, are more predictable, and the research component of the work is small. Instead of traditional peer review, the agency more often evaluates such work by marking its progress against established benchmarks.

Because of such variations, the panel acknowledged that there are multiple approaches for gathering information on quality and for setting priorities and that agencies should be free to design their own approaches. There was also acknowledgment of how much work is asked of the science and engineering communities in serving on review panels. NSF and DOD in particular, as well as the National Institutes of Health (NIH), are sometimes accused of reviewing too much and of overtaxing their reviewers.

Relevance

Panelists concluded that agencies generally have methods for gauging the relevance of their research to their missions. These
methods, however, vary widely among agencies and are seldom made clear in GPRA plans or reports.

Two views of relevance were discussed. The first is relevance as perceived by agency managers, who must decide what kinds of research are relevant to their mission objectives. The second is relevance as perceived by the “users” of research. For example, the users of the results of NIH research include pharmaceutical companies, hospital administrators, applied researchers, and doctors. NIH was asked whether it heard from such users, and a representative responded that the agency hears from them through its national advisory councils, which include scientists, health-care providers, and members of the public. The agency also holds workshops to gather general feedback.

The example of the Army Research Laboratory (ARL) was discussed. ARL uses both external peer groups and user groups to evaluate its research. Peer committees are specifically designed to evaluate quality, but they are less well equipped to evaluate relevance. For that, expert committees must be augmented by members of the user community or by experts in related fields. Relevance was conceded to be easier to assess for entities like ARL, where researchers work closely with those who will use the outcomes of research. It might be more difficult to assess for NSF and NIH, where most research is performed externally, users might be unknown, and most of the research is basic research. A panelist remarked that users do have input into the assessment of DOE research, but their input is seldom revealed in plans or reports.

Leadership

International benchmarking is the use of expert panels that include reviewers from both the United States and other countries to evaluate the leadership status of a country in a given research field. The goal of international benchmarking is to judge the “leadership level” of a program with respect to the world standard
of research in that field. COSEPUP had written earlier that for the sake of the nation's well-being, the United States should be among the leaders in all major fields of science and be preeminent in selected fields of national importance.3

3 The rationale for these complementary goals is that the nation must be performing research at the forefront of a field if it is to understand, appropriate, and capitalize on current advances in that field, no matter where in the world they occur.

It was agreed that the agencies that focus on basic research, notably NSF and NIH, address the issue of leadership at least tacitly by funding their researchers competitively: by funding the best researchers, they are supporting the careers of the best scientists. But the leadership issue might be addressed more explicitly by including more foreign researchers on review panels. The discussants encouraged agencies to experiment with ways to increase the international perspective in their evaluation procedures.

One panelist cited an earlier COSEPUP experiment with international benchmarking, in which the United States was deemed to be the overall world leader in materials science and engineering, but the study revealed some fields in which the United States was not ahead. “If I were sitting in an agency,” said the panelist, “that would worry me. The only way to get at this picture is through an international viewpoint.” DOE and several other agencies mentioned plans to experiment with international benchmarking. One agency representative cautioned that setting up such a program might take more time than is allowed in the framework of GPRA.

Education

The panel agreed that every agency that supports research has an interest in enhancing the education of graduate students, postdoctoral scientists, and active scientists. At the workshop,
however, agency representatives seldom mentioned education in their presentations. The first GPRA report explicitly recommended the use of education as an evaluation criterion for purposes of GPRA compliance, and the panel reiterated this recommendation. Specifically, the panel recommended that the expansion or contraction of programs be assessed for its effect on present and future workforce needs.

Aggregation of Research Programs for Purposes of Evaluation

Agencies support hundreds or thousands of individual research projects, and they cannot evaluate all of them for the purpose of GPRA compliance. Therefore, they aggregate their projects to a large extent. Some agencies, such as DOD, aggregate up to the program level; others, such as NSF, aggregate virtually all their projects into a single “research portfolio.”

The topic of aggregation provoked extensive discussion, largely because a very high level of aggregation prevents insight into the evaluation of specific programs or divisions within programs. One criticism of high aggregation is that it is opaque to oversight bodies and others who want to understand how an agency makes decisions. An opposing view was that it is not appropriate for oversight bodies to “micromanage” agencies' selection and evaluation of individual programs or projects.

A workshop participant noted that some committees do not use GPRA documents when research activities are too highly aggregated. She noted that the law requires research activities to be described at the program and financing budget levels. In NSF GPRA documents, she said, the existence of specific programs or disciplines is not apparent, and it is not possible to weigh activities against goals. “Congress would like to know what you were trying to do. It would like clearer statements of objectives and accomplishments.” Other participants indicated that it was risky to try to do so and that agencies are often caught between conflicting desires, because
some committees want to see a high degree of detail and others do not. A number of participants indicated that the degree of aggregation should be left up to the individual agencies. It was suggested, for example, that aggregation is easier in a mission agency, such as DOD, because its programs are more focused on specific goals, whereas NSF supports virtually all forms of research, which might not have predictable goals. Some also said that when agencies choose a high level of aggregation, they should make clear how decisions are made below that level and provide access to materials that demonstrate the decision-making. Even though not all committees will want to read through the long, highly detailed documents used by agencies for internal planning, those documents should be available.

Another point made was that there has always been a tension between Congress and the scientific community about what kinds of research to pursue. In some cases, Congress would like to micromanage an agency's portfolio to pursue political or other nonscientific goals. In such cases, it is understandable that agencies prefer to shield their activities from decisions that can alter their program plans.

One Size Does Not Fit All

Another issue addressed at the workshop was the concept that “one size does not fit all.” One of the most striking examples of difference can be seen in the research supported by two agencies: NSF and DOD. Nearly all of NSF's budget is spent on research, but only about 1.5% of DOD's budget is. And yet each has to respond to the same GPRA requirements, even though the overall DOD GPRA plan barely has space to mention research at all, let alone deliver a detailed analysis of planning and evaluation methods. An NSF representative said, “Our number one principle is to do no harm. One size doesn't fit all. If the shoe doesn't fit, it isn't your shoe.”
A NASA representative explained the differences particular to her agency. At NASA, research activities are integrated across so-called enterprises, which are the major agency divisions. Each enterprise has a portion of a kind of research, and that portion must be integrated with the other activities of the enterprise, such as building hardware and planning space missions. It is difficult to explain the different qualities of scientific research within a GPRA document that encompasses an entire enterprise with all its diverse activities and goals.

Another agency representative expanded on the difficulty faced by the large mission agencies. Because most of their activities are not research, the agencies themselves might not emphasize or even understand the research process. One representative pointed out that GPRA reporting in his agency is done through the chief financial officer.

The Usefulness of GPRA

The panel asked many questions about the utility of GPRA for agencies: What benefits, if any, does it bring to your agency? Some agencies saw benefit in being forced to examine management procedures more closely and to think in more detail about how their research activities served the objectives described in their budgets. Other agencies were still struggling to make sense of the GPRA requirements and to fit them to their agency's structure and function. For example, EPA described a “lot of dilemmas.” It felt a split between its overall mission and the many science programs that supported that mission. Some of the programs supported basic research and could not be described annually in terms of outcomes, and yet both the oversight groups and the agency administrators asked for such outcomes. Similarly, the US Department of Agriculture (USDA) described itself as “very mission-driven” but having core agencies that perform research. The USDA representative felt
that the most useful way to use GPRA was to apply it to program management, not to the research itself.

DOE expressed the most profound difficulties. A representative said, “We're required by different people to meet different requirements not of our choosing. We're trying to come to grips with GPRA by focusing on budget—adopting a planning process that allows us to embed performance goals in the budgeting process in a way that makes sense from the GPRA point of view. Now we have gotten instructions from appropriators to strip out all high-level goals and instead to use performance measures in line items as statements of what we're trying to accomplish—$2-3 million items, very specific. This is a big problem.”

DOD said that GPRA has not added value to its evaluation process, because the agency is using the same procedures that it did before GPRA. It still evaluates the quality and relevance of research with a GPRA-like process.

Some participants indicated that agencies do not appreciate the flexibility built into GPRA. That is, the law permits agencies to devise “alternative forms” of planning and evaluation when annual quantitative techniques are not appropriate, but some of the agencies expressing the most difficulty have not fully taken advantage of that provision.

GPRA and the Workload of Agencies

The law does not allow agencies to hire additional staff or consultants to comply with GPRA, and its intent is not to impose an additional workload. But agency representatives described a considerable amount of extra work in the form of meetings, workshops, and other activities. One representative offered an unofficial estimate that one-fourth to one-third of the time of some middle- and high-level officials was devoted to GPRA compliance.

Some workshop participants also expressed concern over the amount of time devoted to GPRA. They felt strongly that it should not replace or interfere with how agencies do their strategic
planning. But they were optimistic that once agencies moved farther along the learning curve, they would be able to integrate GPRA reporting procedures with internal agency procedures in ways that benefit both but do not require additional time.

An NIH representative said that NIH had not had to change its internal planning or reporting procedures but felt that GPRA required special attention. She noted that GPRA work takes place in the context of other activities: planning, priority-setting, and producing other documents for 23 institutes and centers. For NIH as a whole, there are 55 strategic targets, only five of which had been discussed at the workshop. The rest, including training facilities, administration, grants, technology transfer, and priority-setting, are equally important parts of GPRA.

An NSF representative said that GPRA “is expensive for our agency.” He said that the chief financial officer, the chief information officer, and many others all meet weekly to talk about it. The agency had to develop data systems to accumulate information for GPRA. GPRA also affects the committees of visitors (COVs) that review NSF programs: the COVs used to study only the process of making awards, to ensure that it was fair and honest, but since the passage of GPRA, the agency has expanded the mission of the COVs to include evaluating the research results of past investments.

DOD indicated that it had been able to integrate GPRA into its processes.

The panelists noted that the extra effort would probably decrease as agencies developed systems that responded to GPRA more easily. They also urged oversight bodies to help agencies develop reporting formats that minimize the extra effort required.

Two issues of timing

Linking performance plans with the budget cycle

Most agency representatives reported difficulties in complying with the timing of GPRA requirements. They are
required to submit their performance plans and performance reports with the annual budget. However, because budgets are due at the beginning of each year, sending in an annual performance plan with the budget requires preparation of the report before the year being reported on has ended. NSF explained its difficulties this way: “If you haven't written your performance report for 2000, how do you write your performance plan for 2003? It's an issue of how often we have these reports. The law has an artificial timeline that doesn't fit any of us. The performance plan could extend over a longer period than a year. It can't possibly hit what you're doing in the next budget cycle. And we can't factor what we've learned in that report into the next cycle.”

A representative of the General Accounting Office (GAO) suggested that it would be hard to change the requirement for annual reporting but that the agencies can specify what they are reporting annually by using an alternative reporting form. She said that GPRA is more flexible than agencies recognize.

Evaluating basic-research programs annually

Like the focus groups, this workshop featured extended discussions of the difficulty of evaluating the results of basic research each year. Such a requirement, several participants said, puts unrealistic pressure on a principal investigator to come up with the “next great discovery of the last 12 months.” One participant noted that the “output” of good research is original knowledge, as measured by publications and perhaps new products, but that the “outcome” of that knowledge might be unknown for years.

As a result, DOD now looks at every research program not annually, but every 2 years. Review panels are asked whether adequate progress is being made toward stated goals. NSF is planning to evaluate every basic-research program every 3 years, covering one-third of its portfolio every year. Thus, it is reporting on its programs every year but reviewing a given
program every 3 years. If there is an exciting discovery from a grant made 10 years previously, the discovery will be included in the first report after the year in which it occurs. A USDA representative pointed out that an evaluation is more effective when it covers several years: “you can tell your story better.” But she said that evaluations can motivate people at the bench if they know that their results are expected and might be used by someone.

Verification and validation

Representatives of oversight bodies said that they would like more information about how agencies verify and validate their procedures for evaluation of research. For example, when an agency uses expert review to evaluate a program, who are the reviewers? How are they recruited? Are they all outside the agency? If not, when is the use of an internal panel justified? Are the users of research included in review panels? What qualifications are required? How are conflicts of interest avoided? What process do the reviewers use? How good is the quality of the data that they are given? How much quantitative information is included? The absence of such information from most GPRA reports leads to some suspicion about the objectivity of expert review and the independence of reviewers.

Panelists agreed that although most oversight bodies do not want to review, for example, the curricula vitae of all an agency's reviewers, they feel reassured if that material is at least described in reports and made available as necessary. It might be reasonable for a committee to undertake a sampling of a given agency's procedures to obtain a better understanding of the evaluation process. This topic has not been thoroughly discussed between agencies and oversight bodies, however.

Some participants indicated that the agencies themselves are best qualified to organize and validate their reporting procedures and that most of them have systems in place for doing so. The issue is not whether they should turn over the procedures or
their validation to an outside body; it is simply that they should be willing to describe the procedures in more detail than they do, to provide a “sense of comfort” to the oversight bodies, which are primarily looking for an understanding of why agencies use the methods they use. “It's not a matter of right or wrong,” said one oversight official. “It's just for us to understand what they did and why they did it that way.”

Communication

Some agencies complained that oversight bodies issued conflicting requests, lacked consistency among personnel, failed to issue explicit guidelines, disliked new systems that were designed to comply with GPRA, and made unreasonable requests with regard to research activities, especially in the large mission agencies. Some oversight personnel complained that agencies did not adequately explain the special needs of science, did not reveal their specific planning and reporting mechanisms with sufficient transparency, did not adequately align program activities with budget line items, and did not explain their validation and verification procedures for evaluating research programs.

One agency representative reported that he had never been contacted by a representative of Congress about GPRA. Some agency personnel were confused about why the appropriations committees did not seem to use or take an interest in agency GPRA reports. A congressional representative suggested that the level of aggregation of program activities was so high that committees could not understand or see the actual program activities. A GAO representative stated, “We know what agencies are doing, but is it good, bad, or indifferent? What has worked, and what are the problems? Has it been a successful experiment?”

An EPA representative said that his agency had combined the budget with the performance plan, as suggested by GPRA, but that the “appropriators were upset that they weren't seeing what they were used to seeing” and asked for the old system. Other
agency representatives reported the same difficulty with committees and sometimes with their own agencies. One agency representative acknowledged that the process was still relatively young and that participants were still learning what the others wanted. “Three years ago,” he said, “everyone was in denial that we were going to have to do anything with this.”
Workshop Participant List
December 18-19, 2000

Panel Members:

Enriqueta C. Bond, President, The Burroughs Wellcome Fund, Research Triangle Park, North Carolina

Alan Schriesheim, Director Emeritus, Argonne National Laboratory, Argonne, Illinois

John E. Halver, Professor Emeritus in Nutrition, School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington

Wesley T. Huntress, Jr., Director, Geophysical Laboratory, Carnegie Institution of Washington, Washington, D.C.

Louis J. Lanzerotti, Distinguished Member of the Technical Staff, Bell Laboratories, Lucent Technologies, Murray Hill, New Jersey

Rudolph A. Marcus, Arthur Amos Noyes Professor of Chemistry, California Institute of Technology, Pasadena, California

Stuart A. Rice [by conference call], Frank P. Hixon Distinguished Service Professor, James Franck Institute, Department of Chemistry, The University of Chicago, Chicago, Illinois

Herbert H. Richardson, Associate Vice Chancellor of Engineering and Director, Texas Transportation Institute, The Texas A&M University System, College Station, Texas

Max D. Summers, Professor of Entomology, Department of Entomology, Texas A&M University, College Station, Texas

Morris Tanenbaum, Retired Vice Chairman and Chief Financial Officer, AT&T, Short Hills, New Jersey

Bailus Walker, Jr., Professor of Environmental and Occupational Medicine, Howard University, Washington, D.C.

Robert M. White, University Professor and Director, Data Storage Systems Center, Carnegie Mellon University, Pittsburgh, Pennsylvania
Participants:

Mark Boroush, Office of Science Policy, Office of the Director, National Institutes of Health, Bethesda, Maryland

Howard Cantor, Performance Measurement Specialist, US Environmental Protection Agency, Washington, D.C.

Ann Carlson, Assistant to the NASA Chief Scientist, NASA Headquarters, Washington, D.C.

Beth Foster, Program Analyst, Office of the Deputy Under Secretary of Defense (S&T), Washington, D.C.

Harriet Ganson, Chief, Planning, Evaluation and Legislation Branch, National Institute of Dental and Craniofacial Research, NIH, Bethesda, Maryland

Robin I. Kawazoe, Director, Office of Science Policy & Planning, National Institutes of Health, Bethesda, Maryland

Genevieve Knezo, Specialist, Science and Technology Policy, Congressional Research Service, Library of Congress, Washington, D.C.

Sara Mazie, US Department of Agriculture, Washington, D.C.

Jennifer McAndrew, Policy Analyst, National Institute of Standards and Technology, Gaithersburg, Maryland

Robin Nazzaro, Assistant Director, US General Accounting Office, Washington, D.C.

Nathaniel Pitts, Director, Office of Integrated Activities, The National Science Foundation, Arlington, Virginia

Joanne Spriggs, Deputy Under Secretary of Defense (S&T), US Department of Defense, Washington, D.C.

John Uzzell, Director, Division of Evaluation, Office of Science Policy, National Institutes of Health, Rockville, Maryland

William J. Valdez, Director of the Office of Planning and Analysis, US Department of Energy, Washington, D.C.