APPENDIX D

Summary of Workshop

On December 18-19, 2000, the Committee on Science, Engineering, and Public Policy (COSEPUP) sponsored a 2-day workshop on the Government Performance and Results Act (GPRA). The purpose of the workshop was to allow participants to summarize the points raised in five agency-specific focus groups,1 held over the previous 3 months, to review these points with representatives of the agencies and federal oversight groups, and to formulate their own conclusions and recommendations. This document summarizes the main points discussed at the workshop.

This summary refers several times to the first GPRA report by COSEPUP.2 The executive summary of that report is included as Appendix E. It also reiterates the findings of the first report, with emphasis on the first four recommendations: research programs, including basic research, should be evaluated regularly; the methodology of evaluation should match the character of the research; the primary method for evaluating research programs should be expert review; and agencies should describe in their GPRA plans and reports the goal of developing human resources.

1See Appendix C for summaries of focus groups with the Department of Defense, Department of Energy, National Aeronautics and Space Administration, National Institutes of Health, and National Science Foundation.

2COSEPUP, Evaluating Federal Research Programs: Research and the Government Performance and Results Act, Washington, D.C.: National Academy Press, 1999.
IMPLEMENTING THE GOVERNMENT PERFORMANCE AND RESULTS ACT FOR RESEARCH

The language of GPRA strongly encourages agencies to evaluate all their activities, including basic research, with quantitative metrics that can be applied annually. Much of the research in the large mission agencies, such as the Department of Energy (DOE) and the Department of Defense (DOD), is applied or developmental research, which is more amenable to quantitative measurement. But the panel heard from agency representatives that they could not find useful quantitative metrics to evaluate the results of basic research.

It is true that quantitative measures are used to evaluate researchers, research proposals, and research programs throughout science. Some of these measures are the number of publications, the number of times papers are cited by others, the number of invited talks given, and the number of prizes won. There was universal agreement, however, that the usefulness of such quantitative measures by themselves is limited. A citation index, for example, is a relatively crude measure in that it does not measure the originality of papers, the quality of publications, the number of co-authors, or other qualitative conditions that are fundamental to understanding their value. Many researchers publish large numbers of papers, each of which represents only a slight variation on the preceding one. Similarly, junior researchers who belong to very large research groups might play almost no role in the design of an experiment whose report they help to write. On the other side of the argument, "routine" papers might have greater value than first supposed. For example, a simple paper on methodology might contain an original insight into some technique that proves to be of great value. Discriminations of those kinds are best made by experts who are asked specifically to focus on the work of a particular person or laboratory.
Workshop Summary

The judgment of experts as a form of "measurement" has true value because of the reviewers' deep knowledge of a field and of the people who work in it. Several basic points were identified:

• Because basic research is an open and free inquiry into the workings of nature, the eventual significance or utility ("outcome") of basic research cannot be predicted.

• For purposes of evaluation, one can evaluate basic research on the basis of whether it is producing high-quality knowledge ("output") that is relevant to the mission of the agency supporting the work.

Panelists offered several illustrations of the difficulty of trying to evaluate basic research annually with quantitative metrics.

• Quantitative evaluation can stifle the very inquiry it is trying to measure. For example, a researcher sets a measurable goal at the beginning of the year. In July, the researcher discovers a more promising direction and decides to alter course. The original goal is no longer meaningful. Even though the change of direction benefits the inquiry and the research program in the long run, the researcher would receive a low "GPRA grade" for the year on the basis of the original quantitative goal.

• The original language of GPRA encourages agencies to design their budgets in a way that links all expenditures with defined goals for that budget year. If an agency programs all its money at the outset of a budget cycle, it cannot move in a new direction when the promise of that direction is revealed.

A representative of the Environmental Protection Agency (EPA) described his agency's struggle to conform to GPRA. Some of the basic research supported by the agency, he said, does not
easily align with goals expressed in the budget, because the results are unknowable or because the agency cannot show results within the budget year.

A representative of the National Aeronautics and Space Administration (NASA) said that neither the Office of Management and Budget nor some of her own agency people seemed to appreciate the differences between NASA's basic research and other ways that the NASA mission is implemented. "All of what we do is not the same," she said, "but they expect that our plans and reports are going to look the same."

A representative of DOE noted that as a manager he was not in a position to judge directly the science supported by the agency. "We are science-managing, not science-performing. There is a tenuous link between what we do in the office and the actual performance of science at a laboratory or university." He said that science is performed according to its own standards of integrity and that the best science-management practices are those that will not have an adverse effect on science itself.

Several scientists expressed surprise that they had to explain the process of basic research each time new members came to Congress. They suggested that agencies coordinate presentations to communicate better about this issue with oversight bodies. They also noted that once oversight groups recognized the basic principles of science, they might understand better why micromanaging of agency research programs does not necessarily lead to better science.

In its first report on GPRA, COSEPUP recommended the use of the following criteria to evaluate research programs: the quality of the research, the relevance of the research to the agency's mission, and leadership, that is, the level of the work being performed compared with the level of the finest research done in the
same field anywhere in the world. The criteria were discussed frequently throughout the workshop.

Agencies have all incorporated excellence into their evaluations. Individual research projects are evaluated for their quality by panels of independent experts in the same field of research. At a higher level, research programs are usually evaluated by panels of independent experts. However, there is considerable variation among agencies in how quality is assessed. The entire mission of the National Science Foundation (NSF) is research; a large, traditional structure of volunteer peer reviewers spends large amounts of time and effort in reviewing grant applications and grant renewals. In DOD, a very small proportion (1.5%) of the overall budget is dedicated to research, and this portion is peer-reviewed by expert peers in the same manner as NSF-funded research. Outcomes of most of DOD's work, however, which includes extensive programs of weapons testing, are more predictable, and the research component of the work is small. Instead of traditional peer review, the agency more often evaluates such work by marking its progress against established benchmarks. Because of such variations, the panel acknowledged that there are multiple approaches for gathering information on quality and for setting priorities and that agencies should be free to design their own approaches.

There was also acknowledgment of how much work is asked of the science and engineering communities in serving on review panels. Both NSF and DOD, in particular, as well as the National Institutes of Health (NIH), are sometimes accused of reviewing too much and of overtaxing their reviewers.

Panelists concluded that agencies generally have methods for gauging the relevance of their research to their missions. These
methods, however, vary widely among agencies and are seldom made clear in GPRA plans or reports. Two views of relevance were discussed. The first is relevance as perceived by agency managers, who must decide what kinds of research are relevant to their mission objectives. The second is the view of the "users" of research. For example, the users of results of NIH research include pharmaceutical companies, hospital administrators, applied researchers, and doctors. NIH was asked whether it heard from such users, and a representative responded that the agency hears from them through its national advisory councils, which include scientists, health-care providers, and members of the public. The agency also holds workshops to gather general feedback.

The example of the Army Research Laboratory (ARL) was discussed. ARL uses both external peer groups and user groups to evaluate its research. Peer committees are specifically designed to evaluate quality, but they are less well equipped to evaluate relevance. For that, expert committees must be augmented by members of the user community or experts in related fields. Relevance was conceded to be easier to assess for entities like ARL, where researchers work closely with those who will use the outcomes of research. It might be more difficult to describe for NSF and NIH, where most research is performed externally, users might be unknown, and most research is basic research. A panelist remarked that in assessing DOE research, users do have input, but it is seldom revealed in plans or reports.

International benchmarking is the use of expert panels that include reviewers from both the United States and other countries to evaluate the leadership status of a country in a given research field. The goal of international benchmarking is to judge the "leadership level" of a program with respect to the world standard
of research in that field. COSEPUP had written earlier that for the sake of the nation's well-being, the United States should be among the leaders in all major fields of science and be preeminent in selected fields of national importance.3

It was agreed that the agencies that focus on basic research, notably NSF and NIH, address the issue of leadership at least tacitly by funding their researchers competitively. By funding the best researchers, they are supporting the careers of the best scientists. But the leadership issue might be addressed more explicitly by including more foreign researchers on review panels. The discussants encouraged agencies to experiment with ways to increase the international perspective in their evaluation procedures. One panelist cited an earlier COSEPUP experiment with international benchmarking, in which the United States was deemed to be the overall world leader in materials science and engineering, but the study revealed some fields in which the United States was not ahead. "If I were sitting in an agency," said the panelist, "that would worry me. The only way to get at this picture is through an international viewpoint."

DOE and several other agencies mentioned plans to experiment with international benchmarking. One agency representative cautioned that setting up such a program might take more time than is allowed in the framework of GPRA.

The panel agreed that every agency that supports research has an interest in enhancing the education of graduate students, postdoctoral scientists, and active scientists. At the workshop,

3The rationale for these complementary goals is that the nation must be performing research at the forefront of a field if it is to understand, appropriate, and capitalize on current advances in that field, no matter where in the world they occur.
however, agency representatives seldom mentioned education in their presentations. The first GPRA report explicitly recommended the use of education as an evaluation criterion for purposes of GPRA compliance, and the panel reiterated this recommendation. Specifically, the panel recommended that the expansion or contraction of programs be assessed for effect on present and future workforce needs.

Agencies support hundreds or thousands of individual research projects, and they cannot evaluate all of them for the purpose of GPRA compliance. Therefore, they aggregate their projects to a large extent. Some agencies, such as DOD, aggregate up to the program level; others, such as NSF, aggregate virtually all their projects into a single "research portfolio."

The topic of aggregation provoked extensive discussion, largely because a very high level of aggregation prevents insight into the evaluation of specific programs or divisions within programs. One criticism of high aggregation is that it is opaque to oversight bodies and others who want to understand how an agency makes decisions. An opposing view was that it is not appropriate for oversight bodies to "micromanage" agencies' selection and evaluation of individual programs or projects.

A workshop participant noted that some committees do not use GPRA documents when research activities are too highly aggregated. She noted that the law requires research activities to be described at the program and financing budget levels. In NSF GPRA documents, she said, the existence of specific programs or disciplines is not apparent, and it is not possible to weigh activities against goals. "Congress would like to know what you were trying to do. It would like clearer statements of objectives and accomplishments."

Other participants indicated that it was risky to try and that agencies are often caught between conflicting desires, because
some committees want to see a high degree of detail and others do not. A number of participants indicated that the degree of aggregation should be left up to the individual agencies. It was suggested, for example, that aggregation was easier in a mission agency, such as DOD, because its programs are more focused on specific goals, whereas NSF supports virtually all forms of research, which might not have predictable goals. Some also said that when agencies choose a high level of aggregation they should also make clear how decisions are made below that level and provide access to materials that demonstrate the decision-making. Even though not all committees will want to read through the long, highly detailed documents used by agencies for internal planning, these documents should be available.

Another point made was that there has always been a tension between Congress and the scientific community about what kinds of research to pursue. In some cases, Congress would like to micromanage an agency's portfolio to pursue political or other nonscientific goals. In such cases, it is understandable that agencies prefer to shield their activities from decisions that can alter their program plans.

Another issue addressed at the workshop was the concept that "one size does not fit all." One of the most striking examples of difference can be seen in the research supported by two agencies: NSF and DOD. Nearly all of NSF's budget is spent on research, but only about 1.5% of DOD's budget is. And yet each has to respond to the same GPRA requirements, even though the overall DOD GPRA plan barely has space to mention research at all, let alone deliver a detailed analysis of planning and evaluation methods. An NSF representative said, "Our number one principle is to do no harm. One size doesn't fit all. If the shoe doesn't fit, it isn't your shoe."
A NASA representative explained the differences particular to her agency. At NASA, research activities are integrated across so-called enterprises, which are the major agency divisions. Each enterprise has a portion of a kind of research, and that portion must be integrated with the other activities of the enterprise, such as building hardware and planning space missions. It is difficult to explain the different qualities of scientific research within a GPRA document that comprehends an entire enterprise with all its diverse activities and goals.

Another agency representative expanded on the difficulty faced by the large mission agencies. Because most of their activities are not research, the agencies themselves might not emphasize or even understand the research process. One representative pointed out that GPRA reporting in his agency is done through the chief financial officer.

The panel asked many questions about the utility of GPRA for agencies: What benefits, if any, does it bring to your agency? Some agencies saw benefit in being forced to examine management procedures more closely and to think in more detail about how their research activities served the objectives described in their budgets. Other agencies were still struggling to make sense of the GPRA requirements and to fit them to their agency's structure and function. For example, EPA expressed a "lot of dilemmas." It felt a split between its overall mission and many of the science programs that supported that mission. Some of the programs supported basic research and could not be described annually in terms of outcomes, and yet both the oversight groups and agency administrators asked for such outcomes. Similarly, the US Department of Agriculture (USDA) described itself as "very mission-driven" but having core agencies that perform research. The representative felt
that the most useful way to use GPRA was to apply it to program management, not to the research itself.

DOE expressed the most profound difficulties. A representative said, "We're required by different people to meet different requirements not of our choosing. We're trying to come to grips with GPRA by focusing on budget, adopting a planning process that allows us to embed performance goals in the budgeting process in a way that makes sense from the GPRA point of view. Now we have gotten instructions from appropriators to strip out all high-level goals and instead to use performance measures in line items as statements of what we're trying to accomplish: $2-3 million items, very specific. This is a big problem."

DOD said that GPRA has not added value to its evaluation process, because the agency is using the same procedures that it did before GPRA. It still evaluates the quality and relevance of research with a GPRA-like process.

Some participants indicated that agencies do not appreciate the flexibility built into GPRA. That is, the law permits agencies to devise "alternative forms" of planning and evaluation when annual quantitative techniques are not appropriate. But some agencies that are expressing the most difficulty have not fully done so.

The law does not allow agencies to hire additional staff or consultants to comply with GPRA, and its intent is not to impose an additional workload. But agency representatives described a considerable amount of extra workload in the form of meetings, workshops, and other activities. One representative offered an unofficial estimate that one-fourth to one-third of the time of some middle- and high-level officials was devoted to GPRA compliance.

Some workshop participants also expressed concern over the amount of time devoted to GPRA. They felt strongly that it should not replace or interfere with how agencies do their strategic
planning. But they were optimistic that once agencies moved farther along the learning curve, they would be able to integrate GPRA reporting procedures with internal agency procedures in ways that benefit both but do not require additional time.

An NIH representative said that NIH had not had to change its internal planning or reporting procedures but felt that GPRA required special attention. She noted that GPRA work takes place in the context of other activities: planning, priority-setting, and producing other documents for 23 institutes and centers. For NIH as a whole, there are 55 strategic targets, only five of which had been discussed at the workshop. The rest, including training facilities, administration, grants, technology transfer, and priority-setting, are equally important parts of GPRA.

An NSF representative said that GPRA "is expensive for our agency." He said that the CFO, chief information officer, and many others all meet weekly to talk about it. The agency had to develop data systems to accumulate information for GPRA. It also affects the committees of visitors (COVs) that review NSF programs. The COVs used to study only the process of making awards and ensure that it was fair and honest. Since passage of GPRA, the agency has expanded the mission of COVs to evaluate the research results of past investments. DOD indicated that it had been able to integrate GPRA into its processes.

The panelists noted that the extra effort would probably decrease as agencies developed systems that responded to GPRA more easily. They also urged oversight bodies to help agencies develop reporting formats that minimize the extra effort required.

Most agency representatives reported difficulties in complying with the timing of GPRA requirements. They are
required to send in their performance plans and performance reports with the annual budget. However, because budgets are due at the beginning of each year, sending in an annual performance plan with the budget requires preparation of the report before the year has ended. NSF explained its difficulties this way: "If you haven't written your performance report for 2000, how do you write your performance plan for 2003? It's an issue of how often we have these reports. The law has an artificial timeline that doesn't fit any of us. The performance plan could extend over a longer period than a year. It can't possibly hit what you're doing in the next budget cycle. And we can't factor what we've learned in that report into the next cycle."

A representative of the General Accounting Office (GAO) suggested that it would be hard to change the requirement for annual reporting but that the agencies can specify what they are reporting annually by using an alternative reporting form. She said that GPRA is more flexible than agencies recognize.

Like the focus groups, this workshop featured extended discussions of the difficulty of evaluating the results of basic research each year. Such a requirement, several participants said, puts unrealistic pressure on a principal investigator to come up with the "next great discovery of the last 12 months." One participant noted that the "output" of good research is original knowledge, as measured by publications and perhaps new products, but that the "outcome" of that knowledge might be unknown for years.

As a result, DOD now looks at every research program not annually but every 2 years. Review panels are asked whether adequate progress is being made toward stated goals. NSF is planning to evaluate every basic-research program every 3 years, covering one-third of its portfolio each year. Thus, it is reporting on its programs every year but reviewing a given
program every 3 years. If there is an exciting discovery from a grant made 10 years earlier, the discovery will be reported after the year in which it occurs.

A USDA representative pointed out that an evaluation is more effective when it comprehends several years: "you can tell your story better." But she said that evaluations can motivate people at the bench if they know that their results are expected and might be used by someone.

Representatives of oversight bodies said that they would like more information about how agencies verify and validate their procedures for evaluation of research. For example, when an agency uses expert review to evaluate a program, who are the reviewers? How are they recruited? Are they all outside the agency? If not, when is the use of an internal panel justified? Are the users of research included in review panels? What qualifications are required? How are conflicts of interest avoided? What process do reviewers use? How good is the quality of the data that they are given? How much quantitative information is included?

The absence of such information from most GPRA reports leads to some suspicion about the objectivity of expert review and the independence of reviewers. Panelists agreed that although most oversight bodies do not want to review the curricula vitae of all an agency's reviewers, for example, they feel reassured if that material is at least described in reports and made available as necessary. It might be reasonable for a committee to undertake a sampling of a given agency's procedures, for example, to obtain a better understanding of the evaluation process. This topic has not been thoroughly discussed between agencies and oversight bodies, however.

Some participants indicated that the agencies themselves are best qualified to organize and validate their reporting procedures and that most of them have systems in place for doing so.
The issue is not whether they should turn over the procedures or
their validation to an outside body. They should simply be willing to describe the procedures in more detail than they do, to provide a "sense of comfort" to the oversight bodies, which are primarily looking for an understanding of why agencies use the methods they use. "It's not a matter of right or wrong," said one oversight official. "It's just for us to understand what they did and why they did it that way."

Some agencies complained that oversight bodies issued conflicting requests, lacked consistency among personnel, failed to issue explicit guidelines, disliked new systems that were designed to comply with GPRA, and made unreasonable requests with regard to research activities, especially in large mission agencies. Some oversight personnel complained that agencies did not explain the special needs of science adequately, did not reveal their specific planning and reporting mechanisms with sufficient transparency, did not adequately align program activities with budget line items, and did not explain their validation and verification procedures for evaluating research programs.

One agency representative reported that he had never been contacted by a representative of Congress about GPRA. Some agency personnel were confused about why the appropriations committees did not seem to use or take an interest in agency GPRA reports. A congressional representative suggested that the level of aggregation of program activities was so high that committees could not understand or see the actual program activities. A GAO representative stated, "We know what agencies are doing, but is it good, bad, or indifferent? What has worked, and what are the problems? Has it been a successful experiment?"

An EPA representative said that his agency had combined the budget with the performance plan, as suggested by GPRA, but that the "appropriators were upset that they weren't seeing what they were used to seeing" and asked for the old system. Other
agency representatives reported the same difficulty with committees and sometimes with their own agencies.

One agency representative acknowledged that the process was still relatively young, and participants were still learning what the others wanted. "Three years ago," he said, "everyone was in denial that we were going to have to do anything with this."
Panel Members:

President, The Burroughs Wellcome Fund, Research Triangle Park, North Carolina

Associate Vice Chancellor of Engineering and Director, Texas Transportation Institute, The Texas A&M University System, College Station, Texas

Director Emeritus, Argonne National Laboratory, Argonne, Illinois

Professor of Entomology, Department of Entomology, Texas A&M University, College Station, Texas

Professor Emeritus in Nutrition, School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington

Retired Vice Chairman and Chief Financial Officer, AT&T, Short Hills, New Jersey

Director, Geophysical Laboratory, Carnegie Institution of Washington, Washington, D.C.

Professor of Environmental and Occupational Medicine, Howard University, Washington, D.C.

Distinguished Member of the Technical Staff, Bell Laboratories, Lucent Technologies, Murray Hill, New Jersey

University Professor and Director, Data Storage Systems Center, Carnegie Mellon University, Pittsburgh, Pennsylvania

Arthur Amos Noyes Professor of Chemistry, California Institute of Technology, Pasadena, California [by confcall]

Frank P. Hixon Distinguished Service Professor, James Franck Institute, Department of Chemistry, The University of Chicago, Chicago, Illinois
Participants:

Office of Science Policy, Office of the Director, National Institutes of Health, Bethesda, Maryland

US Department of Agriculture, Washington, D.C.

Policy Analyst, National Institute of Standards and Technology, Gaithersburg, Maryland

Performance Measurement Specialist, US Environmental Protection Agency, Washington, D.C.

Assistant Director, US General Accounting Office, Washington, D.C.

Assistant to the NASA Chief Scientist, NASA Headquarters, Washington, D.C.

Director, Office of Integrated Activities, The National Science Foundation, Arlington, Virginia

Program Analyst, Office of the Deputy Under Secretary of Defense (S&T), Washington, D.C.

Deputy Under Secretary of Defense (S&T), US Department of Defense, Washington, D.C.

Chief, Planning, Evaluation and Legislation Branch, National Institute of Dental and Craniofacial Research, NIH, Bethesda, Maryland

Director, Division of Evaluation, Office of Science Policy, National Institutes of Health, Rockville, Maryland

Director, Office of Science Policy & Planning, National Institutes of Health, Bethesda, Maryland

Director of the Office of Planning and Analysis, US Department of Energy, Washington, D.C.

Specialist, Science and Technology Policy, Congressional Research Service, Library of Congress, Washington, D.C.