CHAPTER 2
AGENCY METHODS

This chapter examines the methods developed so far by agencies to evaluate their research programs, some of the difficulties encountered, and some features of the interactions between agencies and their oversight bodies1 in both the legislative and executive branches. The observations here are based on the panel's conversations with agency and oversight staff, as well as with members of agency expert review panels, during its focus groups and workshop. At the focus groups, the five agencies examined—DOD, DOE, NSF, NIH, and NASA—were asked to respond to the following questions regarding their methodology:
What methodology is used for evaluating research programs under GPRA?
What level of unit is the focus of the evaluation?
Who does the evaluation of the research program under GPRA (e.g., advisory committee, staff, combination)?
What criteria are used for the evaluation?
How does the selection and evaluation of projects relate to the evaluation of the research program?
1The oversight bodies with primary responsibility for assisting agencies and evaluating their efforts to comply with GPRA are Congress and its General Accounting Office (GAO), with input from the Congressional Research Service (CRS), and the White House Office of Management and Budget (OMB), with input from the Office of Science and Technology Policy (OSTP).
It was apparent to the committee that all the agencies interviewed have made good-faith efforts to comply with the requirements of GPRA. During the focus groups, they described in detail the evolution of their approaches, their frequent midcourse corrections, and the time and effort they have expended. The act forbade the use of outside consultants or additional hiring to design and execute GPRA responses, and for most agency officials the demands of GPRA produced an increased workload that promises to continue for some time. (For more details, see the agencies' responses, summarized in Appendix C.)

2.1 Expert Review

All agencies use expert review panels to evaluate their research programs. However, in response to GPRA, each of the agencies addresses the issue of expert review in a different way. Further, while some have well-established procedures that they are merely refining, others are still at very early stages of development.

Both NSF and NIH use advisory groups that produce evaluations via an alternative format approved by OMB. Under this method, there is no attempt to quantify a goal or the degree to which it has been met. Rather, goals are judged to be “successfully met” or “substantially exceeded” (in NIH's case) or “successful” or “minimally effective” (in NSF's case), as determined by a single expert review panel at NIH or multiple panels at NSF. At NIH, a single overarching panel evaluates all of NIH's research programs at one time. At NSF, numerous committees of visitors review individual research programs on a rolling 3-year basis. The results of those evaluations are then provided to several advisory committees whose membership represents several disciplines.

DOD uses a process called Technology Area Reviews and Assessments (TARA) to evaluate science and technology programs through expert peer reviews. In the DOD process, basic research is not isolated from applied research and advanced technology
development. All three categories—6.1 (basic research), 6.2 (applied research), and 6.3 (advanced development)—are evaluated as overlapping parts of the technology area under review, with clear links to the discoveries that are expected.

NASA and DOE both generally conduct extensive peer review of their projects and programs using external advisory committees. However, each has faced difficulties in translating its existing activities into the GPRA process, both in setting an appropriate unit of evaluation and in finding relevant performance measures. Some programs (called “enterprises” at NASA), such as Basic Energy Sciences at DOE, have had more success than others within their agency. Both agencies are undergoing major redesigns of how they respond to GPRA for their research programs.

At all the agencies, staff and advisory committee members expressed concern that GPRA-related activities diverted advisory committee members from their original activities or added new ones. Furthermore, agency representatives expressed concern that the balance of the existing membership might need to be modified. Although expert review panels often include members who are expert in fields “adjacent” to the field under review, GPRA documents reviewed by the panel did not clearly identify reviewers who were representatives of “potential user communities” or, where appropriate, the public. Nor do the documents show how agencies attempt to determine the extent of their particular “potential user universe.” Because agency research is supported wholly by public funds, it is appropriate during the review process to consider how the interests of users are served. In addition, explicit statements about the reasons for pursuing particular fields or programs could help agencies focus on their most productive initiatives and avoid wasting resources on those with low potential. Further, including panel members from outside the United States would help address questions of international leadership.
Recommendation M-1
Agencies should continue to take advantage of their existing expert review panels but should review the balance of their membership, particularly the need to include user groups, and should ensure that the time panel members devote to GPRA, as opposed to other topics, is not excessive. In addition, they should review the degree to which internal and external reviewers are used.

2.2 Evaluation Criteria

This section summarizes the degree to which the agencies are using COSEPUP's proposed criteria (quality, relevance, and leadership) to evaluate their research programs.

2.2.1 Quality

According to the agencies themselves, quality is the most widely and traditionally used of the three criteria. By custom, the quality of research is evaluated by peer-review committees that include members from both inside and outside the program under review. In rare cases, agencies use internal reviewers or program monitors in place of external reviewers. That practice might be deemed necessary when those best qualified to perform evaluations work in the same agency, although perhaps in a different division. In such cases, the independence, rather than the external position, of the reviewers is judged to be the validating factor, and the degree of independence is confirmed by agency administrators. For example, a program monitor evaluates the electrochemistry program in the Office of Naval Research by meeting annually with grantees; the officer is uniquely familiar with the details of the research. In other cases, security or other considerations might dictate a need for internal review. In general, however, review by outside experts is preferred.
Recommendation M-2
Agencies should continue to use peer review to evaluate the quality of their research programs.

2.2.2 Relevance

The agencies' use of the second performance criterion, relevance, is somewhat less apparent. The panel found that agencies recognize the importance of relevance in planning and review and that they consider the degree to which research programs and projects support their missions. However, although relevance is commonly embedded as an implicit element of planning and review, it might not appear as an explicit element of published GPRA performance plans or reviews. In addition, according to statements by agency officials, relevance appears to be evaluated at different stages by different people, most often by administrators who judge by custom or instinct whether a given line of research is relevant to the mission. Agencies' methods of performance review might therefore not be sufficiently clear to oversight groups and the public.

Recommendation M-3
Agencies should clarify their use of relevance as a criterion in evaluating their research programs. User groups should be part of the relevance evaluation process, and their role should be described clearly in performance plans and reports.

Although agencies commonly use the criterion of relevance in implicit fashion, it should be made more visible to user groups, oversight bodies, and the public. Clear judgments about relevance can help agencies establish priorities among competing programs of equal scientific interest.

2.2.3 Leadership

COSEPUP indicated in its Goals report that US-supported research programs should be at least “among the leaders” in all major fields and that international
benchmarking can provide a reasonably quick and inexpensive method of assessing the nation's leadership level. In general, however, agencies have not used international benchmarking to evaluate the leadership of their research programs against world standards. Most agencies are aware of COSEPUP's testing of international benchmarking, and several are considering its use. One impediment is that implementation would require additional time and resources. Agencies have used various other measures of leadership—such as international prizes, patents, national awards, and the judgment of experts—but not in a broad or standardized way.

In keeping with their diversity, agencies should devise their own approaches to evaluating leadership. They must first decide, for example, whether a particular field is one in which this country should be preeminent or simply among the leaders. They might also benefit from using existing expert review panels or other methods to evaluate leadership and from including international members as appropriate.

Recommendation M-4
Agencies should use international benchmarking to evaluate the leadership level of research programs, as described in COSEPUP's earlier Goals and International Benchmarking reports, especially for emerging fields of research and those of national importance. Agencies should select the fields to be evaluated and devise their own methods. If an agency does not evaluate a particular program with the criterion of leadership, it should explain the reason for supporting the program (for example, a given program might have value for training or for filling gaps in knowledge important to the agency's mission).
2.3 Human Resources

Agencies justifiably attach great importance to their role in promoting the development of human resources. Their research programs depend on a continuing flow of talented scientists and engineers, who are best educated in the context of the research supported by agencies and other funders. However, this objective might not receive explicit emphasis or visibility in GPRA plans and reports. Because the development of human resources is generally not a clear or prominent feature of performance plans or reports, there is a risk of overlooking its continuing and fundamental importance, especially in relation to the scientific and engineering research supported at universities. This objective must be made explicit not only because doing so affirms the value of educating scientists and engineers by including them in the research programs of their advisers, but also because it shows how reductions in research funding in specific fields can jeopardize the preparation of the next generation of scientists and engineers, who will be important to the nation's future.

Recommendation M-5
The development of human resources should be emphasized as an explicit objective in GPRA performance plans and reviews. Plans to increase or reduce budgets should be described in terms of their impact on the future science and engineering workforce.

2.4 Aggregation

One aspect of GPRA that requires closer consultation between agencies and oversight groups is the clause that permits agencies to “aggregate, disaggregate, or consolidate program activities” in formulating GPRA plans and reports. Some difficulties appear to arise from the differing importance of research to
various agencies. The portion of the budget allocated to research ranges from a small fraction (as in DOD and DOE) to most or nearly all of an agency's budget (as in NSF and NIH). Accordingly, agencies vary widely in the degree to which they have chosen to aggregate research programs for GPRA reporting. Some agencies report on individual programs; others describe entire research fields on an agency-wide basis.

A concern was voiced by representatives of several agencies in which research is a minor portion of the overall mission portfolio. The research divisions of such agencies might find it difficult to distinguish themselves from the dominant mission activities. Because the dominant activities tend to be easier to express in terms of predictable targets and quantifiable progress (for example, constructing a building, setting up a new social service, or planning a space launch), these agencies' GPRA performance plans and reports are expressed primarily in terms of quantitative goals and milestones. Research programs in such agencies might find themselves compelled to conform to a prescribed reporting format.

When the degree of aggregation is high, oversight bodies, potential users, and the public might not be able to see or understand the detailed layers of decision-making and management that underlie the GPRA descriptions. In some instances, those advising oversight bodies expressed concern that a high level of aggregation makes the underlying processes less clear. Agencies indicated that they choose a high degree of aggregation because individual program activities are not easily linked to budgetary line items or because specific decision mechanisms are too numerous to discern at the high aggregation level of GPRA reporting.
Because a primary purpose of GPRA is to permit oversight bodies to understand how agencies make decisions and set priorities, it is essential that these bodies be able to see the connections between performance plans, performance reports, and strategic plans.
Recommendation M-6
Agencies that choose to aggregate their research-program activities at a high level should endeavor to make clear the decision-making processes that lie below this level. A degree of transparency is needed for oversight bodies and the public to understand how an agency evaluates its programs and sets priorities. Although oversight bodies cannot review the thousands of subentities that perform their own planning and review within agencies, they can reasonably expect access to documents that help them answer specific questions.

2.5 Validation and Verification

Although expert review has long been the accepted method of evaluating research in science and engineering, some aspects of its implementation are unclear to outside observers. Oversight bodies and some agencies express a need for clearer validation and verification of expert review, such as explanations of how agencies establish the independence of reviewers. For their part, agencies do not customarily communicate the details of how they validate or verify their evaluation procedures in ways that are clear to oversight groups or users. Validation is of particular concern when the level of aggregation is high (that is, when research fields are evaluated as a single program on an agency-wide basis).

The process of expert review is implicitly understood by those involved in research, and agencies consider expert review to be the most objective and reliable mechanism for evaluating their research programs. However, the mechanism must be explicitly and publicly articulated to those who are charged with oversight and who might be less familiar with, or accepting of, the custom.

Recommendation M-7
Agencies should devise ways to describe how they validate their research-evaluation methods, describing, for example, how they select expert reviewers and how they choose to aggregate research programs for review.

2.6 Summary

The agencies examined have devoted considerable effort to developing reporting procedures that comply with the requirements of GPRA and are congruent with their internal planning procedures. However, some expressed a need for new processes. It was clear from the panel's discussions with agencies that compliance methods are still very much “works in progress” and that further work is needed if the methods are both to fulfill the intent of the law and to provide benefits to the agencies themselves. Testimony during the focus groups indicated that the three criteria (especially quality and relevance), as described by COSEPUP, had proved useful in approaching the requirements of GPRA. In particular, the panel was able to verify the usefulness of the criteria to the agencies themselves. The panel concluded that the criteria of quality, relevance, and leadership are more effective than quantitative performance indicators for evaluating research programs.