Suggested Citation:"5 Evaluation of Center Programs." Institute of Medicine. 2004. NIH Extramural Center Programs: Criteria for Initiation and Evaluation. Washington, DC: The National Academies Press. doi: 10.17226/10919.

5
Evaluation of Center Programs

The National Institutes of Health (NIH), like other federal research agencies, takes several approaches to evaluating the performance of its programs. One is technical review by a panel of external experts knowledgeable in the area of research involved and perhaps users of the research results. Another approach is formal evaluation based on data collected by external contractors.

As described in Chapter 3, a proposal to establish a program of research centers in the first place undergoes a prospective review process, which varies from institute to institute but generally involves an external committee or workshop to obtain input on the goals and design of the proposed centers as well as the required review and approval by a chartered external advisory body (usually the institute’s national advisory council) and clearance of the Request for Application (RFA) or Program Announcement (PA) through the Office of Extramural Programs in the Office of the Director of NIH.

Research programs may also undergo retrospective evaluation. Each ongoing as well as proposed center program must justify itself in the annual planning and budgeting process. In addition, some of the institutes engage in a formal “visiting committee” process, in which an external panel of experts, usually a subcommittee of the institute’s national advisory committee for a major program division, reviews the division’s programs on a regular schedule. The National Institute on Aging’s national advisory council, for example, reviews three of the institute’s six major program units each year, one at each of its first three meetings of the year. This process can lead to changes in center programs or proposals to initiate new center programs.

In one case, the institute has a time limit, or “sunset” provision, when a major external review must be conducted to determine if the center program should be continued. As noted in Chapter 3, the National Heart, Lung, and Blood Institute (NHLBI) has a 10-year limit on each of its Specialized Centers of Research/Specialized Centers of Clinically Oriented Research programs (each of which supports several centers for research on a specific disease), at which time they are replaced by a new program on another disease, unless a committee of outside experts determines that there are “extraordinarily compelling reasons” to continue the program (Lenfant, 2002). Some other institutes have 10-year limits (e.g., two 5-year awards) on individual centers, but not on the overall program.

From time to time, institute staff members, the institute director, or the national advisory council decides that a center program should be reviewed by staff, an external committee of experts, or a combination of staff and outside experts for its continuing effectiveness and/or relevance. Examples of reports from such ad hoc exercises are summarized in Appendix E. Generally the reports are based on the experience and expert judgment of committee members, because for reasons discussed below, objective measurement and analysis of a program’s performance, especially in terms of outcomes and impact, are difficult to perform and frequently require resources and technical skills beyond those provided to the committees. More sophisticated evaluation designs, such as those involving comparison groups, are generally viewed as even more difficult to perform and, therefore, are seldom employed in practice.

These exercises are also essentially what are called formative or process evaluations (which assess the ongoing program process to identify modifications and improvements) rather than summative or impact evaluations. In one case, the Population Research Center Program, the program staff at the National Institute of Child Health and Human Development (NICHD) considered undertaking an impact evaluation. They obtained funding through the Office of Evaluation in the Office of the Director of NIH, worked closely with the program evaluation staff in the office of the NICHD director, compared notes with center program staff in other institutes, and consulted with evaluation experts in academia. In the end, they decided on a formative evaluation: because the field is small, selecting centers because they had top researchers (and attracted more top researchers after they obtained the center grant) means that there is no comparison group. Instead, the evaluation group consulted with a wide range of people (in funded centers, potential centers, universities without centers, and other funding organizations) and identified trends and drew conclusions in the context of the best strategy for the population research program (which includes research project grants, training awards, and contracts for large-scale surveys as well as the centers).

The Population Research Center Program evaluation led to substantial changes in the program. It also led to formation of an NIH interest group on evaluation of centers, consisting of evaluation officers of many of the institutes, who meet periodically to discuss how to evaluate center programs. This effort may eventually lead to more formalized evaluation criteria and procedures, but was still in the early stages of development when this report was written. This chapter will therefore describe several attempts at evaluation that have been carried out by individual institutes, detail some of the obstacles to good evaluations, point out some lessons learned by other agencies in their efforts to conduct similar assessments, and most importantly, propose some general principles and specific measures that ought to be incorporated into future center programs to make evaluation easier and more effective.

PREVIOUS EVALUATIONS OF CENTER PROGRAMS

As noted above, some NIH institutes have conducted evaluations of one or more of their center programs. The committee reviewed 11 such evaluations (see Box 5-1), summaries of which are provided in Appendix F. It is unlikely that these are the only ones that have been carried out, but these 11 were readily available on institute websites and share enough common features that the committee considers them representative.¹ All were conducted or commissioned by the institute’s national advisory council, which has statutory responsibility for reviewing and approving all research awards made by the institute. The circumstances that generated the evaluations of center programs by institutes varied, but all were ad hoc efforts. That is, they were not a result of a regular preplanned process for periodic evaluation of the center program in question. Many were apparently initiated in response to a perception on the part of the institute director or staff, or one or more advisory council members, that the program was not accomplishing the goals originally set out for it or that developments in the field merited a review of the continued relevance of a long-standing program. Few of the reports contained data measuring outputs or impacts of centers (some recommended that the program staff collect data that could be used to evaluate the program in the future), and they relied heavily on testimony of center directors. In most cases the authors began with the assumption that there was a continuing need for a center program in some form. The relative paucity of external evaluations, given the number of center programs and center awards, reflects the difficulty in evaluating center programs. The following section describes some of the reasons for this.

¹ In some cases, evaluations leading to major program changes were not available on the institute website at the time this report was written, for example, the May 2000 report of the Midstream Evaluation Committee for the Comprehensive Sickle Cell Centers cited in the RFA for Comprehensive Sickle Cell Centers released in December 2000 (RFA-HL-01-015) and the February 2003 expert panel review of the Botanical Research Centers Program cited in the most recent RFA for this program in December 2003 (RFA-OD-04-002), now posted at http://nccam.nih.gov/training/centers/bot-research-index.htm. It should also be noted that the National Center for Research Resources (NCRR) has conducted formal evaluations of some of its programs, including the Research Centers in Minority Institutions Program, which was evaluated in 2000 by an outside evaluation firm under the supervision of an expert advisory panel of extramural scientists (NCRR, 2000b).

BOX 5-1
Some Previous NIH Evaluations of Specific Center Programs

National Cancer Institute

Institute of Medicine, 1989. A Stronger Cancer Centers Program: Report of a Study.
Cancer Centers Program Review Group, 1996. Report to the Director.
Ad Hoc P30/50 Working Group, 2003. Advancing Translational Cancer Research: A Vision of the Cancer Center and SPORE Programs of the Future.

National Institute of Arthritis and Musculoskeletal Diseases

Centers Working Group II, 1997. Summary Report to the Institute Director.

National Institute of Deafness and Communication Disorders

Work Group on Single and Multiple Project Grants, 1998. Report to the Director.

Office of AIDS Research

Focus Group to Review the Centers for AIDS Research (CFAR) Program, 1999. Report to the Director.

National Institute of Child Health and Human Development

Demographic and Behavioral Sciences Branch, 1999. Report of the Demographic and Behavioral Sciences Branch Population Centers Review.

National Heart, Lung, and Blood Institute

Committee to Redefine the Specialized Centers of Research Programs, 2001. Report of the Committee to Redefine the Specialized Centers of Research Programs.

National Institute on Aging

National Institute on Aging, 2002. Report of the Alzheimer’s Disease Centers External Advisory Meeting.

National Institute of Nursing Research

National Advisory Council for Nursing Research, 2002. Minutes of the Advisory Council Meeting of May 21-22, 2002.

National Center for Complementary and Alternative Medicine

Research Centers Program Expert Panel, 2002. Research Centers Program Expert Panel Review.

CHALLENGES IN EVALUATION OF CENTER PROGRAMS

The program evaluations noted in Box 5-1 and summarized in Appendix F were all carried out by groups of highly reputable individuals, predominantly accomplished scientists, written up in a formal report, and posted on an institute website. Despite the shortcomings and limitations of the reports listed in the previous section, the committee commends the authors for taking on the task of evaluation at all. Given the resources and data available to them for the task, the reports probably represent the state of the art. The following sections suggest some reasons why this is the case.

Results Take a Long Time

The initial project period for center grants is most often five years, a period that is long enough for competent scientists with good ideas to produce some papers for peer-reviewed publications. This is especially true in the case of core grants, which provide a richer infrastructure for scientists who already have their own research project grants. Centers will generally take a little longer to get organized and running than an individual laboratory, however, and center programs often start with only a few centers and build up the number of centers over several years, which makes it difficult to fully evaluate the program, as distinguished from individual centers, after the first five years. Also, in the case of disease-focused centers, the primary goal is to help move basic science discoveries into clinical research and practical applications. Clinical research is an increasingly highly regulated endeavor that depends heavily on the availability and cooperation of patient-subjects, which reduces the speed with which the research can be accomplished. Impact on health care takes far longer. In a recent paper, Balas and Boren (2000) calculated the time between publication of nine landmark trials of new clinical procedures and a utilization rate of 50 percent as 15.6 years.

Centers Are More than Center Grants

Center products, publications included, are often the result of a combination of activities at a center, only some of which are supported by the center grant. The most obvious case is that of the core support grants, which aim to facilitate independently funded research by center-affiliated investigators. Certainly a center whose affiliated scientists had no publications would be deemed a failure, as would a core center program whose centers could point to no publications after five years. It is more difficult to draw a conclusion about a core center whose investigators publish frequently or in high-prestige journals, or both. How much credit should go to the center? The individual investigators would very likely have published with or without the center, which was intended primarily as a cost-effective way to provide common resources to a group of investigators. The question, which is difficult to answer, is, did publications come faster than they would have without the center award or are they more interdisciplinary or translational than they would have been in the absence of a center?

Several center program evaluations cited the fact that the centers in question had succeeded in attracting additional research and infrastructure support from other institutes, agencies, foundations, and industry, as well as from their own parent institution. All those sources of support no doubt claim the very same publications and outcomes of center activities as results of their largesse.

Centers Do Several Things at Once

Specialized (P50) and comprehensive (P60) centers are often charged with far more than supporting a program of research at their institution. Centers of Excellence in Cancer Communications Research, for example, are expected to support three or more individual research projects that reflect hypothesis-driven research, plus pilot or developmental research projects, shared resources (cores), and career development. They are also expected to develop mechanisms for dissemination of research findings and products, and foster formal and informal intercenter collaborations. Cooperative Centers for Translational Research on Human Immunology and Biodefense are to have at least five components: (1) assay, reagent, and technology development; (2) three or more research projects; (3) core facilities to support research and manage the center; (4) short-term pilot projects; and (5) an education component focused on short-term training. Each center in such a program tends to be a unique combination of the desired elements. Trying to add them up to a result for the program tends to underemphasize what may be most important about each center. Evaluation then must be multifaceted as well and must include some decision rules about what to do about centers that do only some of their tasks, but do them extremely well, and centers that do all their tasks but none of them remarkably well.

Human Resources Are Hard to Track

A commonly cited benefit of centers is that they are fertile grounds for training researchers and thereby expand the pool of scientists working in the area of interest. Some programs require some educational or training component; others permit it; still others expect that it will take place even though no funding is provided for it. Like health impacts, however, results of this training take a long time to emerge. The trainees, whether predoctoral or postdoctoral, clinicians, junior faculty exploring a research career, or senior researchers switching the focus of their work, tend to move around in pursuing their careers, and following them for any length of time after they leave the center would require a substantial investment.

Much of the Value Added by Centers Is Intangible

Infrastructure is hard to measure. Providing services and resources used by many investigators on a centralized basis seems like an obvious way to increase efficiency, but demonstrating the savings or the increased productivity of the investigators is a daunting task. Research is by its nature unpredictable, advances often come in spurts, and the needs of investigators vary with the results of their experiments. As a result, simple before-and-after comparisons may be misleading.

Other benefits of centers can be even harder to measure. A very common reason given for starting center programs is a perceived need for multidisciplinary collaboration. Centers are seen as a way to attract established scientists from many disciplines to a common problem area and a common locale, where their increased interaction will promote the desired interdisciplinary studies. Measures of such interaction are conceivable but are likely to be time consuming and expensive. Collaborations can be indexed more easily, by looking at the authors of publications from the center, but it would be much harder to establish that these collaborations result from the existence of a center.

Even more difficult to measure is the oft-cited “synergy” that ideally makes a center more than the sum of its parts, the sense of teamwork that many feel emerges at a successful center, and the culture of collaboration that follows. Social scientists can operationalize and measure these concepts, but there are currently no standard, easy-to-use, or inexpensive instruments.

Peer Reviewers May Be Beneficiaries of the Program

Given the lack of data on outcomes and impacts, evaluation of center programs, like that of any research program, must rely substantially on expert judgment (National Academy of Sciences, National Academy of Engineering, Institute of Medicine, 2001). Ideally, the peers doing the review, whether it is a review of a proposal in a study section or an institute’s special emphasis panel, a site visit in connection with a grant renewal, or a program evaluation, should be among the leaders in the relevant field. This can be difficult to arrange in any of these instances, and evaluation of centers brings added difficulties. For example, established centers may loom so large in their field that it is difficult to find experts for external evaluation panels without conflicts of interest. The potential reviewers of the center program may all be leading scientists in individual centers or be affiliated with an institution that is the recipient of a center award. In the case of new center programs trying to implement a new research thrust (moving discoveries about disease etiology toward new diagnostics or treatments, for example), the most knowledgeable scientists in the field may themselves be applicants or strongly biased in favor of the older approach that has brought them success to date.

LESSONS FROM OTHER AGENCIES

The Government Performance and Results Act (GPRA) has produced heightened interest in assessing research in all its forms, not just at NIH but among all the federal agencies that fund research. Some general lessons on research program evaluation were provided by a National Academies report that analyzed how federal agencies that support science and engineering research were responding to GPRA (National Academy of Sciences, National Academy of Engineering, Institute of Medicine, 2001). The report panel examined the responses of the National Science Foundation (NSF), NIH, Department of Defense, Department of Energy, and National Aeronautics and Space Administration. Most of the report’s recommendations are GPRA-specific, but three are potentially applicable to any assessment of a research program. The panel recommended that (1) federal research programs, both basic and applied, be reviewed regularly; (2) the primary method of assessment be expert review for quality, relevance, and leadership; and (3) agencies work toward greater transparency and clear validation of methods.

Several agencies’ struggles with the problem of assessing research programs antedate GPRA, however, including some efforts focused directly on centers. The efforts of NSF are particularly instructive.

National Science Foundation

NSF currently supports nearly 300 centers in a wide variety of center programs.² They fall into two broad areas: (1) centers focused on scientific problems too complex, too long-term, or too expensive for individual investigator grants, or that require cross-disciplinary collaboration, and (2) centers aimed at the transition of scientific and engineering research into usable solutions for national problems. NSF sees both types of centers as playing a key role in furthering the advancement of science and engineering in the United States, particularly through their encouragement of interdisciplinary research and the integration of research and education. The goals these centers share in common are similar to the goals for many of the NIH-funded centers:

To address scientific and engineering questions with a long-term, coordinated research effort by involving a number of scientists and engineers working together on fundamental research addressing the many facets of long-term complex problems;
To include a strong educational component that establishes a team-based, cross-disciplinary research and education culture to educate the nation’s next generation of scientists and engineers to be leaders in academe, industry, and government; and
To develop partnerships with industry that help to ensure that research and education are relevant to national needs and that knowledge migrates into innovations in the private sector.

In the 1980s NSF launched several large center programs—Science and Technology Centers (STCs) and Engineering Research Centers (ERCs)—that substantially increased its investment in centers (from 3 percent of the research budget in 1980 to 7 percent in 1990). In January 1992 NSF’s program evaluation staff convened a workshop to devise and sharpen methods for evaluating outcomes of research center programs. Four working groups were formed and asked to focus on outcomes and measures of impact in research, education, technology/knowledge transfer, and institutional impact, respectively. McCullough (1992) summarized the principal questions for measurement in each of the four areas as follows:

²	http://www.nsf.gov/bfa/bud/fy2003/ideas.htm.

• Research

Do centers develop new perspectives that reflect the organized character and collaborations they encourage? (Are they actually studying distinctively different kinds of problems that are more complex, broader, or longer-term?)

Are problems formulated in novel ways; does research move in directions it otherwise could not have? (Do the centers fill a special niche in their research field?)

• Education

Do the “learners,” be they students, faculty, or industrial partners, acquire the insights and competencies necessary to perpetuate the scientific field?

To what degree are learners bringing practical benefits to the university or industry they work in or to the intellectual environment of the center itself?

• Knowledge/Technology Transfer

How is the program designed to make an impact, and who is the customer?

What is industry getting from the centers that it could not get from individual investigators?

What is the evidence that the centralized, multidisciplinary structure of centers makes university/industry collaboration more efficient?

• Institutional Impact

What organizational or policy changes occurred in the parent institutions as a result of creating centers?

What broader changes (e.g., in the culture of research) can be attributed to a program of centers or to the funding of center programs generally?

A number of cautions were offered for would-be evaluators. These included:

As outcomes to be measured are made more and more specific, and hence more easily measured, they also become less generalizable to other centers and center programs. A common collection of outcome measures might be possible but the elements might have to be weighted differently depending on the program being evaluated.
Data collection is a sensitive issue, not only because of time and cost, but because in many instances, program directors may already be demanding considerable data from the centers. Every effort should be made to inventory existing data for suitability.
In measuring impacts (as opposed to outcomes), isolating the effects of the center programs will be quite difficult.
A thorough and systematic evaluation of center program outcomes will take time and money.
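
The first caution above, that a common set of outcome measures might need to be weighted differently for each program, can be illustrated with a minimal sketch. It assumes each of the four workshop areas has already been reduced to a normalized score between 0 and 1; the function name, the scores, and the weights are all hypothetical and are not drawn from the workshop or from any NSF procedure.

```python
from typing import Dict


def weighted_score(measures: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine normalized outcome measures (0..1) using program-specific weights."""
    missing = set(weights) - set(measures)
    if missing:
        raise KeyError(f"no value for measures: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted average: each measure contributes in proportion to its weight.
    return sum(measures[name] * w for name, w in weights.items()) / total_weight


# Hypothetical normalized scores for one center in the four workshop areas.
center = {
    "research": 0.8,
    "education": 0.6,
    "knowledge_transfer": 0.4,
    "institutional_impact": 0.5,
}

# Hypothetical weightings for two differently oriented center programs.
basic_science_weights = {
    "research": 3, "education": 2, "knowledge_transfer": 1, "institutional_impact": 1,
}
translational_weights = {
    "research": 2, "education": 1, "knowledge_transfer": 3, "institutional_impact": 1,
}

basic_score = weighted_score(center, basic_science_weights)
translational_score = weighted_score(center, translational_weights)
```

The same center scores higher under the research-heavy weighting than under the transfer-heavy one, which is the sense in which a shared set of measures can still yield program-specific judgments.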

The STC program had grown to 25 university-based centers by 1995 and was in the eighth year of a planned 11-year lifetime when the Committee on Science, Engineering, and Public Policy (COSEPUP) of the National Academies was asked to help with the decision about whether this experimental program should be continued and, if so, in what form. Specifically, COSEPUP was to review and interpret a body of data previously gathered by an outside contractor, Abt Associates (Fitzsimmons et al., 1996), draw its own conclusions about the program’s progress toward its goals, and make recommendations for the future use of the STC mode of support. The study was not to critique individual centers, but to evaluate the program as a whole.

The COSEPUP panel concluded that NSF and the nation were receiving a good return on a relatively small investment. It found that “most STCs were producing high quality research that would not have been possible without a center structure, and were a model for the creative interaction of scientists, engineers, and students in various disciplines and across academic, industry, and other institutional boundaries” (National Academy of Sciences, National Academy of Engineering, Institute of Medicine, 1996). The COSEPUP panel’s conclusion was not based on empirical data collected by Abt Associates, which were based on potentially biased self-reports from individuals who were direct or indirect beneficiaries of a center. COSEPUP’s own conclusions about the STC program relied heavily on reports of site visits, which were conducted annually by committees of experts for the first three years of each center’s existence, at 18-month intervals thereafter, and in conjunction with three-year and six-year renewal competitions.

The ERC program has undergone several systematic reviews. ERCs annually collect and report data on several performance outputs, such as number of publications, student enrollments, patents, and interactions with industrial partners. These data, along with site visits by external reviewers, are used by NSF in periodic reviews to determine continued funding and midcourse changes, as needed, in research priorities and administrative arrangements. Program reviews of selected aspects of the ERC program have also been conducted by external consultants. Among these reviews have been survey-based assessments of the impacts of ERCs on the performance of ERC-based graduate students in their initial post-ERC jobs (Abt Associates, 1996) and the characteristics of ERC interactions with their industrial partners (Ailes et al., 1997).³ The plans of ERCs to maintain their activities following the expiration of NSF support were also the subject of an external review (Ailes et al., 2000).

POSSIBLE METHODOLOGIES

One attractive approach to evaluation is systematic comparison of center awards with other types of research support in order to determine the “value added” by use of the center mechanism. Staff located a 1989 study that used such a strategy to look at the funding mechanisms supporting 13 major advances in cancer research (Narin, 1989). Narin used citation analysis to identify the key research papers in 13 major advances in cancer research and used the acknowledgment of support in each paper to link it to National Cancer Institute (NCI) support mechanisms, namely, R01, P01, R10 (now U10), P30, contract, and intramural. An obvious flaw in this approach is one noted above: the cancer centers supported by P30 core grants do not fund research directly, but rather support investigators with R01 grants and other sources of support. Therefore publications of scientists working in NIH-supported cancer centers might acknowledge the R01 grant that funded the research, but not P30 facilities and services that also contributed. Nevertheless, although Narin’s analysis showed that R01 and P01 grants and the NCI intramural program were acknowledged most frequently as a source of support by the key papers that were among the highly cited in their field (top 10 percent), P30 support was acknowledged by a majority of the highly cited papers.

Committee member Myron Weisfeldt recounted an unpublished comparison of center grants and multiproject P01 grants in which he participated as a member of the NHLBI’s Cardiology Advisory Committee in the late 1980s. The study compared NHLBI’s Specialized Centers of Research Excellence in Acute Myocardial Infarction to P01-supported research on the same topic. Evidence for impacts was sought in three realms: scientific, investigator development, and human health. In the first of these, evidence was bibliographic; investigator development was assessed in terms of electees to the American College of University Cardiologists; and human health impact was measured as reduction of the one-year mortality from acute myocardial infarction. The study concluded that center grants and multiproject P01 grants produced roughly similar publication records, but that centers had far greater impacts on training future academic cardiologists and in reducing the mortality rate. It should be noted, however, that P01 grants rarely have a training component although graduate students and postdoctoral fellows receive valuable research experience. It is also unlikely that many of the P01 grants had an explicitly clinical focus.

³ These two studies have been synthesized by Parker (1997).

Other examples might be adduced to recommend comparison studies as the evaluation method of choice, but the committee believes that these two illustrate pitfalls that lie in that path. Instead, it recommends that evaluation take the form of a comparison of results achieved versus the expressed goals of the program. These goals will vary widely, corresponding to the wide variety of center programs that exist at NIH. A simple one-size-fits-all evaluation template will therefore not be feasible, but the remainder of this section contains some possible measures and methods that could be incorporated into a program-specific evaluation plan.

Indicators

Under this heading the committee includes numerous, often quantitative, measures of center program activities. Sometimes, but not always, they are generated in the course of program or center operations and therefore do not impose a major additional burden on either center or program staff. These indicators may in some instances be products (outputs) of those activities; in other cases they may be only descriptions, listings, or counts of activities taking place at the centers or taking place elsewhere in response to center activities (processes); in still other cases they may merely be descriptions of resources provided to or obtained by the centers (inputs) to make those activities possible.

Impacts on health are the most desirable indicators, but as noted above, some of the most important indicators of the impact of successful medical research, namely reductions in mortality and morbidity, may not become apparent for many years, and separating out the impact of specific mechanisms is extremely difficult. In the interim, intermediate outputs can be identified, for example, number of publications in the scientific literature and citation rates. In addition to outputs, program evaluation frequently relies on inputs and program activities as surrogate measures of long-delayed outcomes. At the center level these measures might include the number of new grants received or increases in the level of university support, the number of new scientists recruited and students trained, the number of interdisciplinary conferences held, and the number of projects under way. At the program level, one might point to changes in medical school curricula, changes in national policies and treatment guidelines, and utilization rates for new or altered interventions. Inputs, for example, dollars spent or number of centers established, are the indicators of last resort, or they are used only in evaluations taking place very early in the life of a program.

Box 5-2 contains a list of the types of indicators that institutes might consider in evaluating their programs of center awards. The list is intended to be suggestive rather than all-inclusive, and in recognition of the varying goals of center programs, it is organized by some of the most commonly expressed goals of current center programs. Collection and analysis of data on the indicators should be included in the program design, and the awards should require centers to provide specific data for program evaluation as well as monitoring purposes.

BOX 5-2
Potential Indicators for Evaluating NIH Center Programs

Goal:	Increased basic and clinical research in program’s area of focus.
Indicators:	Increased number of studies in each category being funded, especially new studies; increased number and impact of publications and presentations of center research.
Goal:	More multidisciplinary research.
Indicators:	Increased number of collaborations established; increased number and percent of center studies, especially center scientific publications authored by teams of scientists from two or more university departments; greater number of disciplines represented among center-affiliated staff.
Goal:	More translational research.
Indicators:	Increased number of publications in clinically oriented journals; patent applications; licenses issued; and clinical trials under way or completed.
Goal:	Increased or more effective support, or both, for independently funded investigators.
Indicators:	Larger number of studies supported, especially new studies; more types and amounts of support supplied; characteristics of core facilities, materials and services available; increased number of publications of center-affiliated investigators.
Goal:	Increased attention to program’s area of focus by centers’ home institutions, scientific community, and general public.
Indicators:	Increased institutional support for center operations (space, faculty and staff, recognition on institutional organizational charts and publications); additional research funding from NIH and other public agencies, nonprofit organizations, and commercial industry.
Goal:	Successful recruitment of established researchers to the program’s area of interest.
Indicators:	More scientists with previous publications in the area joining the center; increased number of new grants or other funding obtained by these new investigators; number of publications, patents, or other products of work at the centers.
Goal:	Development of new investigators.
Indicators:	More trainees associated with the programs’ centers; current positions of former trainees; research grants subsequently won by these trainees at program centers or elsewhere; larger number of trainees who are elected members or fellows of professional societies.
Goal:	Expanded education of health professionals.
Indicators:	Increased number of courses, seminars, and workshops offered by program centers; larger number of health professionals attending.
Goal:	Expanded education of the general public.
Indicators:	Increased number of publications in the popular press, radio or television appearances by center staffs, increases in patient load for relevant health problems; increased percentage of patients agreeing to participate in clinical research.
Goal:	Demonstration of state-of-the-art prevention, diagnosis, and treatment techniques.
Indicators:	Increased number of seminars, grand rounds, workshops, and other educational programs conducted; larger number of local and regional practitioners participating in such programs.
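As one illustration of how an indicator from Box 5-2 might be computed from data that centers report, the sketch below calculates the share of a center's publications whose authors come from two or more departments (the multidisciplinary research goal). The `Publication` record, its field names, and all of the example data are hypothetical; an actual program would define its own reporting format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """One publication reported by a center (hypothetical reporting format)."""
    center: str
    year: int
    author_departments: tuple  # one department name per author


def multidepartment_share(pubs, center, year):
    """Fraction of a center's publications in `year` whose authors span
    two or more departments; returns None when there are no publications."""
    selected = [p for p in pubs if p.center == center and p.year == year]
    if not selected:
        return None
    multi = sum(1 for p in selected if len(set(p.author_departments)) >= 2)
    return multi / len(selected)


# Invented example data, for illustration only.
reports = [
    Publication("Center A", 2002, ("Cardiology", "Cardiology")),
    Publication("Center A", 2002, ("Cardiology", "Biostatistics")),
    Publication("Center A", 2003, ("Cardiology", "Physiology", "Biostatistics")),
    Publication("Center A", 2003, ("Physiology", "Cardiology")),
    Publication("Center B", 2003, ("Neurology",)),
]

share_2002 = multidepartment_share(reports, "Center A", 2002)
share_2003 = multidepartment_share(reports, "Center A", 2003)
```

Returning `None` rather than zero for a year with no publications keeps "no data reported" distinct from "no multidepartment work," a distinction that matters when the same records are used for both monitoring and evaluation.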

Site Visits, Interviews, and User Surveys

Statistical indicators, whether collected en passant or specifically for purposes of program evaluation, are by their nature limited to quantifiable goals. Nearly every program evaluation combines these indicators with first-hand observations or other site-specific efforts to gather relevant information. Given the prominence in descriptions of centers of such intangibles as synergy and facilitation, any assessment of a center program should strongly consider inclusion of site visits to centers; interviews with center staff and other members of the institutions in which the centers are embedded; and systematic mail or phone surveys of program and center staff and, especially in the case of center infrastructure or core grants, systematic mail or phone surveys of the independent investigators whom the centers were designed to support.

Designing Evaluation into Center Programs

NIH has established an extensive set of procedures for evaluating center applications for initial and renewal funding (although the use of site visits has been declining for budgetary reasons), and program officers review annual progress reports from centers. Some institutes have adopted sunset provisions for individual centers (i.e., no more than two five-year grants) or, in one case, conduct site visits in the third year of a center’s five-year award to assess whether to encourage a renewal proposal for a second award. Also, both initial and renewal applications are competitive, and poorly performing centers can be, and are, replaced individually.

Most center programs are not subject to the same level of periodic scrutiny as individual centers are when they apply for renewal (competitive continuations). Where there are regular program reviews by an institute’s national advisory council, they generally encompass a major program unit, e.g., the three program divisions of the National Institute of Allergy and Infectious Diseases and the six program divisions of the National Institute on Aging, in which a center program is just one of many activities. Renewal RFAs and PAs are not always reviewed as intensively or at the same level of review (i.e., by a chartered external advisory committee) as they were when they were new program initiatives. Formal evaluation plans are not usually developed at the beginning of a center program, so the data that will be needed in five or more years are not identified and collected from the start of the program.

Finding. NIH does not have formal regular procedures or criteria for evaluating center programs. From time to time, institutes conduct internal program reviews or appoint external review panels, but these ad hoc assessments are usually done in response to a perception that the program is no longer effective or appropriate rather than part of a regular evaluation process. Most of these reviews rely on the judgment of experts rather than systematically collected objective data, although some formal program evaluations have been performed by outside firms using such data.

Recommendation 5. Every center program should be given a formal external retrospective review for its continued effectiveness on a regular basis (at least every five to seven years). The review should be coordinated at an organizational level above the center program itself.

The review should be performed by people at arm’s-length distance from the program and with the appropriate expertise to judge the varied activities of the centers. The views of interested publics, including the scientific and advocacy communities, as well as NIH officials and grantees, should be solicited as a matter of course.

The program should be evaluated against its original objectives and with regard to contemporary challenges in its field. The review should include consideration of the question, “Are centers still the most appropriate means of making progress in this field?” and the criteria should be consistent with those adopted or developed in response to Recommendation 3 for establishing the center program in the first place.
The review should use multiple sources of evidence to evaluate the effectiveness of the program, and its conclusions should be evidence-based. The review might consider, for example, the scientific impact (e.g., publication counts and impacts, important discoveries, development and sharing of research tools); impact on human health (e.g., changes in health status); and impact on human resources (e.g., career paths of pre- and postdoctoral students and investigators).
A program evaluation plan should be developed as part of the design and implementation of new center programs, and data on indicators used in the evaluation plan should be collected regularly and systematically. Data should be collected from the centers according to a common format. Many of the indicators should also be useful for program monitoring and progress reporting. One set of potential indicators is provided in Box 5-2.
Each institute’s plan for evaluating center programs should be linked to its strategic planning process.

REFERENCES

Abt Associates. 1996. Job Performance of Graduate Engineers Who Participated in the NSF Engineering Research Centers Program. Report to the National Science Foundation, NSF Contract END 94-13151. Bethesda, MD: Abt Associates.

Ailes C, Roessner D, Feller I. 1997. The Impact on Industry of Interaction with Engineering Research Centers. Final report prepared for the National Science Foundation. Arlington, VA: SRI International.

Ailes C, Roessner D, Coward J. 2000. Documenting Graduation Paths: 2nd Year Report to the National Science Foundation. Arlington, VA: SRI International.

Balas EA, Boren SA. 2000. Managing clinical knowledge for health care improvement. In: Bemmel J, McCray AT, eds. Yearbook of Medical Informatics 2000: Patient-Centered Systems. Stuttgart, Germany: Schattauer Verlagsgesellschaft. Pp. 65-70.

Fitzsimmons SJ, Grad O, Lal B. 1996. An Evaluation of the NSF (National Science Foundation) Science and Technology Centers (STC) Program. Cambridge, MA: Abt Associates. Vol. I: Summary.

Lenfant C. 2002. Strengthening commitment to clinical research: The National Heart, Lung, and Blood Institute’s Specialized Centers of Research program. Circulation 105(4):400-401. [Online]. Available: http://www.nhlbi.nih.gov/funding/fromdir/circ-1-02.htm [accessed December 15, 2003].

McCullough J. 1992. Draft Report of the NSF/Program Evaluation Staff Workshop on Methods for Evaluating Programs of Research Centers, January 1992. Washington, DC: National Science Foundation.

Narin F. 1989. The impact of different modes of research funding. In: Ciba Foundation Conference. The Evaluation of Scientific Research. Chichester, England: John Wiley & Sons. Pp. 120-140.

National Academy of Sciences, National Academy of Engineering, Institute of Medicine. 1996. An Assessment of the National Science Foundation’s Science and Technology Centers Program. Washington, DC: National Academy Press.

National Academy of Sciences, National Academy of Engineering, Institute of Medicine. 2001. Implementing the Government Performance and Results Act for Research, A Status Report. Washington, DC: National Academy Press.

NCRR (National Center for Research Resources). 2000. Evaluation of the Research Centers in Minority Institutions Program: Final Report 2000. Bethesda, MD: National Center for Research Resources.

Parker L. 1997. The Engineering Research Centers (ERC) Program: An Assessment of Benefits and Outcomes. Arlington, VA: National Science Foundation. [Online]. Available: http://www.nsf.gov/pubs/1998/nsf9840/nsf9840.htm [accessed December 15, 2003].