Evaluating Occupational Health and Safety Research Programs

2 The Program Evaluation Context¹

Program evaluation has been defined as "systematic inquiry that describes and explains the policies' and program's operations, effects, justifications, and social implications" (Mark et al., 2000, p. 3) or ". . . the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness, and/or inform decisions about future programming" (Patton, 1997, p. 23). The evaluations of National Institute for Occupational Safety and Health (NIOSH) programs carried out under the guidance of the framework committee represent just one way of evaluating research programs. This chapter places the National Academies' evaluations of NIOSH programs in context by providing a brief overview of the general principles involved in program evaluation and by describing where the process fits in the spectrum of current practices in evaluating research programs. At the conclusion of some of the overview sections throughout the chapter, the committee's findings specific to the evaluation process used by the framework and evaluation committees are included in bold and italicized text.

¹This chapter draws on background papers commissioned by the committee from Sonia Gatchair, Georgia Institute of Technology, and Monica Gaughan, University of Georgia.

PROGRAM EVALUATION

Although formal program evaluations, especially of educational programs, preceded World War I, the profession as currently practiced in the United States has increased in prominence within the past 50 years. A major impetus to this growth was the need to assess the social programs instituted through the War on Poverty and Great Society policies of the 1960s (Shadish et al., 1991). Legislative requirements for the evaluation of many programs marked a turning point, spurring growth in the number of evaluations. Evaluation is now an established professional practice, reflected in organizations such as the American Evaluation Association and the European Evaluation Society (AEA, 2009; EES, 2009). Program evaluation is one element of results-oriented management, the approach to public management embodied in the past decade in the Government Performance and Results Act (OMB, 2009a) and the Office of Management and Budget's (OMB's) Program Assessment Rating Tool (OMB, 2009b).

Current efforts in program evaluation follow several schools of thought that differ in the evaluation processes used but are all focused on achieving a valid evaluation. The essence of evaluation is determining what is of value in a program. The work revolves around understanding program goals (if available), setting criteria for success, and gathering information to determine whether the criteria are being met as a result of program activities. Program evaluations focus on examining the characteristics of a portfolio of projects rather than assessing one project at a time, and they often use retrospective information about program outputs and outcomes. Program evaluation differs from a research project in being more tightly connected to practice; it is commissioned by a specific user or organization and designed to inform decision making. It also differs from performance measurement, which is an ongoing process that gathers indicators of what the program is accomplishing but may not assess why the indicators are changing.
Program evaluations can serve several functions. When a program is in initial development or is undergoing changes, and is being evaluated with the goal of improving it, the evaluation is termed a formative evaluation (Scriven, 1991). These evaluations are often initiated and used in-house. When the objective of the evaluation is to assess the program's outcomes in order to determine whether the program is succeeding or has accomplished its goals, the evaluation is termed a summative evaluation (Scriven, 1967; Gredler, 1996). Users of summative evaluations are often decision makers outside of the program. Program evaluation often also helps communicate the program's goals and accomplishments to external audiences.

Evaluations provide information that contributes to decisions that shape program goals, strategic plans, and actions. In these cases, they serve instrumental functions. Often they also serve enlightenment functions, such as increasing general understanding of program operations, underlying assumptions, or social context (Weiss, 1977).

The practice of evaluating research programs has historically been somewhat separate from that of social program evaluation. Qualitative assessments of research programs in the United States date back to the 1950s (NAS, 1959). The evaluation
of research programs took a more quantitative turn in the 1970s as evaluations began to draw on newly available large-scale databases describing scientific activity. Research program evaluation is distinguished from social program evaluation in a number of ways, including the dominant use of peer-review panels and the use of specialized data, such as publication- and patent-based measures (see discussion later in this chapter).

The evaluations of NIOSH programs discussed in this report were undertaken in the context of the externally mandated Program Assessment Rating Tool process, a summative evaluation process developed by OMB. However, NIOSH leadership established program improvement as their primary goal, making the evaluations primarily formative.

LOGIC MODELS

The evaluations of NIOSH programs used logic models: both a general logic model for NIOSH research and specific logic models for each program evaluated. Prior to the work of the evaluation committees, NIOSH contracted with RAND Corporation to provide operational and analytical assistance with compiling the evidence packages for the reviews and developing the logic models; a detailed description of that effort can be found in a recent RAND report (Williams et al., 2009).

Logic models are widely used in program evaluation (W. K. Kellogg Foundation, 2000; World Bank, 2000) to represent visually what evaluators call "program theory." This phrase refers to the understanding of how the program is supposed to work: How do the program resources become results, and through what channels do those results have their expected impacts? The logic model may be represented as a set of boxes and arrows or as a hierarchy of goals, intermediate outcomes, and final outcomes.
The representation provides guidance for the evaluation by pointing to relevant kinds of information to be considered in the assessment and often to indicators in the various areas of the model. McLaughlin and Jordan (1999) refer to logic models as a way of "telling your program's performance story." The common elements of logic models are inputs, activities, outputs, customers, and outcomes (short, medium, and long term), plus external influences (Wholey, 1983; Figure 2-1).

Building a logic model is a process that should involve a team of people with different roles in the program who interact with external stakeholders at many points. After collecting relevant information and clearly identifying the problem the program addresses, the team organizes its information into various elements and composes a diagram that "captures the logical flow and linkages that exist in any performance story" (McLaughlin and Jordan, 1999, p. 68).
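To make the element list concrete, the following Python sketch represents a logic model as an ordered chain of stages, with external influences recorded alongside the chain rather than inside it. It is an illustration only: the stage identifiers are taken from the elements named above, while the class, its methods, and the example entries are hypothetical and are not drawn from the NIOSH, RAND, or McLaughlin and Jordan materials.

```python
from dataclasses import dataclass, field

# Stage order follows the common logic-model elements named in the text:
# inputs, activities, outputs, customers, and short-, medium-, and long-term
# outcomes. The identifiers themselves are illustrative.
STAGES = (
    "inputs",
    "activities",
    "outputs",
    "customers",
    "short_term_outcomes",
    "medium_term_outcomes",
    "long_term_outcomes",
)


@dataclass
class LogicModel:
    """Hypothetical container for one program's logic model."""

    program: str
    # Each stage maps to short descriptions supplied by the program team.
    elements: dict[str, list[str]] = field(default_factory=dict)
    # External influences sit outside the causal chain but may affect any stage.
    external_influences: list[str] = field(default_factory=list)

    def add(self, stage: str, item: str) -> None:
        """Record an item under a known stage; reject unknown stage names."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        self.elements.setdefault(stage, []).append(item)

    def linkages(self) -> list[tuple[str, str]]:
        """Arrows between consecutive non-empty stages of the linear chain."""
        filled = [s for s in STAGES if self.elements.get(s)]
        return list(zip(filled, filled[1:]))

    def missing_stages(self) -> list[str]:
        """Stages with no entries, i.e., gaps in the performance story."""
        return [s for s in STAGES if not self.elements.get(s)]


if __name__ == "__main__":
    # Entries below are invented for illustration only.
    model = LogicModel("Example research program (hypothetical)")
    model.add("inputs", "research staff and budget")
    model.add("activities", "field studies of workplace exposures")
    model.add("outputs", "peer-reviewed papers and guidance documents")
    model.add("customers", "employers, workers, and regulators")
    model.add("short_term_outcomes", "guidance adopted in workplaces")
    model.external_influences.append("changes in regulatory priorities")
    print(model.linkages())
    print(model.missing_stages())  # ['medium_term_outcomes', 'long_term_outcomes']
```

The strictly linear `linkages` method is a deliberate simplification that mirrors the limitation of logic models: nothing in this structure records feedback, nonlinear effects, or diffuse contributions such as the development of human capital.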
FIGURE 2-1 Elements of the logic model: resources (inputs), activities, outputs for customers reached, short-term outcomes, intermediate outcomes (through customers), and long-term outcomes and problem solution, set within external influences and related programs. Reprinted from McLaughlin and Jordan, 1999, with permission from Elsevier.

Logic models are nearly always predominantly linear and causal because agencies use them to think through how programs will achieve their public goals. In research planning and evaluation, this linearity is ironic: the widespread consensus is that research does not create its effects in a linear fashion. Rather, it is embedded in a complex ecology of relationships that shape and spread knowledge, not just research knowledge, through a variety of channels.

Additionally, it is challenging for logic models to capture some outputs, such as the development of human capital. Over time, a program may have a significant impact on a field by helping to build a community of practitioners and researchers. For example, NIOSH's impact on the existence and growth of the occupational safety and health research community is hard to capture in a logic model. In addition, ongoing dialogues with external stakeholders shape research activities and spread research knowledge in ways that are hard to track. Program evaluations that rely solely on the logic model almost inevitably miss information on some of the nonlinear effects of program activities.

The logic models used in the evaluation of NIOSH programs helped program staff and evaluators organize information into steps in the flow of program logic.
However, because some of the NIOSH programs spanned several NIOSH divisions and laboratories, the logic model sometimes made it hard for the evaluation committee to grasp the full picture of the research program. Furthermore, the logic models focused a great deal of attention on the most readily observable short- and medium-term outcomes, perhaps missing information on nonlinear and more diffuse contributions of the programs to the development of knowledge and human capital in occupational safety and health.

ROLE OF STAKEHOLDERS

The practice of program evaluation has paid special attention to external stakeholders and the role they play in the evaluation process. Sometimes stakeholders
are direct beneficiaries of the program; for example, for a day-care center program, the major stakeholders are the families whose children receive care. Sometimes the stakeholders are organizations with which the program must work to achieve its goals. In the case of research on occupational safety and health, key stakeholders include workers, employers, and regulatory agencies.

Stakeholder participation in evaluating research programs has come more slowly than in social program evaluation. Early evaluation panels tended to consist entirely of scientists and engineers. But as research policy became more focused on making research relevant to the private sector, evaluation panels also began to include industry and labor representatives, often scientists and engineers working in industry and labor organizations. Individuals and families exposed to environmental hazards often organize to press for more research and remediation, and stakeholders from these groups also participate in evaluation processes.

Just as social program evaluation pays particular attention to differences in knowledge and expertise between evaluators and stakeholders, in the evaluation of research programs the different contributions of scientific experts and external stakeholders both need to be respected. When the research being evaluated is intended to serve vulnerable populations, current standard practice in the evaluation of research programs, as described in the previous paragraph, is not sufficient to give voice to these groups, and additional attention needs to be paid to obtaining adequate input.

The National Academies evaluation committees included a variety of members with strong connections to NIOSH's external stakeholder groups, such as manufacturers of safety equipment, labor organizations, and employers.
The committees also reached out to a wide range of external stakeholder groups for input, including vulnerable worker populations.

METHODS OF EVALUATION

Evaluations of research programs necessarily use a variety of approaches. Expert panel review is the "bread-and-butter" approach worldwide, but there is also a long track record of evaluation studies, in which external consultants gather and analyze primary data to inform the expert deliberations.

Within the range of evaluation approaches for research programs, the National Academies' evaluations of NIOSH programs clearly fall among expert panel evaluations, rather than evaluation studies.
Expert Panel Review

Merit review, peer review, and expert panels are used widely for both ex ante and ex post evaluations of the productivity, quality, and impact of funding organizations, research programs, and scientific activity. Benefits and limitations of this approach have been reviewed extensively (Bozeman, 1993; Guston, 2003; Hackett and Chubin, 2003). Expert panel review is the oldest, and still the most common, form of research and development evaluation. In fact, the expert panel has deep historical roots in the National Academies itself, which was established in the 19th century to provide scientific and technical policy advice to the federal government. The underlying evaluative theory of the expert panel is that scientists are uniquely able to evaluate the quality and importance of scientific research (Kostoff, 1997). The preeminence of scientists in evaluating the quality and importance of scientific research was further codified in the research agencies that developed under the philosophy of Vannevar Bush in the 1940s (Bush, 1945).

Expert judgment is particularly capable of evaluating the quality of discrete scientific research activities and the relevance of such activities to particular bodies of knowledge. For example, toxicologists and biochemists, through their scientific training, are uniquely capable of assessing the contributions of particular theories, research methodologies, and evidence to answering specific scientific questions and problems. The major limitation of expert panel review is that traditional training and experience in the natural and physical sciences do not prepare scientists to address questions related to the management, effectiveness, and impact of the types of broad research portfolios that federal agencies typically manage.
Although expert panel reviews work to balance conflicting values, objectives, or viewpoints, they also may create tensions in the very areas they are expected to resolve. As noted above, the review process may be broadened to include other stakeholders beyond "experts" or "peers." Expert panels usually operate with an evaluation protocol developed by an outside group, including evaluation procedures, questions to be answered, and evaluation criteria (e.g., the evaluation of the Sea Grant College Program, Box 2-1). The panels usually review a compilation of data on the program, including plans, input counts (budget, staffing), project descriptions, and lists of results. They then talk with individuals connected to the program, both inside and outside the given agency, and use their own experience and judgment in reaching conclusions.

Closely tied to review processes is the assignment of various types of ratings. For example, the Research Assessment Exercise of the United Kingdom uses 15 panels and 67 subpanels following a common protocol to assess university research programs and assign scores by discipline area (RAE, 2009). Rating scales are being used more frequently as evaluations have become increasingly oriented toward demonstrating performance to outside audiences or allocating resources. Rating scales capture qualitative judgments on ordinal scales and allow for descriptions of performance at the various levels.

BOX 2-1
Evaluation of the National Sea Grant College Program

The National Sea Grant College Program, funded by the National Oceanic and Atmospheric Administration, is a nationwide network of 30 university programs aimed at conducting research, education, and training on coastal resources and marine policy. A 1994 National Academies' review of the program (NRC, 1994) recommended that individual program evaluations be conducted on a 4-year review cycle. From 1998 to 2006, two cycles of site-visit evaluations were conducted using a uniform and detailed set of performance criteria and a standardized set of benchmarks and indicators developed by the external review panel charged with oversight (NRC, 2006). Programs were scored on criteria in the major areas of:

• Using effective and aggressive long-range planning;
• Organizing and managing for success;
• Connecting Sea Grant with users; and
• Producing significant results.

At the end of the 4-year cycle, a final evaluation process provided a comparative assessment across the 30 university programs. The National Academies was asked to examine the National Sea Grant evaluation process. The resulting report included recommendations emphasizing the need for internal assessments to complement external evaluations, increased opportunities for interactions among the university programs, streamlined annual assessments, and improvements in strategic planning (NRC, 2006).
Characteristics sought in expert panel reviews include a balanced set of expertise on the panel, credibility among various stakeholder groups, and, to the extent possible, independence and avoidance of conflicts of interest among panel members. Selecting panel members can involve trade-offs between recruiting independent reviewers and recruiting reviewers with knowledge and understanding of the program and its field of science. For this reason, expert review panels are seldom completely free of bias and may have conflicts of interest; the preferred practice, of course, is for conflicts to be considered and disclosed. Independence is also reinforced when the panel is commissioned by, and reports to, a user located at least one level above the program in the management hierarchy. The panel adds
value by including its perspectives and insights in its report. The panel makes the evidence base for its conclusions explicit in the report and usually makes a limited number of realistic recommendations, phrased broadly enough to allow management to adapt them to specific circumstances.

The National Academies committees follow a thorough bias and conflict-of-interest process that includes completion of disclosure forms and a bias and conflict-of-interest discussion held at the first meeting.

Other Methods of Evaluating Research Programs

Other types of evaluations generally involve hiring consultants to analyze specific outputs of the program. Because the goal of a research program is new knowledge, publications represent a concrete and observable manifestation of that knowledge and are frequently used as a convenient measure of research program outputs. Publication in peer-reviewed journals provides an indication of quality control, and citations to published articles are used to assess the scientific impact of the work. Patents provide a similar set of measures for technology development. Thus, evaluations of research programs have extensive relevant datasets on which to base their assessments.

Statistical analyses of data on publications (e.g., books, journal articles, review articles, book chapters, notes, letters) range from fairly simple counts and comparisons of publications to highly sophisticated factor analyses and correlations of many types of terms, such as keywords, institutions, and addresses, that lead to the development of networks or maps of the ways in which the research outputs are connected.
These bibliometric methods are used extensively to evaluate research activities and compare research output across institutions, disciplines, fields, funding programs, countries, and groups of researchers (Kostoff, 1995; Georghiou and Roessner, 2000; Hicks et al., 2004; Weingart, 2005). Bibliometric methods also can be used to assess the extent of collaboration. Visualization techniques now produce "maps of science" that allow organizations that support research to "see" where the work they have supported fits into research in a specific field or the extent to which it is being used in other research endeavors. An important strength of bibliometric analyses is that they are data-based analyses following a fixed set of rules or algorithms. The analyses are often used as a complement to peer-review techniques, surveys, or impact analyses of research activities. An important weakness, however, is that the measures are incomplete: they do not capture all the dimensions of performance or its context, factors that an evaluation usually needs to consider. In general, a composite set of measures is used to determine the effectiveness of research activities, institutions, or national programs (Box 2-2).
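As a rough illustration of the simplest bibliometric measures mentioned above (publication counts, citation totals, and the extent of collaboration), the Python sketch below works on a handful of invented publication records. The record layout, function names, and data are hypothetical. The sketch assumes whole counting (each institution on a paper receives full credit), which is only one of several counting conventions, and it omits the field normalization and data cleaning that real bibliometric studies require.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class Paper(NamedTuple):
    """Hypothetical publication record; real bibliometric databases differ."""

    title: str
    institutions: tuple[str, ...]
    citations: int


def publication_counts(papers: list[Paper]) -> Counter:
    """Papers per institution, giving each listed institution full credit."""
    return Counter(inst for p in papers for inst in set(p.institutions))


def citation_totals(papers: list[Paper]) -> Counter:
    """Citations credited in full to every institution on a paper."""
    totals: Counter = Counter()
    for p in papers:
        for inst in set(p.institutions):
            totals[inst] += p.citations
    return totals


def collaboration_links(papers: list[Paper]) -> Counter:
    """Edges of an institutional co-authorship network, weighted by the
    number of papers each pair of institutions shares."""
    links: Counter = Counter()
    for p in papers:
        for pair in combinations(sorted(set(p.institutions)), 2):
            links[pair] += 1
    return links


def collaboration_rate(papers: list[Paper]) -> float:
    """Share of papers listing more than one institution."""
    if not papers:
        return 0.0
    return sum(len(set(p.institutions)) > 1 for p in papers) / len(papers)


if __name__ == "__main__":
    # Invented records for illustration only.
    papers = [
        Paper("Paper A", ("Univ X", "Agency Y"), 12),
        Paper("Paper B", ("Agency Y",), 3),
        Paper("Paper C", ("Univ X", "Agency Y", "Univ Z"), 7),
    ]
    print(publication_counts(papers))
    print(citation_totals(papers))
    print(collaboration_links(papers))
    print(round(collaboration_rate(papers), 2))  # 0.67
```

The weighted pairs returned by `collaboration_links` are the raw material for the network "maps" described above. As the text cautions, numbers like these capture only part of performance: they say nothing about a program's context, its influence on practice, or contributions that never appear in the publication record.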
BOX 2-2
Review of the National Science Foundation's Science and Technology Center Programs

Beginning in 1989, the National Science Foundation (NSF) established 25 Science and Technology Centers (STCs) across the United States. The goal was to promote cutting-edge fundamental research in all areas of science, improve the quality of science and math education, and enhance the transfer of knowledge among disciplines. The efforts of these center programs have been evaluated through several external assessments, including site-visit teams. A congressionally requested review of the management of the STC program was conducted by the National Academy of Public Administration (NAPA, 1995).

The National Academies was asked to evaluate the accomplishments of the STC Program as a whole, rather than those of individual centers (NRC, 1996). Evaluation input included data from Abt Associates regarding their historical review; secondary data analysis on the characteristics and operations of the 25 centers; bibliometric and patent analyses; and surveys of principal investigators, industry/federal laboratory representatives, educational outreach collaborators, and other key stakeholders (Fitzsimmons et al., 1996). The National Research Council report recommendations included an increased emphasis on graduate and undergraduate education and coordination of the reviews of the program (NRC, 1996).

Other methods used in evaluating research programs include methodologies drawn from the social sciences, such as case studies, interviews, and surveys. One special application of case studies in the evaluation of a research program, for example, is the TRACES approach, named for an early study of Technology in Retrospect and Critical Events in Science (IIT, 1968).
This approach starts from a recent accomplishment or success, then traces back through the complex set of earlier research results and technologies that made it possible. Programs with economic goals have also used case studies to illustrate the return on investment in advanced technology projects (Ruegg, 2006).

SUMMARY

In summary, the evaluation of research programs is an established branch of program evaluation. The National Academies' evaluation of NIOSH research programs used one of the most common approaches: expert panel review. As is common in evaluations of applied research programs, this process involved stakeholders as members of the evaluation committees and also sought external stakeholder input. The evaluation framework described in Chapter 3 organizes data into a common evaluation tool based on a logic model approach and provides for
consideration of external factors. Like many research program evaluation efforts, the evaluation committees used this structured rating tool to provide some consistency in ratings across programs. The process did not, however, expand into an evaluation study by gathering new data or extensively analyzing external data sources. The evaluations of NIOSH programs fall well within the range of acceptable practice in evaluating research programs and are compiled in comprehensive reports that went through peer review under the National Academies' report review process.

REFERENCES

AEA (American Evaluation Association). 2009. American Evaluation Association. http://www.eval.org (accessed March 23, 2009).
Bozeman, B. 1993. Peer review and evaluation of R&D impacts. In Evaluating R&D impacts: Methods and practice. Edited by B. Bozeman and J. Melkers. Boston, MA: Kluwer Academic. Pp. 79–98.
Bush, V. 1945. Science: The endless frontier. Washington, DC: U.S. Government Printing Office.
EES (European Evaluation Society). 2009. European Evaluation Society. http://www.europeanevaluation.org/ (accessed March 23, 2009).
Fitzsimmons, S. J., O. Grad, and B. Lal. 1996. An evaluation of the NSF Science Technology Centers Program. Vol. 1, Summary. Washington, DC: Abt Associates. http://www.nsf.gov/od/oia/programs/stc/reports/abt.pdf (accessed March 23, 2009).
Georghiou, L., and D. Roessner. 2000. Evaluating technology programs: Tools and methods. Research Policy 29(4–5):657–678.
Gredler, M. E. 1996. Program evaluation. Englewood Cliffs, NJ: Merrill.
Guston, D. 2003. The expanding role of peer review processes in the United States. In Learning from science and technology policy evaluation: Experiences from the United States and Europe. Edited by P. Shapira and S. Kuhlmann. Cheltenham, UK, and Northampton, MA: Edward Elgar. Pp. 81–97.
Hackett, E., and D. Chubin. 2003. Peer review for the 21st century: Applications to education research. Paper presented at the National Research Council Workshop, Washington, DC, February 25, 2003. http://www7.nationalacademies.org/core/HacketChubin_peer_review_paper.pdf (accessed November 16, 2008).
Hicks, D., H. Tomizawa, Y. Saitoh, and S. Kobayashi. 2004. Bibliometric techniques in the evaluation of federally funded research in the United States. Research Evaluation 13(2):78–86.
IIT (Illinois Institute of Technology). 1968. Technology in retrospect and critical events in science. Vol. 1. Chicago, IL: IIT Research Institute.
Kostoff, R. N. 1995. Federal research impact assessment: Axioms, approaches, applications. Scientometrics 34(2):163–206.
Kostoff, R. N. 1997. Peer review: The appropriate GPRA metric for research. Science 277:651–652.
Mark, M., G. Henry, and G. Julnes. 2000. Evaluation: An integrated framework for understanding, guiding, and improving policies and programs. San Francisco: Jossey-Bass.
McLaughlin, J. A., and G. B. Jordan. 1999. Logic models: A tool for telling your program's performance story. Evaluation and Program Planning 22:65–72.
NAPA (National Academy of Public Administration). 1995. National Science Foundation's Science and Technology Centers: Building an interdisciplinary research program. Washington, DC: NAPA.
NAS (National Academy of Sciences). 1959. Panel reports of the NAS-NRC panels advisory to the National Bureau of Standards. Washington, DC: National Academy of Sciences.
NRC (National Research Council). 1994. A review of the NOAA National Sea Grant College Program. Washington, DC: National Academy Press.
NRC. 1996. An assessment of the National Science Foundation's Science and Technology Centers Program. Washington, DC: National Academy Press.
NRC. 2006. Evaluation of the Sea Grant Program review process. Washington, DC: The National Academies Press.
OMB (Office of Management and Budget). 2009a. Government Performance and Results Act. http://www.whitehouse.gov/omb/mgmt-gpra/gplaw2m.html (accessed March 20, 2009).
OMB. 2009b. Assessing program performance. http://www.whitehouse.gov/omb/part/ (accessed March 20, 2009).
Patton, M. Q. 1997. Utilization-focused evaluation: The new century text. Thousand Oaks, CA: Sage.
RAE (Research Assessment Exercise). 2009. Research Assessment Exercise. http://www.rae.ac.uk/ (accessed March 23, 2009).
Ruegg, R. 2006. Bridging from project case study to portfolio analysis in a public R&D program: A framework for evaluation and introduction. NIST GCR 06-891. http://www.atp.nist.gov/eao/gcr06-891/gcr06-891report.pdf (accessed March 23, 2009).
Scriven, M. 1967. The methodology of evaluation. In Curriculum evaluation: AERA monograph series on evaluation. Edited by R. E. Stake. Chicago: Rand McNally. Pp. 39–85.
Scriven, M. 1991. Beyond formative and summative evaluation. In Evaluation and education: At quarter century. Edited by M. W. McLaughlin and D. C. Phillips. Chicago: University of Chicago Press. Pp. 19–64.
Shadish, W. R., T. D. Cook, and L. C. Leviton. 1991. Foundations of program evaluation: Theories of practice. Newbury Park, CA: Sage.
Weingart, P. 2005. Impact of bibliometrics upon the science system: Inadvertent consequences? Scientometrics 62(1):117–131.
Weiss, C. H. 1977. Research for policy's sake: The enlightenment function of social research. Policy Analysis 3:531–545.
Wholey, J. S. 1983. Evaluation and effective public management. Boston: Little, Brown.
Williams, V. L., E. Eiseman, E. Landree, and D. M. Adamson. 2009. Demonstrating and communicating research impact: Preparing NIOSH programs for external review. Santa Monica, CA: RAND.
W. K. Kellogg Foundation. 2000. Logic model development guide. http://www.wkkf.org/Pubs/Tools/Evaluation/Pub3669.pdf (accessed March 23, 2009).
World Bank. 2000. Logframe handbook: A logical framework approach to project cycle management. http://www.wau.boku.ac.at/fileadmin/_/H81/H811/Skripten/811332/811332_G3_log-framehandbook.pdf (accessed March 23, 2009).