Appendix C
Performance Measures Used by Other Agencies and Organizations
Many agencies and organizations involved in managing research programs are attempting to develop performance measures to evaluate the quality of their programs. In an effort to understand how metrics can be used to evaluate programs effectively, the committee reviewed evaluation tools used by the National Institute for Standards and Technology (NIST), the U.S. Air Force, Texas, Maine, Kansas, and two academic programs: the National Science Foundation's (NSF) Experimental Program to Stimulate Competitive Research (EPSCoR) and the National Sea Grant Program.
Typically, two primary mechanisms drive the use of performance measures: specific legislative requirements and the desire to benchmark outcomes and impacts of a program (J. Melkers, University of Illinois, Chicago, personal commun., June 8, 2002).
PERFORMANCE MEASURES USED TO EVALUATE FEDERAL AGENCIES
The NIST Advanced Technology Program (ATP) and the U.S. Air Force Scientific Advisory Board (SAB) are two examples of federal-agency research programs that have developed specific metrics or evaluation criteria to gauge the performance of their research programs.
National Institute for Standards and Technology Advanced Technology Program
The NIST ATP has developed a complex program-evaluation tool, the business reporting system (BRS). The BRS, which was implemented in 1994, is used to track companies that have received funding through the ATP. It is an impressive evaluation tool that comprehensively evaluates the business and economic impacts of each research project from start to finish (see Box C-1). Companies are asked to respond to a number of detailed surveys before, during, and after their projects are completed. The surveys include questions regarding the commercial application of proposals, business goals, strategies for commercialization and for protecting intellectual property, dissemination efforts, publication in professional journals and presentations at conferences, participation in user associations, public-relations efforts, R&D status, collaborative efforts, impacts on employment, and attraction of new funding.
U.S. Air Force Research Program
Evaluations of the U.S. Air Force research program are conducted by the U.S. Air Force SAB. The first SAB evaluation was conducted in 1991 (R. Selden, U.S. Air Force SAB, personal commun., Jan. 9, 2003). Programs are evaluated for quality and relevance of research, and each directorate is evaluated every 2 years. Typical metrics used to evaluate research programs are university metrics (publications, patents, and peer review) and a grading system that is used to evaluate the various components of the research programs in each directorate on the basis of 10 criteria (see Box C-2). Scores are normalized across the different directorates (Selden 1998).
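The cited SAB materials do not document the normalization formula, but the idea can be sketched. The z-score method and all scores below are illustrative assumptions, not the SAB's actual procedure; they show only why normalization lets programs graded by different panels be compared on a common scale.

```python
# Illustrative only: put per-directorate review scores on a common scale so
# that directorates graded by different panels can be compared. The z-score
# method and the panel data below are assumptions for illustration, not the
# SAB's documented procedure.
from statistics import mean, stdev

def normalize(scores):
    """Map raw panel scores to z-scores (mean 0, std 1) within one panel."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical raw scores from two panels with different grading habits.
panel_a = [7.0, 8.0, 9.0]   # a lenient panel
panel_b = [4.0, 5.0, 6.0]   # a strict panel

norm_a = normalize(panel_a)
norm_b = normalize(panel_b)
# After normalization the two panels produce identical relative rankings,
# even though their raw scales differ.
```

The point of the sketch is that each program is scored relative to its own panel's baseline, which removes panel-to-panel grading differences before cross-directorate comparison.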
PERFORMANCE MEASURES USED TO EVALUATE STATE-LEVEL PROGRAMS
Many states have developed science and technology performance metrics to evaluate their research programs. State governments tend to be more interested in whether research programs are encouraging economic development than in the quality or value of the research itself. Texas, Kansas, and Maine have developed a process of using performance measures to evaluate their research and science and technology programs. The performance measures are representative of similar evaluations being conducted by other states.

BOX C-1 NIST ATP Performance Metrics

Companies are asked to submit information regarding their progress and economic contributions in the form of four electronically submitted reports:

Baseline report. Companies submit information regarding commercial application of proposals, business goals, strategies for commercialization and for protecting intellectual property, and dissemination efforts. Companies are asked to rank the importance of publishing in professional journals, presenting papers at conferences, participation in user associations, and public-relations efforts (NIST 2002a).

Anniversary report. Companies list major commercial applications identified in the proposal, business goals, progress toward commercialization, R&D status, collaborative efforts, employment impacts, attraction of new funding, strategies for protecting intellectual property, and dissemination efforts. This section asks companies to evaluate the status of their R&D cycle as a result of ATP funding (NIST 2002b). These reports are completed annually.

Close-out report. Companies discuss commercial applications, business goals, early impacts on revenue and cost, future projections on revenue and costs, R&D status, collaboration impacts, impacts on employment, attraction of new funding, strategies for protecting intellectual property, and dissemination plans (NIST 2002c). Reports are completed after the conclusion of projects.

Post-project summary. Companies provide information on postproject affiliation, funding sources, the impact of ATP funding on product development and process capability, anticipated future market activity, and R&D partnering with other organizations (NIST 2002d). Reports are completed 2, 4, and 6 years after projects are completed.
Texas
The primary research effort in Texas is known as the Advanced Research Program/Advanced Technology Program (ARP/ATP). Evaluation efforts for ARP/ATP are coordinated by the Texas Higher Education Coordinating Board. Progress reports and final reports are used to track the impact of research projects. Surveys are used to gather information about the progress of research. The survey metrics used to evaluate research projects include number of publications and performance of graduate students (see Box C-3) (ARP/ATP 2002; J. Melkers, University of Illinois, Chicago, IL, personal commun., July 8, 2002).

BOX C-2 U.S. Air Force Scientific Advisory Board Science and Technology Review Evaluation Criteria

Source: USAFSAB/USAFRL 1998.
Kansas
The Kansas Technology Enterprise Corporation is responsible for managing grant programs for applied research and for equipment used in science and technology skill training. Examples of metrics used to evaluate programs are rankings of the importance of commonly accepted economic-development goals (such as job creation and encouraging technologic innovation and entrepreneurial spirit) and literature reviews (Burress et al. 1992).
Maine
The Maine Science and Technology Foundation, which heads Maine’s Research and Development Evaluation Project, was asked by the state legislature to undertake a comprehensive 5-year evaluation of how the state R&D program has evolved and affected the R&D industry and the level of innovation-based economic development in the state. Evaluation of Maine’s Public Investment in Research and Development, a report produced by the foundation (MSTF 2001), documents each research program that has been evaluated and the processes and methods used to evaluate each program. Performance measures are varied. For instance, the evaluation of the Maine Biomedical Research Program focused on output and outcome measures. Output measures include a plan showing how the funds would be used and the resulting research and economic benefits, peer-reviewed journal articles demonstrating the competitiveness of the institution’s research, and the amount of funding from outside sources and its use. Outcome measures include an evaluation of the direct and indirect economic impact of the funded research and an assessment of the contribution of the funded research to scientific advancement and the institution’s competitive position (MSTF 2001). The foundation has prepared a survey for research institutions to assist them in collecting data for program evaluation (see Box C-4).
BOX C-3 Texas Higher Education Coordinating Board Research-Project Performance Metrics The following are the questions being asked in 2002 of all persons who have received ARP/ATP funds. Responses are provided electronically.
Subsequent questions are related to information about interaction with actual or potential collaborators, commercialization, knowledge utilization, and possible licensing opportunities.

Source: ARP/ATP 2002.
PERFORMANCE EVALUATION MEASURES IN ACADEME
Program evaluation is also used widely in academe, particularly in programs that involve improving educational opportunities and academic competitiveness. Two such programs that are federally funded but administered by universities are NSF’s EPSCoR and the National Sea Grant Program.

BOX C-4 Draft Survey for Maine Research Institutions (Revised February 1, 2002)

Source: MSTF 2001.
The Experimental Program to Stimulate Competitive Research
EPSCoR is designed to improve the R&D competitiveness of states that have traditionally received smaller amounts of federal research and development funding on a per capita basis. The program requires a commitment on the part of the states to improve the quality of science and engineering research and training at colleges and universities. Three key groups of metrics describe a state’s science and technology environment: NSF support, total federal academic R&D contribution, and high-technology activity (NSF 2002). Each group contains a number of metrics that can be compared across states and over time. Most of the metrics involve assessments in terms of people, programs, and dollars. The following are examples:
- Total number of NSF research-support awards per year.
- Academic R&D obligations by all federal agencies per year.
- Total number of graduate students in science and engineering.

Additional measures of the effectiveness of the programs include number of grant proposals submitted, number of grant proposals funded, quality of peer-reviewed research, professional contributions of students, publication and patent productivity, return on investment, and contribution to the state (for example, an improved environmental program) (NSF 2002).
National Sea Grant Program
The National Sea Grant Program, created in 1966, established a partnership between the National Oceanic and Atmospheric Administration and universities to foster the development of sea-grant institutions that engage in research, education, outreach, and technology transfer in support of stewardship of the nation’s marine resources.
Performance benchmarks for evaluation have been developed to determine whether the goals and strategic plans of each sea-grant institution are being met. Programs are evaluated according to the following weighted criteria (NOAA, unpublished material, 1998):
- Effective and aggressive long-range planning (relative weight, 10%).
- Organizing and managing for success (relative weight, 20%).
- Connecting sea grant with users (relative weight, 20%).
- Producing significant results (relative weight, 50%).
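The weighted criteria above imply that an overall rating is a weighted sum across the four categories. The sketch below shows that arithmetic; only the weights come from the criteria listed here, while the 0-100 criterion scores and the scoring scale itself are invented for illustration and are not described in the cited NOAA material.

```python
# Illustrative weighted-sum rating using the Sea Grant criterion weights.
# The 0-100 criterion scores below are hypothetical; only the weights
# (10%, 20%, 20%, 50%) come from the evaluation criteria in the text.
WEIGHTS = {
    "long-range planning": 0.10,
    "organizing and managing for success": 0.20,
    "connecting sea grant with users": 0.20,
    "producing significant results": 0.50,
}

def overall_rating(scores):
    """Weighted sum of criterion scores; the weights sum to 1.0."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

# A hypothetical institution scoring well on results, weaker on planning.
scores = {
    "long-range planning": 60,
    "organizing and managing for success": 80,
    "connecting sea grant with users": 70,
    "producing significant results": 90,
}
rating = overall_rating(scores)
# 0.1*60 + 0.2*80 + 0.2*70 + 0.5*90 = 81.0
```

Because "producing significant results" carries half the total weight, a strong results score dominates the rating, which mirrors the emphasis the weights place on outcomes over planning and process.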
Each sea-grant institution is given sets of recommended questions or established expected-performance benchmarks designed to gauge how well the program has met the goals established during strategic planning. Benchmarks typically include questions about the quality of the peer-review process, detailed information about the strategic planning process, measures to determine the quality of program management, ability of the program to develop private-sector matching funds, the number of published peer-reviewed papers in relation to the size of the research program, and questions to gauge the social, economic, and scientific contributions of program research (NOAA, unpublished material, 1998).
CONCLUSIONS
Federal research programs tend to focus more on the collection of product metrics than process metrics. Among federal research programs, there is a presumption that peer review is the key process necessary to ensure a successful program. However, there is relatively little discussion of who is responsible for conducting the peer-review evaluations. The committee considers peer review a necessary but not sufficient condition for a successful program.
Evaluations at the state level are driven principally by economic considerations. There tends to be little targeting of specific research topics except in broad terms, such as nanotechnology. Many of the evaluations are based on surveys of participating institutions and data routinely collected at the state level, such as number of students enrolled in institutions of higher learning.
NSF’s EPSCoR produces a level of standardization that allows comparison of R&D across states and over time. That consistency across time and place is an important attribute of metrics.
REFERENCES
ARP/ATP (Advanced Research Program/Advanced Technology Program). 2002. Research Projects Performance Metrics. Texas Higher Education Coordinating Board, Austin, TX. [Online]. Available: http://www.arpatp.com/online/ [accessed June 12, 2002].
Burress, D., M. El-Hodiri, and V.K. Narayanan. 1992. An Evaluation Model to Determine the Return on Public Investment (ROPI) for the Kansas Technology Enterprise Corporation. Report No. 211. Institute for Public Policy and Business Research, The University of Kansas. November. [Online]. Available: http://www.ukans.edu/cwis/units/ippbr/resrep/pdf/M211.pdf [accessed Jan. 22, 2003].
MSTF (Maine Science and Technology Foundation). 2001. Evaluation of Maine's Public Investment in Research and Development. [Online]. Available: http://www.mstf.org/evaluation/pdfjump.html [accessed Jan. 29, 2003].
NIST (National Institute of Standards and Technology). 2002a. ATP Baseline Business Report. Optional Worksheet for Organizing Data. Advanced Technology Program, National Institute of Standards and Technology, Technology Administration, U.S. Department of Commerce.
NIST (National Institute of Standards and Technology). 2002b. ATP Anniversary Business Report. Optional Worksheet for Organizing Data. Advanced Technology Program, National Institute of Standards and Technology, Technology Administration, U.S. Department of Commerce.
NIST (National Institute of Standards and Technology). 2002c. ATP Close-Out Business Report. Optional Worksheet for Organizing Data. Advanced Technology Program, National Institute of Standards and Technology, Technology Administration, U.S. Department of Commerce.
NIST (National Institute of Standards and Technology). 2002d. ATP Post-Project Summary Business Report. Optional Worksheet for Organizing Data. Advanced Technology Program, National Institute of Standards and Technology, Technology Administration, U.S. Department of Commerce.
NSF (National Science Foundation). 2002. Experimental Program to Stimulate Competitive Research (EPSCoR). National Science Foundation. [Online]. Available: http://www.ehr.nsf.gov/epscor/statistics/start.cfm/ [accessed Dec. 19, 2002].
Selden, R.W. 1998. Air Force Science and Technology Quality Review. Review Overview Document. Scientific Advisory Board Science and Technology. June.
USAFSAB/USAFRL (U.S. Air Force Scientific Advisory Board and U.S. Air Force Research Laboratory). 1998. Memorandum of Understanding for the Air Force Science and Technology Quality Review between the U.S. Air Force Scientific Advisory Board and the U.S. Air Force Research Laboratory, Appendix III. Evaluation Criteria. August 1998.