11 Assessing the Value of Research at the National Science Foundation
Judith S. Sunley
National Science Foundation
Introduction
The passage of the Government Performance and Results Act (GPRA, or the Results Act) in 1993 and its imminent implementation with the development of the FY 1999 budget request have made all federal agencies more sensitive to the importance of assessing the results of their activities. This presentation reflects the wide-ranging thinking and discussion that have gone into developing the National Science Foundation's (NSF's) response to the Results Act and includes information taken from public elements of NSF's strategic and performance plans. Any opinions are those of the author, rather than official agency positions.
As scientists, we know that there are many different ways of "measuring" things, and, in fact, there are whole fields of science devoted to measurement and evaluation. Key elements in any assessment of research activities include who is doing the assessment and what their expectations are for program outcomes. We know that different constituencies may attach different values to the same characteristics and may have quite different ideas about which dimensions of an effort merit consideration during an assessment. Equally important is the level of aggregation at which the assessment is made. We evaluate the results of a specific research project quite differently from the results of a broad program of activity. Finally, the stage at which a set of research activities is assessed is important in determining reasonable expectations for the assessment.
The multidimensional character of the contributions of research means that absolute valuations are difficult, particularly given the limited precision with which the individual measurements can be made. Precision is particularly problematic with assessments of quality, which are essential for research. This introduces some fuzziness in assessing the value of research that makes many outside science and engineering uncomfortable. The lack of precision requires the use of expert judgment in making effective assessments.
The NSF Context for Assessing the Value of Research
GPRA requires the development of a strategic plan that guides annual performance plans and reports. Key factors of the strategic plan are statements of mission and general goals. These provide the context for assessing agency performance.
NSF's continuing mission is stated in the preamble to the National Science Foundation Act of 1950 (Public Law 81-507): "To promote the progress of science; to advance the national health, prosperity, and welfare; to secure the national defense; and for other purposes." The Act authorizes and directs NSF to initiate and support the following:
Basic scientific research and research fundamental to the engineering process,
Programs to strengthen scientific and engineering research potential,
Science and engineering education programs at all levels and in all the various fields of science and engineering, and
An information base for science and engineering appropriate for development of national and international policy.
NSF works toward its mission through the support of research, infrastructure development, and education and training, largely at academic institutions. When we assess our programs, we are thus assessing the results and outcomes of the investments we make. We examine the outcomes of aggregate collections of awards over time frames appropriate to our expectations for results.
NSF has established its outcome goals by determining what types of observable outcomes from its programs advance the progress of science and engineering. These include:
Discoveries at and across the frontier of science and engineering;
Connections between discoveries and their use in service to society;
A diverse, globally oriented work force of scientists and engineers;
Improved achievement in mathematics and science skills needed by all Americans; and
Meaningful information on the national and international science and engineering enterprise.
The first three outcome goals are most relevant to the assessment of research and are the focus of the remainder of this discussion.
Assessing Progress Toward Outcome Goals
Because the timing of outcomes from NSF's activities is unpredictable and annual change in the award outputs is not an accurate indicator of progress toward outcome goals, NSF has developed performance goals for outcomes against which we expect to assess progress on a continuing basis. The stream of data and information on the products of NSF's investments will be combined with the expert judgment of external panels to assess NSF's performance over time and to provide a management tool for initiating changes in direction, where needed.
These continuing performance goals take advantage of GPRA's option for the use of an alternative format where quantitative annual performance goals are impossible or inappropriate. They are based on descriptive standards that convey the characteristics of the types of results NSF seeks. The successful performance standards for the outcome goals most closely related to valuing research are listed below.
Discoveries at and across the frontier of science and engineering: NSF is successful in making progress toward this outcome goal when, in the aggregate, NSF grantees make important discoveries; uncover new knowledge and techniques, both expected and unexpected, within and across traditional disciplinary boundaries; and forge new high-potential links across those boundaries.
Connections between discoveries and their use in service to society: NSF is successful in making progress toward this outcome goal when, in the aggregate, the results of NSF-supported work are rapidly and readily available through publication and other interaction among researchers, educators, and potential users; and when new applications are based on knowledge generated by NSF grantees.
Diverse, globally oriented science and engineering work force: NSF is successful in making progress toward this outcome goal when, in the aggregate, NSF programs provide a wide range of opportunities to promising investigators; expose students and scientists and engineers to world-class professional practices and increase their international experiences; strengthen the skills of the instructional work force in science and technology; ensure access to modern technologies; enhance flexibility in training to suit an increasingly broad set of roles for scientists, engineers, and technologists; when business and industry recognize the quality of students prepared for the technological work force through NSF-sponsored programs; and when the participation of underrepresented groups in NSF-sponsored projects and programs increases.
In addition to these successful performance standards, NSF has developed similar descriptions for exceptional performance and unacceptable performance. The descriptive standards include terms that require expert judgment, but we have attempted to limit these to concepts routinely judged through the merit review system, which gathers advice to inform project selection by program officers.
Each of the descriptions will be accompanied by related output indicators that will provide hard information for the exercise of expert judgment.
The descriptive performance standards will be used at several levels of aggregation and by various groups as evaluative tools in NSF's management process. We expect each program to report on its performance annually based on an internal evaluation. Senior management will examine these reports and integrate them to develop reports for NSF as a whole.
This regular internal assessment cycle will be complemented and validated through external assessment using modifications to our existing Committee of Visitors process on a 3-year rolling cycle. We are already beginning to experiment with these modifications, including changes in the composition of the panels themselves. By FY 1999, all Committees of Visitors will give judgments of program effectiveness using the descriptive performance standards for outcomes. They will also address other performance issues, including those for the merit review system that these committees currently address. We anticipate that advisory committees and the National Science Board will also play important roles.
A critical factor in NSF's ability to conduct these reviews will be implementation of a revised project reporting system. This is currently being tested and will be fully implemented over the next year. We must rely on the research community for complete, accurate reporting of results.
Discussion
Judith S. Sunley: I would like to add to Dr. Dehmer's earlier reply. The point that Dr. Manuel made is a good one. However, it is not clear to me that GPRA is a tool that really helps us deal with those questions, except to the extent that it may take it out of a 1-year view of "what do I do if my budget is cut 5 percent this year?" and put it into the context of a strategic plan that the agency has developed. This plan could certainly cover the long-term implications of a 5 percent cut this year with no future cuts, or a 5 percent cut this year, another 5 percent cut next year, and another 5 percent cut the year after that. In an agency like NSF, there is frequently a feeling outside the government that, if it takes a 5 percent cut, it has that much fat in its salaries and expense pool for the people inside the agency. In an agency like NSF, our total salaries and expense budget is less than 5 percent of the agency's budget. So any significant cuts must come directly out of investments that we would otherwise make in research and development. On the other hand, who knows (1) which 5 percent is the right 5 percent to cut out of those investments, and (2) whether some researchers could, in fact, do with less than they have actually asked for? Our program officers tend to be fiscally tightfisted in general (as many of you know). I think such cuts really do lead to a decrease in the number of people who are funded.
Andrew Kaldor, Exxon Research and Development Corp.: I was impressed by both the DOE and NSF presentations. It is very satisfying to see government program managers actually use some of these tools and matrices. I have one specific concern: the second outcome given by NSF, which dealt with applications of the technology. Unless you address that outcome with a very sophisticated time-averaged methodology, I honestly don't see how you can measure the outcome on an annual basis. In fact, if you start using less sophisticated measurement techniques, you will find yourself driven to short-term impact research, and I can't believe you can sustain the excellence that you are noted for under those circumstances.
Judith S. Sunley: I should have been a little clearer.
One of the real difficulties that NSF has is that if you looked, for example, at FY 1997 and tried to establish a performance plan for FY 1997, based on the funds we distributed in that year, to be reviewed and reported on in FY 1998, you would not find much that came from those FY 1997 investments. So we have to go back and look at our accomplishments retrospectively, over an appropriate period of time. There will be a variety of mechanisms for "measuring" our performance in these areas, for both the discoveries and the connections outcome goals. For example, we could take a look at a set of selected investments made in, say, 1990 or 1992 and see what the total output of that set of investments is in 1997. Or, we could take some key technology result from 1997 and try to trace it back and see what investments NSF made that had an impact on those technology developments. There are a variety of different ways of approaching this problem and, in the experiments that we are doing now with our Committees of Visitors, we are investigating a number of different options. Dr. Dehmer indicated that similar activities were under way in Basic Energy Sciences (BES). There is clearly not a single methodology that we would use in all cases.
Andrew Kaldor: I view this part as one of the most dangerous areas in reality, because the opportunity for misunderstanding, misinterpretation, and misdirection as a result is huge. NSF, for better or worse, has under its control the best science-generating machine in the world. It is a trust that has been given to NSF and Basic Energy Sciences, and that trust is worth a tremendous amount to us as a country. So when you talk about these measurements (measurements that could take this capability and diminish, if not destroy, it), I view the process as being extremely dangerous. It is critical that NSF makes sure GPRA doesn't adversely affect its mission, which is to support the best science.
Judith S. Sunley: That is something that we all are very concerned about.
Jack Halpern, University of Chicago: I wanted to return to the 5 percent reduction question. I think this is the crux of the matter, at least as far as the level of investment in science is concerned. This issue, how much to invest in science, is what the government struggles with each year, whether to increase or decrease the budget by a few percent. Having been through this exercise in the Academy for the last few years, of just trying to become more efficient, there are limits to what you can do. You soon reach a point where you really can't cut administrative costs much more. However, if you look then at the science support budget (what I have to say applies particularly to NSF, which is supporting extramural research and largely basic research), there are no strategic programs that you can measure against. Let's say NSF supports 20 percent of the requests that it gets. The real question is, What do we support at 21 or 19 percent? The managers at NSF are making arbitrary cutoffs, and it is not obvious what the consequences are of making a 1 percent shift from 19 to 20 or 20 to 21 percent, that is, how detrimental the decrease, or beneficial the increase, is going to be. It seems to me that the GPRA activities that we have been hearing about provide an opportunity to calibrate how sensitive the consequences are to the cutoff. Granted, it is very difficult to identify quantitative measures of performance, but NSF has identified a particular set of criteria that they are proposing to use in assessing the effectiveness of their programs. I agree that you can't do this project by project, but if you can do it across programs, I think it would be useful to take NSF's ranking of its projects and apply these criteria to the top 20 percent and the middle cut and the bottom 10 percent. This would allow you to do a couple of things. One is to assess the effectiveness of the criteria that you are using. If they don't distinguish between what you think are your most highly ranked programs and your most poorly ranked programs, then it suggests that your criteria are not very useful. To turn that around, if you have faith in your criteria, or evidence that they allow you to assess the effectiveness of your peer-review ranking of programs, such a study would also allow you to see significant differences between your top-ranked programs and your lower-ranked ones. At the very least, it seems to me that you ought to be applying these criteria by strata, even if not to individual projects. It would also give you some appreciation about what will happen if you have to cut out the lowest 5 percent. How much less effective are they by your criteria than the others?
Judith S. Sunley: That is a very interesting idea. I will try to get someone to explore some of those ideas further.
Richard K. Koehn, University of Utah: I would like to expand on the point that Professor Halpern raised. Why wouldn't you include in your assessment those projects that were not funded? By selecting projects to fund and, implicitly, others not to fund, you have made choices of where to invest and where not to invest. It would be worthwhile to know that those projects you didn't invest in haven't in fact produced positive outcomes by your criteria.
Judith S. Sunley: I don't think we have attempted to do that in the past. One of the things we are very concerned about is the overall burden both on the scientific community and on NSF staff, as well as the expense, in exploring these options. So we would have to estimate what value we thought we could get out of going that much farther in terms of detail in the assessments.
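The stratified comparison that Professor Halpern proposes in the discussion above can be sketched in a few lines of Python. This is only a minimal illustration, not a method NSF or the speakers describe: it assumes each project could be assigned a single numeric outcome score by an expert panel, which the discussion does not specify, and the function name `stratum_means`, its parameters, and the input layout are all hypothetical.

```python
from statistics import mean


def stratum_means(projects, top_frac=0.20, bottom_frac=0.10):
    """Average a hypothetical outcome score within merit-review strata.

    `projects` is a list of (review_rank, outcome_score) pairs, where
    rank 1 is the most highly ranked project. Both fields are assumed
    inputs for illustration; the source does not define them.
    """
    ordered = sorted(projects, key=lambda p: p[0])  # best rank first
    n = len(ordered)
    n_top = max(1, round(n * top_frac))
    n_bottom = max(1, round(n * bottom_frac))
    strata = {
        "top": ordered[:n_top],
        "middle": ordered[n_top:n - n_bottom],
        "bottom": ordered[n - n_bottom:],
    }
    # Skip a stratum that ends up empty (possible for very small inputs).
    return {
        name: mean(score for _, score in group)
        for name, group in strata.items()
        if group
    }


if __name__ == "__main__":
    # Invented scores for ten projects ranked 1 (best) through 10.
    sample = list(zip(range(1, 11), [9, 8, 7, 6, 6, 5, 5, 4, 4, 2]))
    print(stratum_means(sample))
```

If the stratum averages do not fall from top to bottom, that would suggest, in Halpern's terms, that the criteria are not distinguishing highly ranked work from poorly ranked work. The gap between the bottom stratum and the rest gives a rough sense of what would be lost by cutting the lowest-ranked projects.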