In the past, NSF has developed ways of discerning the wants of at least one group of customers, the scientific community, and perhaps those of industry and other constituents. But do these methods still work? Do they work well enough, and are they weighted correctly in NSF's considerations? Such questions again raise the issue of cultural change. How is the advice that comes in through NSF's academic advisory committees weighted compared to the advice that may come in from industry, from Congress, or from other groups? It may be that, as a part of acknowledging a need for cultural change, the weighting of those sources of advice needs to be reexamined.
The fourth set of issues has to do with some points about measurement methodology. First, it was mentioned several times during the workshop that qualitative as well as quantitative measures should be considered, and that there should be a mix of the two types. Indeed, the general view was that those aspects of research performance that can be measured quantitatively are likely to be trivial.
There was some discussion as well about the behavioral impacts of measurements. If you tell people that they are going to be evaluated on the basis of some kind of measure, they will work to that measure. There is no doubt about it. Perhaps that is good, but maybe not. The point is that it is essential to understand the behavioral consequences of the measurements that are being used.
As was mentioned several times in the workshop, it would be desirable to use benchmarking. This was a major recommendation of a recent study of the National Academy of Sciences' Committee on Science, Engineering, and Public Policy (COSEPUP; Science, Technology, and the Federal Government: National Goals for a New Era, National Academy Press, Washington, D.C., 1994), which stated that the United States should be among the world's leaders in all areas of science and clearly leading in certain fields of science. Also, more attention should be paid to results, rather than to proposals. NSF should look at trends, as well as at absolute levels in the measurements. Finally, the measurements should relate to agency decision making, so that the metrics can be used as a means of explaining or making transparent the decisions in the allocation of resources.
The fifth issue concerns what can be measured and what it makes sense to measure. The workshop participants did not get very far with that issue, but a number of suggestions should be considered. The first set has to do with what the workshop participants agreed is most important, the education of scientists and engineers. Many aspects of human capital formation can be measured. NSF has done so for several decades. Its reports are full of statistics on numbers of people in different categories, but some aspects are not that clear in the data that NSF produces. One of these gray areas concerns where people trained as scientists and engineers go, what disciplines they become involved in, and what types of work they do. The purpose of such tracking is to ascertain what the training of graduates of NSF-funded programs contributes to society in general, not just to the performance of academic research. Another point for consideration is the flexibility or adaptability of these people. If you cannot retrain an electrical engineer to be a radio frequency engineer in six months, then something probably is wrong with the educational system. This deficiency may be difficult to measure, but it is something that should be considered.
Research, in some ways, is more difficult to assess. Many measures have been used—publications, citation counts, patents, and a number of other metrics suggested in this report. These may or may not be relevant to performance, in terms of NSF goals. One of the key tasks is to sort them out and identify the measures that are relevant and the ones that are not. There also should not be too many such measures. One participant suggested that the total perhaps should not exceed six, although it is likely that NSF will end up using a lot more than six.
NSF should place greater emphasis on the contribution of programs to what the “customers” want, that is, on outcomes rather than mere outputs. It is possible to evaluate the impact of research in broad fields, for example, condensed-matter physics. If one examines research in condensed-matter physics and considers the industrial applications that have relied on it, the overall magnitude of the impact will become apparent quickly. One can argue, however, that scientific research is a necessary condition for success in many fields, but not a sufficient condition. In the course of events occurring between a scientific discovery and its actual application and practice, an enormous number of other factors enter into the process: design, marketing, manufacturing, distribution, and so forth. That is one of the reasons economists have such trouble trying to measure the impact, in economic terms, of basic research.
It also is useful to describe the breadth of applicability of each field. This approach was described in the workshop when it was pointed out that Xerox has tried to look at all of the different business divisions in which research results are used. This is something else that could be done by NSF and others to better justify support for research.