COSEPUP identified several key strengths, weaknesses, and other factors that influence the success of international benchmarking in evaluating research.
5.1 Some Strengths of Benchmarking
International benchmarking produces information that is, in the opinion of COSEPUP, valuable and relevant to researchers, administrators, and policy-makers. Three examples are the heavy reliance of mathematics leadership on foreign talent, the 20- to 30-year age difference between materials equipment in the US and that in several other countries, and the influence of managed care on clinical research. Although most information did not contradict prevailing views, it is unlikely that the results of each report could have been achieved as efficiently by any other technique, given the paucity of the data and information required for a traditional quantitative approach.
International benchmarking is rapid and inexpensive compared with procedures that rely entirely on the assembly of quantitative information. The use of qualitative judgments also has merit. In the words of one panel chair, the panels were able to get "80% of the value in 20% of the time" for a far lower cost.
5.2 Some Weaknesses of Benchmarking
In retrospect, the experiments revealed several methodologic weaknesses that can be addressed in future benchmarking activities. For example, non-US members should be included in the oversight group that selects the panels. The same features that make a virtual-congress approach effective also expose such weaknesses as the potential for a bias that depends on the citizenship of the panelists who gather data for analysis. This increases the importance of including substantial proportions of non-US participants in all panels.
Multidisciplinary fields, such as immunology and materials science and engineering, pose special challenges. For example, the immunology panel had to extract data from collaborative and international research; had to compare large enterprises with multiple smaller ones; and had to extract information on the specific field of immunology from related research fields in large, aggregate databases.
5.3 Other Observations about Benchmarking
The method by which the most important fields and subfields are identified is critical. For example, immunology is not considered a "discipline" in the traditional sense and does not have departmental status in most universities. The selection of subfields is a somewhat subjective process that might differ from one benchmarking exercise to the next. Rather than being a drawback, however, such differences will reflect the continual shifting of the borders of modern fields. A field should be defined by the array of related domains between which investigators can move without leaving the realm of their expertise.
It is likely that benchmarking could be effective on a 3- or 5-year cycle because large fields of research change relatively slowly. Annual benchmarking probably would not be sensitive enough to reveal changes.
Our series of experiments has revealed that no benchmarking technique is sufficient by itself and that the utility of particular techniques varies by field. Therefore, each panel should use a variety of comparable qualitative and quantitative methods to afford cross-verification of results. The methods should be kept as independent as possible.
Because the accuracy of benchmarking depends heavily on panel members' personal knowledge of fields, panel members were more closely involved with the writing of the report than is frequently the case with committee-written reports.
Use of indicators that provide information on degree of uncertainty and reliability might enhance the presentation of the panel assessments of leadership status.
The extensive use of benchmarking would be enhanced by reliable, up-to-date information. The US field-specific data that are collected do not provide sufficient or timely information; non-US data are even more problematic.
A finding that the United States is the world leader in a research field might lead some to conclude that additional resources for that field are not warranted. This might or might not be the case. For example, the mathematics report indicates that the United States is the world leader in mathematics. If the mathematics community requests additional resources, some policymakers might question the request on the grounds that the United States is already the world leader in that field. That concern has been expressed in connection with the life sciences. However, as the mathematics panel indicated in its report, the United States could drop from being "the world leader" in mathematics research unless additional investments were made in some key subfields and unless more US students chose to enter the field. Thus, an assessment that the United States is the leader in a field does not necessarily imply that no additional resources are needed for the field.