In general, COSEPUP judged that the benchmarking experiments were successful. The committee agreed that the technique of benchmarking was able to provide responses on each of the three topics listed in section 2.2 (the relative position of US research today, the relative position of US research in the future, and the key factors influencing relative US performance).
At the outset, some committee members and policy leaders doubted that complex fields of modern science could be assessed with any degree of accuracy without relatively large investments of money and time. However, all three experiments were concluded in a year or less for relatively modest investments.
Panelists found that it is possible to take a "snapshot" of a field (to conduct a leadership survey) by means of a virtual congress in a matter of weeks. This implies that the method shows promise and that further experiments to optimize scale and technique are justified.
COSEPUP found good correlation among the findings produced by various indicators. For example, the qualitative judgments elicited by the virtual congress were similar to the results of quantitative indicators, such as publications cited and papers delivered at international congresses.
In this section, COSEPUP reports its findings with respect to the objectives of the experiments, the composition of panels, the methods used, and the likely utility of benchmarking for federal agencies and policy-makers in the executive branch and Congress. These findings are based on the committee's benchmarking experiments, the comments of reviewers, and the observations of workshop participants (see appendix C for a summary of the workshop).
4.1 Findings About Objectives
When the benchmarking studies were proposed, it was not clear whether a panel of experts in a particular field could analyze that field in an objective fashion. By "objective", the committee means "a reasonably balanced view of US research compared with that of the rest of the world." In spite of that concern, COSEPUP concluded that each panel was able to produce a reasonably objective report.

An important goal of the benchmarking program was to conduct the studies at modest expense within a short period. Given the large number of fields in science and engineering, benchmarking would not be practical if it were expensive or time-consuming. The three experiments were completed in 6-8 months at a cost of $50,000 each. (A key factor in the low cost is that all panelists agreed to serve pro bono.)
4.2 Findings About Results
The studies succeeded in identifying key factors that influenced the status of fields. For example, in mathematics, human resources, particularly the reliance on foreign talent, were identified as a key issue. The panel noted that current US leadership depends substantially on temporary waves of immigrants from Europe (notably from the former Soviet Union) and Asia that cannot be counted on to continue. In addition, the panel warned that the quality of US research could be affected in the future by the observed falloff in the number of American students pursuing graduate-level mathematics.
Using information from the Department of Energy and the National Institute of Standards and Technology, the materials science and engineering panel was able to identify facilities and infrastructure as the keys to research leadership. In the United States, some materials research facilities, many of which were built in the 1960s, are deteriorating. In Europe and Japan, facilities tend to be more modern.
In immunology, the three tools for evaluating US research (reputation survey, citation analysis, and journal-publication analysis) were distinct and had different strengths and flaws, but they led to basically the same conclusion: the names of US researchers appeared two to three times as often as the names of non-US researchers. The immunology panel was able to identify an important concern that arises from shifts in the US health-care system. The new emphasis on managed care means that fewer patients are available to academic institutions for clinical trials in immunology. The US health-care system differs from that of many European countries, where the centralized medical system provides an abundance of such patients.
4.3 Findings About the Methodology of Benchmarking
COSEPUP found that each panel developed its conclusions by using a similar set of methods, with some variation to match field-specific
differences. All panels used the leadership survey, or virtual congress, in which panel members asked colleagues in the United States and abroad whom they would choose to speak at international conferences in particular subfields. Panel members found this method to be the most efficient and credible way to evaluate their fields. People chosen to name the keynote speakers in the virtual congresses tend to be those who have generated the central ideas in their fields and so are in the best position to understand the relative contributions of different scientists and countries. These experts are also in a position to interpret the validity of the quantitative measurements used for benchmarking.
Although the virtual congress does not constitute a systematic assessment and is somewhat subjective, it is the same approach used by leaders of a field to organize real conferences featuring the "best of the best". In conducting this analysis, panel members felt secure in relying not only on their own judgment but also on the judgment of their colleagues and on the collective wisdom of the field. Another advantage of the virtual congress is its swiftness: results are available soon after the experts are polled. One way to test the soundness of this method would be to compare the outcomes of two surveys done in two countries; the outcomes should be the same or very similar. The exercise would also test the likelihood that a country holding the survey is biased in favor of its own researchers.
An interesting overall comparison is available because two independent tests were done in mathematics—one by the COSEPUP mathematics benchmarking panel and another by an NSF panel. The panels were different in composition and in their charges (one was instructed to produce recommendations, and the other was asked not to), but they developed similar sets of conclusions regarding the overall stature of US mathematics and each of its subfields.
Journal analysis proved useful, but its value was limited by the amount of time required to analyze the information and by the need for analysts who were knowledgeable about the field on a worldwide basis. An additional option, not undertaken because of cost constraints, would have been a search of citations in US patent literature for scientific background and prior art.
The use of quantitative measures generally is hampered by the scant availability of comparable international information. The international information that is available is field-dependent and not very timely, and there are variations in the delineation of fields that make comparisons difficult.
4.4 Findings About the Membership of Panels
COSEPUP found that the use of panels is effective when panel members and the experts whom they consult are the most respected innovators in their fields. As leaders, they are in a unique position to
understand current developments and trends. The committee also found that the geographic diversity and professional diversity of panel membership are essential to ensure a fair and comprehensive assessment. Over the course of the studies, panels came to agree that no more than half their members should be US academic researchers. The immunology panel, for example, found in its initial responses a clear bias related to the laboratory location of the pollees: US-based investigators routinely named a higher percentage of Americans than did non-US-based investigators. The nationality of the poller also appeared to have an influence: the three non-US pollers often obtained a virtual-congress list with a higher proportion of non-US speakers. The panel decided in its second iteration that it needed to increase foreign representation to ensure objectivity; on doing so, it obtained results that agreed more closely with those of citation analysis and journal-publication analysis and with the judgment of the panelists.
The committee concluded that at least one-third of panel members should be non-US researchers. An additional one-third should be a combination of researchers in industry and in related fields who use the results of research. In the experience of the panels, that mix of perspectives, including especially the representatives of research-intensive industries (such as biotechnology, telecommunications, and aerospace), was essential for understanding not only the scholarly and technical achievements of researchers, but also the broader importance of those achievements to social and economic objectives.
4.5 Findings About the Utility of Benchmarking
On the basis of presentations made at the workshop by congressional and agency staff and feedback from disciplinary-society members who were briefed by panelists, COSEPUP found that benchmarking is potentially useful both to the research communities in selected and related fields and to the government sponsors of research in those fields. The panel reports were able to identify weaknesses in particular subfields and sub-subfields and to point out issues that need to be addressed in making policy.
The committee also suggests that benchmarking might be useful in efforts to comply with the Government Performance and Results Act (GPRA). Representatives of federal agencies that support research were asked during the benchmarking workshop about the utility of the technique. Although the terminology and, to some extent, the concept were new to some, there were indications that benchmarking was likely to be useful in evaluating agency research programs and in providing information that would help the agencies to comply with GPRA.