PRODUCTIVITY

Helen Hofer Gee*

INTRODUCTION

In 1986 NIH contracted with the Institute of Medicine to organize a conference on research training. A central, though not explicitly stated, purpose of the conference was to obtain guidance on how to continue to meet Congressionally mandated requirements for periodic reports on the role of and need for research training in the biomedical and behavioral sciences. Ostensibly, the conference was concerned with an examination of how successfully research training has been conducted, which program mechanisms produce the most suitable training, and what information is required to enable further assessments of national needs for researchers in the decade ahead. [Listed first among three categories of issues requiring attention was]

. . . measuring the productivity of scientists in their research programs and as reflections of their training. The issue in productivity is how to improve the measurement of it; simply gauging productivity by the current popular methods is inadequate for the task at hand. (Institute of Medicine, 1986)

Anyone who has ever been faced with the task of having to select among individuals--for employment, advancement, funding, awards--has dealt with the issue of assessing productivity and has, implicitly or explicitly, weighed available evidence of previous performance. The difficulty and complexity of these decisions may well underlie the malaise that is apparent in the committee report. The committee's more explicitly reported concerns with such measures as success in obtaining research grants, citation counts that ignore differences among and possibly within disciplines, and studies that fail to consider work environments suggest that the real problem lies not in the measures of productivity per se that have been used, but in how the measures have been used--that is, in the designs of assessments of training support programs.

* The opinions expressed in this paper are the author's and do not necessarily reflect those of either the Committee on Biomedical and Behavioral Research Personnel or the National Research Council.

Unfortunately for those who seek quick solutions, concepts relevant to the measurement of productivity are inextricable from those concerning almost all other domains within the social study of science. Dealing directly with the problems of productivity measurement therefore requires cognizance of the state of the entire field. Any study, for example, that ignores differences among or within disciplines ignores more than three decades of intensive study of the entire social structure of science, not just the study of "productivity" per se. In a critical, scholarly essay, Gilbert (1978) noted that

there is a reciprocal relationship between the theoretical framework which the social scientist brings to his work and the indicators which he will find most appropriate for his research . . . the adequacy of an indicator can only be assessed through a detailed study of the context in which the phenomena to be measured are embedded, and of the validity of the measurement theory on which it relies . . . this requirement is equivalent to the demand that we understand the functioning of the scientific community at a micro-level.

A small community of scientists (static in size in the United States since about 1980, but rapidly increasing in Western and Eastern Europe and Japan since the late 1970s) has been making significant progress in the direction Gilbert suggests (see Appendix and References). The most recent burst of research activity relevant to the assessment of productivity began when Martin and Irvine (1983) assessed basic research activity and programs (radio astronomy and physical sciences). Their papers specified "partial indicators" of scientific progress and investigated the extent to which these indicators "converge" to produce valid and reliable estimates of the productivity of designated groups of scientists. The work created a virtual storm of criticism, largely because it was so far-reaching (see Chubin, 1987). The continuing discussion has instilled new vigor into the development of the field.

The concept of multiple partial criteria was certainly not introduced by Irvine and Martin. Even Clark's study of the careers of psychologists (1957) incorporated the concept in a general sense. As noted by Jones et al. (1982), Weiss (1972) discussed them:

At best each is a partial measure encompassing a fraction of the large concept . . . . Moreover, each measure [may contain] a load of irrelevant superfluities, "extra baggage" unrelated to the outcomes under study. By the use of a number of such measures, each contributing a different facet of information, we can limit the effect of irrelevancies and develop a more rounded and truer picture of program outcomes.

However, as Chubin (1987) concludes in his discussion of Irvine and Martin's work, it is

. . . also politically astute, serving scholarly and policy communities. It explicitly anticipates criticism and sources of error, disarms skeptics, and gets an analytical foot in the right doors--those shielding the offices of policymakers who have come to rely on participant scientists and their own imprecise and self-serving devices for making decisions about who gets and who doesn't.

Moravcsik (1986) hailed the extensive debate and critiques of the Martin and Irvine work as a welcome sign--

since it shows that the field has reached the state of maturity when its applications to concrete situations are sufficiently realistic to create a heated controversy, involving people from a variety of professional backgrounds.

He commented further that neither critics nor Irvine and Martin, in their response to critics, offered any specific suggestions for improvement. Moravcsik then proposed that some suggestions can be made and conclusions drawn concerning the need for future activities by relating the debate to another effort in science assessment--namely, a project organized by the United Nations Center for Science and Technology for Development (UNCSTD) and centered on a paper that Moravcsik was commissioned to write. Moravcsik reported further that the paper was discussed at a meeting held in Graz, Austria, in May 1984:

The UNCSTD project did not result in a set recipe for assessing science and technology. On the contrary, the project concluded that there is no such universal recipe, and hence that the aim should be to devise a process which, in any particular case, yields a methodology for an assessment.

SUGGESTED OUTLINE FOR PLANNING STUDIES OF PRODUCTIVITY OR QUALITY

The proposed UNCSTD process serves as a useful framework in which to present some thoughts about planning studies focusing on the assessment of productivity. The following list draws heavily and directly on Moravcsik's report:

1. Identify the goals of science that are to be taken into account. Moravcsik noted,

. . . Science and technology have many different goals, aims, and justifications, and in any particular case it must be specified which one (or which ones) of these are taken into account, and with what weight.

Studies of National Institutes of Health research training programs have ostensibly aimed at assessing a common goal of such programs--to wit, the production of trained scientists who will contribute to the advancement of the biomedical sciences. Prior to the mid 1970s, this was interpreted by some Institutes as including support for the clinical training of physicians in areas where the supply of expertise was felt to be inadequate. After it became clear that the majority of these individuals simply entered private practice, however, those programs were for the most part discontinued. Such discrepancies must, therefore, be given careful attention in planning studies of program outcomes.

Since the mid '70s, NIH training programs in general have focused specifically and exclusively on research training. Assessments of the success of these research training programs have, however, interpreted the terms "contribute" and "advancement" quite narrowly. Teaching (either future researchers or practitioners), biomedical research administration, mentoring (i.e., guiding the graduate education of future researchers), and conducting research that does not seek external funding or that cannot (because of the interests or concerns of the power structure within which it is conducted) be published have often been denied recognition as goal-relevant behavior. Consideration should be given to whether any of these professional activities should be explicitly recognized as contributing to the advancement of the biomedical sciences and, if so, studies should be designed to assess these kinds of productive endeavor.

2. Recognize the multidimensionality of goals, of potential pathways to them, and of methods of measuring outcomes; specify which dimensions and connections of the system are to be taken into account. Once goals have been specified (and it is recognized that achieving those goals can be, and is likely to be, expressed in different ways), study designs must allow for the varieties of pathways and outcomes that may occur. Cole and Cole (1973) set the stage for this type of inquiry in their cross-sectional analyses. The work of Long, McGinnis, and Allison (1979, 1981, 1982) examined many of the same "connections" as the Coles but, by following a cohort longitudinally, revealed a different sequence of career development. The Long and McGinnis work has been particularly notable in its pursuit of the significance of context, the multidimensionality of career pathways, and the changing significance of predictors in assessing productivity at stages in research career development.

In another notable analysis of the NIH Research Career Development Program, Carter et al. (1987) examined both selection processes and outcomes, using multivariate techniques to assess the significance of correlates and causal relations, as well as a sophisticated cohort selection procedure to control for disciplinary differences.

3. If, as is usually the case, it is not feasible to study all aspects of a system, specify which aspects are to be included and which will be omitted, and indicate clearly the implications of these decisions for the assessment process. Moravcsik provided an apposite illustration of one perspective:

If, of two cars, one has a higher top speed, and the other a lower gasoline consumption per mile, it is not possible to say which is the 'better' car without ascribing some value judgment to high speeds versus economy in the use of fuel.

Two other examples come to mind: (1) if, in planning a study of the effectiveness of a training program, it was decided that pursuit of a research career in the private sector was a favorable outcome but that assessing the performance of former trainees who followed that path was not feasible, they could be explicitly excluded from potential comparison cohorts; (2) if research administration is deemed a favorable outcome, those research administrators could be excluded from comparisons in which research publications were used as indicators and included where other measures of productivity, more suitable to their employment, were used. The guideline simply demands precise specification of the details of the design of an investigation.

4. Specify how the results of the assessment are to be used. A study intended to assist program managers in their decisionmaking will seldom have the same design requirements as a study intended to inform policy decisions. If policy decisionmakers are to be informed, for example, the delineation of possible alternative indicators of productivity may be critical, whereas meeting program management needs may require more intensive analysis of only those that are the most direct manifestations of program goals. The key is to consider carefully the kinds of decisions that the study is intended to influence.

5. Select a set of indicators that will satisfy the requirements of each of the study design considerations. Recognize and specify the limitations of each of the indicators. To quote Moravcsik,

There are many types of indicators: input versus output; quantitative versus qualitative; indicators of activity, productivity, or progress; indicators of quality, importance, or impact; there are functional and instrumental indicators; there are micro- and macro-indicators; there are "data-based" and "perceptual" indicators; and so on. Some indicators are already "on the shelf" and can be taken from it and used in new situations. More likely, however, the most appropriate indicators for a new situation need to be improvised for that particular situation. . . . Be reconciled to the fact that in any case, you will end up with a set of indicator measurements which, in general, cannot be reduced to a one-dimensional measure and hence to an unambiguous ranking.

It is apparent that the selection and/or development of indicators of productivity depend on the kinds of questions that are being asked and the perceived complexity of the system involved. An indicator that provides excellent explanatory data for one study may be useless in another context. Every measure, moreover, has limitations that may, under some conditions, obviate its utility and, in other circumstances, may be totally irrelevant. If a study plan is suitably mapped, it may not be feasible to use the same indicators of productivity for all individuals in a cohort. For example, if teaching undergraduate students is judged to be an acceptable outcome of research training, the productivity of an individual whose primary activity is teaching will not be appropriately assessed by counting that individual's production of research papers--but consideration might be given to using the production of review papers as one of several measures of performance in the educational domain. However, for some outcomes regarded as suitable expressions of the goals of an enterprise, no suitable approach to assessment ("measurement") is available to evaluators. In such cases the individuals should be removed from comparison groups that are to be analyzed statistically rather than, as is often the case, counted a "failure" according to indicators that appropriately measure the productivity of other members of the group.
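As a concrete illustration of working with such an irreducibly multidimensional set of indicators, the short sketch below (not part of the original chapter; the cohort members, indicator names, and values are hypothetical) standardizes several partial measures within a comparison group and reports them side by side rather than collapsing them into a single ranking.

    # Hypothetical sketch: several partial indicators of productivity, each
    # standardized within its comparison group and reported side by side
    # rather than reduced to a single one-dimensional ranking.

    from statistics import mean, stdev

    # Hypothetical cohort with three partial indicators (names are illustrative only).
    cohort = {
        "A": {"papers": 12, "citations": 340, "peer_rating": 4.1},
        "B": {"papers": 30, "citations": 150, "peer_rating": 3.2},
        "C": {"papers": 8,  "citations": 95,  "peer_rating": 3.9},
    }

    def zscores(values):
        """Standardize raw values to mean 0 and unit standard deviation."""
        m, s = mean(values), stdev(values)
        return [(v - m) / s for v in values]

    indicators = ["papers", "citations", "peer_rating"]
    names = list(cohort)
    standardized = {ind: zscores([cohort[n][ind] for n in names]) for ind in indicators}

    # Each individual keeps a profile of scores; convergence across indicators
    # lends confidence, divergence flags a case that needs closer examination.
    for i, name in enumerate(names):
        profile = {ind: round(standardized[ind][i], 2) for ind in indicators}
        print(name, profile)

The point of the sketch is only that the profile of partial indicators, not a composite number, is the unit of interpretation; where the indicators diverge, the case calls for closer, context-specific examination.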

MEASURES OF PRODUCTIVITY

The above overview should make it clear that any discussion of specific measures of productivity is necessarily superficial, simplistic, and incomplete, because outside the context of the design of a specific study there is not a great deal to be said about any particular measure. In addition, since productivity in one sense or another is the focus of most of the studies of the social science of science, a thorough literature review would require a few years of effort. Nonetheless, various measures that might be used in studies of productivity are discussed below. The discussion is intended to draw attention to complexities, issues, and problems in the use of these measures, knowledge of which might aid in carrying out the kind of careful approach to study design outlined earlier.

Publication Counts

While it is generally agreed that the principal, or most prevalent, immediate outcome of the active research investigator's efforts is the preparation of papers published in professional journals--and by 1982 a nearly 2,600-item bibliography of publications analysis items was available (Hjerppe, 1982)--counts of publications continue to be derogated. Because the analysis of publications plays a dominant role in social studies of science, a complex, highly sophisticated methodology has been developed. The intellectual leader of modern-day social studies of science was Derek J. de Solla Price (1961, 1963). The early development of computer-based analytic methods, which have stimulated much of the sophisticated analysis characteristic of social studies of science over the past two decades, resulted largely from the enterprise of two individuals: Eugene Garfield (1955) developed the Science Citation Index (SCI), on which most publications analysis work is dependent; Francis Narin and his associates at Computer Horizons took the lead in exploring and developing measures to maximize the utility of the wealth of information contained in the SCI. In 1969 (Narin, 1977) the area even acquired its own label--bibliometrics--to describe collectively quantitative, analytical studies of written communication.

In simplest terms, publication counts are no longer acceptable as a measure of productivity unless at least the following potential sources of error or misinterpretation are controlled or accounted for:

o differences among disciplines of cohort members,

o differences among journals in terms of measured influence (see the section on journals, page 131),

o differences in "quality" or "impact" as measured by citations or peer assessment (or journal influence),

o professional age of cohort members, and

o social context of cohort members.

Despite concerns about "loud noises from empty vessels," publication counts have been shown repeatedly to correlate positively with assessments of quality and to contribute useful independent variance to the assessment of productivity. Reported correlations between quantity and quality measures vary considerably among studies, between approximately r = .23 and r = .80; differences may relate to disciplines, characteristics of cohorts, or even to how quantity and quality of publications are measured. In a series of studies conducted in the late 1970s (see Narin, 1983), numbers of publications by faculty and staff in universities and hospitals were shown to be extremely highly correlated with NIH funding (r = .90 to .95), and there were no economies or diseconomies of scale in the funding of research grants.

Funding and publication relationships may appear to break down, however, when small aggregates of researchers or disciplines are assessed, and especially when basic and clinical research publications are intermixed. Publication rates of basic scientists differ markedly from those of clinical scientists, who publish less frequently and whose research is usually very much more costly. When the funding and publication rates of small aggregates of subjects are investigated, the tendency is to ignore such disciplinary differences, thus ignoring an important moderating variable. With small aggregates other minor sources of error--such as idiosyncratic events that may affect the usual patterns of behavior of part of a group for a period of time--may also obscure an underlying relationship. When large aggregates and adequate time spans are employed, such obfuscating sources of error will usually cancel each other out, permitting stable, underlying relations to be revealed.

When a quick, inexpensive estimate of productivity is needed, large quantities of data are available, and the comparability of cohorts is established, a simple count of publications may well provide adequate information. Ordinarily, however, such a single measure is useful primarily as a means of setting the stage for a more comprehensive investigation of some aspect of science or scientific behavior.

Weighted Counts: The use of weighted counts of papers permits obtaining a preliminary estimate of quality without waiting for citations to become available; it is also an inexpensive means of obtaining an estimate of quality for large numbers of papers. Each paper is weighted by an influence weight assigned to the journal in which the paper appears. (The technique developed by Computer Horizons, Inc. (CHI), determines journal influence weights by the weighted number of citations each journal receives over a given period of time. See F. Narin, Evaluative Bibliometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity (Report to the National Science Foundation), 1976; and F. Narin, G. Pinski, and H. H. Gee, "Structure of the Biomedical Literature," Journal of the American Society for Information Science 27:25-45, 1976.)
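The weighted-count idea can be sketched in a few lines. This sketch is not from the chapter; the journal names, influence weights, and publication lists below are hypothetical, whereas the actual CHI weights are derived from SCI citation data as described in the note above.

    # Hypothetical sketch: weighting each paper by an influence weight assigned
    # to the journal in which it appears, as an early, inexpensive proxy for
    # quality before citations have had time to accumulate.

    # Hypothetical journal influence weights (illustrative values only).
    journal_influence = {
        "J. Gen. Biochem.": 2.4,
        "Regional Med. Bull.": 0.6,
        "Clin. Notes": 1.1,
    }

    # Hypothetical publication lists for two former trainees.
    papers = {
        "trainee_1": ["J. Gen. Biochem.", "J. Gen. Biochem.", "Clin. Notes"],
        "trainee_2": ["Regional Med. Bull."] * 5,
    }

    for person, journals in papers.items():
        raw_count = len(journals)
        weighted = sum(journal_influence[j] for j in journals)
        print(f"{person}: {raw_count} papers, weighted count {weighted:.1f}")

Raw and weighted counts can then be compared; a short list of papers in influential journals may outweigh a longer list in less influential ones, which is precisely the preliminary quality signal the weighting is meant to supply.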

Paper Counts in the "Best" Journals: Committees charged with evaluating group or individual scientific performance will sometimes request that publications be counted only in a selection of the "best" journals. Such a practice would be seriously inequitable, since scientists do not have equal access to journals. For example, those located in smaller institutions are more often published in less influential journals, as are younger, less well-established investigators; and regional differences abound in some disciplines. McAllister and Narin (1983) investigated these relationships in the publications of all U.S. medical schools, using average citation influence per paper measures: the average citation influence per paper increased with the total number of biomedical publications, even when institutional control (public and private), region, and areas of research emphasis were controlled. The positive relation between number of papers and citation influence was shown to hold within disciplines (biochemistry and internal medicine were analyzed in detail) and within research "level" (i.e., along basic and clinical research dimensions).

Data Bases: NIH-supported studies that have involved counts of published scientific papers have almost always depended on computerized data bases derived by CHI from Medline and the SCI. The source data bases require a great deal of preliminary massaging to consolidate information and correct inconsistencies; but once prepared, they make data available unobtrusively, make accessible several different quantitative measures of publication performance, avoid the increasingly restrictive problem of securing clearance from the Office of Management and Budget (involved, in studies of federal programs, in any attempt to go directly to the scientific community for information), and are more accurate than individual reports.

An interesting departure from the exclusive use of the comprehensive data base was reported by V. L. Simeon et al. (1986), who had studied a large research institution in Yugoslavia. In their study several forms of publication and communication were employed in addition to SCI journals (e.g., papers in other scientific journals and congress proceedings, books and monographs, technical articles in encyclopedias and popularizations, and presentations at scientific meetings).

A multivariate analysis revealed interesting patterns of change among the several variables over time. This rather preliminary study, which was focused on change in publication behavior following the introduction of minimal criteria for promotion, warranted no conclusions; but it suggested to this writer the possibility that some measures of these types might be useful in considering criteria suitable for assessing the productivity of individuals whose careers, though academic, are not directly focused on the production of original research.

Activity Indexes: In recent years the utility of a new approach to using publication counts, the "activity index," has been demonstrated, particularly in studies conducted by CHI for NIH. Activity indexes are ratios that make use of publication counts in a relational context, thus allowing comparisons to be made among groups while allowing each group to be described within its own context. (The percent of an organization's papers that are published in a given discipline is divided by the percent of all papers in the data set that represent that discipline. An index of 1.0 indicates that the level of publication activity of this group in this discipline is consonant with the level that discipline represents among all disciplines.) Describing NIH Institutes' relative investment in the support of research in different disciplines is a case in point (see Gee and Narin, 1986). Journal papers are more readily and accurately assigned to disciplines than are dollars, and a ratio that describes an Institute's investment in a discipline relative to both the Institute's investment in all disciplines and the "size" of the discipline among all others in a data set provides a great deal of information for comparison among both disciplines and Institutes. Schubert and Braun (1986) suggest several additional types of indexes that might be useful for different purposes.
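The ratio described in the parenthetical note above can be illustrated with a brief sketch (not part of the original chapter; the discipline names and paper counts are hypothetical).

    # Hypothetical sketch of the "activity index": the share of a group's papers
    # falling in a discipline, divided by that discipline's share of all papers
    # in the data set. A value of 1.0 means the group's activity in the
    # discipline matches the discipline's overall weight in the data set.

    # Hypothetical paper counts by discipline for one institute and for the full data set.
    institute_papers = {"biochemistry": 120, "neurology": 40, "internal medicine": 40}
    all_papers = {"biochemistry": 3000, "neurology": 2000, "internal medicine": 5000}

    institute_total = sum(institute_papers.values())
    dataset_total = sum(all_papers.values())

    for discipline in institute_papers:
        share_of_institute = institute_papers[discipline] / institute_total
        share_of_dataset = all_papers[discipline] / dataset_total
        activity_index = share_of_institute / share_of_dataset
        print(f"{discipline}: activity index {activity_index:.2f}")

An index above 1.0 indicates that the group publishes relatively more in the discipline than the data set as a whole; below 1.0, relatively less.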

CITATIONS

Ever since Clark's study of psychologists (1957), citation counts have been a favored measure for the assessment of productivity. In most cases, citations alone or in combination with publication counts are more closely correlated with subjective estimates of productivity than are any other measures. They are more universally applicable to the assessment of scientific research activity than are other measures because (1) publication is the most accessible means of expression available to all scientists, and (2) being published offers a broader audience to the scientist than any other medium.

Rather than referring to citations as measures of "quality," as was common in the 1970s, the current practice is to refer to them as measures of "impact" or "utilization" or "influence." The implication is that before citations can be referred to as measures of the "quality" of research, the issue should be investigated in the given context of definition.

From an entirely different perspective, Moravcsik (1986), Chubin (1987), Cronin (1984), Vinkler (1987), and others have discussed and/or analyzed the functions and meaning of citations in terms of author motivation. Vinkler, whose contribution is most recent, has provided a concise review of the literature concerning definitions, classification, and roles that citations play in the scientific literature, concluding (in concert with Cronin) that the information carrier role is the most important. Vinkler distinguishes between "professional" (work is based on the cited work or uses part of it) and "connectional" (e.g., desire of an author to establish a connection with the cited author or work) reasons for citation. In Vinkler's study, a group of productive investigators rated each of the references they had listed in a selected recent paper, identifying which of eight professional and/or nine connectional reasons had motivated the decision to cite, and the strength of the motive. Most (81 percent) citations were made solely for professional reasons--that is, in a literature review for "completeness" or because the current work was based at least in part on the cited work, the cited work confirms or supports the work in the citing paper, or the cited work is criticized or refuted (at one of three levels). Citations made partially for professional and partially for connectional reasons accounted for 17 percent; only two percent were made solely for connectional reasons. It was also found that two to three times as many papers are reviewed as are actually cited. Failure to cite was also investigated; the principal reason found was that a work was not considered important enough to the current effort to warrant citation. Second most important was the "obliteration" phenomenon--the origin so well known that citation was not needed. A citation threshold model has been developed, and data confirm that the threshold depends primarily on the professional relevance of the work potentially citable in a given paper.

Narin (1976) considered citations as an assessment mechanism:

Citation counts may be used directly as a measure of the utilization or influence of a single publication or of all the publications of an individual, a grant, contract, department, university, funding agency or country. Citation counts may be used to link individuals, institutions, and programs, since they show how one publication relates to another. . . .

In addition to these evaluative uses, citations also have . . .

. . . patenting behavior within firms. Systematic sample survey data are required on the following subjects:

o the sources of the innovative activities that lead to patenting: in particular, the intersectoral variance in the relative importance of R&D, production engineering, small firms, and other sources;

o the time distribution of patenting activities over the life cycle of an innovation (in particular, does patenting typically reach a maximum at the time of commercial launch?);

o the propensity to patent the results of innovative activities: in particular, sector-specific factors related to the effectiveness of patenting as a barrier to imitation, compared to alternatives; firm-specific factors related to perceptions of the costs and benefits of patenting; and country-specific factors relating to the costs and benefits of patenting; and

o the judgment of technological peers on the innovative performance of specific firms and countries, and on the relative rate of technological advance in specific fields: in particular, the degree to which these judgments are consistent with the patterns shown by patent statistics.

Finally, Pavitt calls for improved classification schemes, such that established patent classes can be matched more effectively, on the one hand, to standard industrial and trade classifications and, on the other, to technically coherent fields of development.

SUMMARY

There are, simply, no easy, ready-made solutions to the problems of identifying measures that will be useful in the assessment of productivity. There is need for the development and application of creative approaches to improving the utility of the kinds of information that can be obtained--for example, the development of indexes that may increase the equitability of some measures. And there is need as well, in many cases, for increased attention to detail in designing studies and analyzing data.

The two sources of information that have the broadest potential value in the assessment of academic scientific performance are peer assessment and the analysis of publications, though there are circumstances in which neither may be appropriate. (For analyses involving the commercial sector, patent analysis--when used as an extension of publication analysis--should probably be added.) From the perspective that they tend to be fairly highly correlated, each contributes somewhat to confidence in the other; and to the extent that they are not correlated, the need for both kinds of information is greater in the given measurement situation.

Because peer assessment is so extremely costly, time consuming, and difficult to employ equitably, it may be necessary or worthwhile, especially in large-scale studies, to investigate whether there are records available--for example, on program operation, faculty activity, support, student outcomes, and resources (in addition to publication data)--that might be able to account for a large proportion of the variation in peer judgments of program quality.

On the other hand, the use of publication and citation measures as the sole consideration in the assessment of the individual scientist's productivity can be rejected on a purely rational basis. As a means of confirming a positive subjective judgment of individual performance, there is no problem, but the opposite does not hold, because there are myriad alternative explanations for low numbers of publications and for few or no citations. One of the more significant misjudgments that can result is the case in which few or no citations are received by highly significant papers that either are ahead of their time or are published in obscure journals. No imperfect tool that may be used to the disadvantage of the single individual (including peer judgment) can be justified.

The caution warrants repeating (and appears fairly frequently in the bibliometric literature) that bibliometric measures are most appropriately employed in group comparisons in which aggregates of publications are large--just how large depends on how closely comparison groups can be matched. Correspondingly, peer assessments are most appropriately employed when peers are equally informed about all of the assessment targets and when self-serving competitive interests are absent. Perhaps the single most important factor in planning investigations of productivity is the need to employ multiple measures and to apply them selectively to the appropriate targets.
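A hedged sketch of the suggestion made above--asking how much of the variation in peer judgments of program quality can be accounted for by record-based measures--is given below. It is not part of the original chapter; the program measures, peer ratings, and values are hypothetical, and an ordinary least-squares fit stands in for whatever modeling a real study would require.

    # Hypothetical sketch: how much of the variation in peer ratings of program
    # quality can be accounted for by record-based measures, via ordinary least squares.

    import numpy as np

    # Hypothetical program records: [publications per faculty, grant dollars (in $100k), faculty size].
    X = np.array([
        [4.0, 12.0, 35],
        [2.5,  6.0, 20],
        [6.1, 20.0, 50],
        [1.8,  3.5, 15],
        [3.3,  9.0, 28],
        [5.2, 15.0, 42],
    ], dtype=float)
    peer_rating = np.array([3.8, 2.9, 4.6, 2.1, 3.2, 4.3])  # hypothetical mean peer scores

    # Add an intercept column and fit by least squares.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, peer_rating, rcond=None)

    fitted = A @ coef
    ss_res = np.sum((peer_rating - fitted) ** 2)
    ss_tot = np.sum((peer_rating - peer_rating.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot  # share of variation in peer ratings accounted for

    print("coefficients:", np.round(coef, 3))
    print("R^2:", round(float(r_squared), 3))

The resulting R-squared is only a descriptive summary of how far the hypothetical records go in reproducing the peer judgments; a real investigation would need far larger samples and careful attention to the design issues discussed earlier.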

REFERENCES

Anderson, Richard C., Francis Narin, and Paul McAllister. 1978. Publication ratings versus peer ratings of universities. Journal of the American Society for Information Science 29(2):91-103.

Carter, Grace M. 1974. Peer Review, Citations, and Biomedical Research Policy: NIH Grants to Medical School Faculty (Rand Report R-1583-HEW). Washington, D.C.: Rand Corporation.

Carter, Grace M., Clara S. Lai, and Carolyn L. Lee. 1978. A Comparison of Large Grants and Research Project Grants Awarded by the National Institutes of Health (Rand Report R-2228-1-NIH). Washington, D.C.: Rand Corporation.

Carter, Grace M., John D. Winkler, and Andrea K. Biddle. 1987. An Evaluation of the NIH Research Career Development Award. Washington, D.C.: Rand Corporation.

Chubin, Daryl E. 1987. Research evaluation and the generation of big science policy. Knowledge 9(2):254-277.

Chubin, Daryl E., and Soumyo D. Moitra. 1975. Content analysis of references: Adjunct or alternative to citation counting? Social Studies of Science 5:423-441.

Clark, Kenneth E. 1957. America's Psychologists: A Survey of a Growing Profession. Washington, D.C.: American Psychological Association.

Cole, Jonathan R., and Stephen Cole. 1973. Social Stratification in Science. Chicago: University of Chicago Press.

Committee on Science, Engineering, and Public Policy (COSEPUP). 1982. The Quality of Research in Science: Methods for Postperformance Evaluation of Research in the National Science Foundation. Washington, D.C.: National Academy Press.

Cronin, B. 1984. The Citation Process. London: Taylor Graham.

Fox, Mary Frank. 1983. Scientists' publication productivity. Social Studies of Science 13(2):298-329.

Garfield, E. 1955. Citation indexes for science. Science 122:108-111.

Garfield, E. 1972. Citation analysis as a tool in journal evaluation. Science 178:471-479.

Gee, Helen Hofer. 1988. An Analysis of NIH Intramural Research Publications, 1973-1984 (Report to the Committee to Study Strategies to Strengthen the Scientific Excellence of the NIH Intramural Research Program). Washington, D.C.: National Academy Press.

Gee, Helen Hofer, and Francis Narin. 1986. An Analysis of Research Publications Supported by NIH 1973-76 and 1977-80 (Publication No. 86-2777). Washington, D.C.: NIH.

Gilbert, G. Nigel. 1978. Measuring the growth of science. Scientometrics 1(1):9-34.

Hjerppe, R. 1982. Supplement to a "Bibliography of bibliometrics and citation indexing & analysis." Scientometrics 4(3):241-273.

Institute for Scientific Information. 1963. Science Citation Index. Philadelphia, PA: ISI.

Jones, Lyle V., Gardner Lindzey, and Porter Coggeshall (eds.). 1982. An Assessment of Research-Doctorate Programs in the United States. Washington, D.C.: National Academy Press.

Lawani, Stephen M., and Alan E. Bayer. 1983. Validity of citation criteria for assessing the influence of scientific publications: New evidence with peer assessment. Journal of the American Society for Information Science 34(1):59-66.

Leydesdorff, Loet. 1987. Various methods for the mapping of science. Scientometrics 11(5-6):295-324.

Leydesdorff, Loet, and Peter van der Schaar. 1987. The use of scientometric methods for evaluating national research programs. Science and Technology Studies 5(1):22-31.

Long, J. Scott, Paul D. Allison, and Robert McGinnis. 1979. Entrance into the academic career. American Sociological Review 44(5):816-830.

Long, J. Scott, and Robert McGinnis. 1981. Organizational context and scientific productivity. American Sociological Review 46:422-442.

Martin, Ben R., and John Irvine. 1983. Assessing basic research. Research Policy 12:61-90.

McAllister, Paul R., and Francis Narin. 1983. Characterization of the research papers of U.S. medical schools. Journal of the American Society for Information Science 34(2):123-131.

McGinnis, Robert, and J. Scott Long. 1982. Postdoctoral training in bioscience: Allocation and outcomes. Social Forces 60(3):701-722.

Moed, H. F., J. M. Burger, J. G. Frankfort, and A. F. J. Van Raan. 1985. A comparative study of bibliometric past performance analysis and peer judgment. Scientometrics 8(3-4).

Moravcsik, Michael J. 1986. Assessing the methodology for finding a methodology for assessment. Social Studies of Science 16:534-539.

Narin, F. 1976. Evaluative Bibliometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity (Report to the National Science Foundation). (Now available only through the National Technical Information Service, NTIS No. PB 252339/AS.)

Narin, F. 1983. Subjective vs. Bibliometric Assessment of Biomedical Research Publications (NIH Program Evaluation Report). (Unpublished report available from the NIH Office of Program Planning and Evaluation or from the author.)

Narin, F. 1985. Measuring the Research Productivity of Higher Education Institutions Using Bibliometric Techniques. Paper presented at a Workshop on Science and Technology Measures in the Higher Education Sector, OECD, Paris, France.

Narin, F. 1988. Indicators of Strength: Excellence and Linkage in Japanese Technology and Science. Paper presented at the National Science Foundation, June 21, 1988. (See also F. Narin and E. Noma, Is technology becoming science?, Scientometrics 7(3):369-381, 1985.)

Narin, F., and J. K. Moll. 1977. Bibliometrics. Annual Review of Information Science and Technology 12:32-58.

Narin, F., G. Pinski, and H. H. Gee. 1976. Structure of the biomedical literature. Journal of the American Society for Information Science 27:25-45.

National Science Foundation. 1974. Science Indicators. Washington, D.C.: U.S. Government Printing Office (this report is published annually).

National Science Foundation. 1982. Studies of Scientific Disciplines: An Annotated Bibliography. Washington, D.C.: U.S. Government Printing Office.

Noma, Elliot. 1986. Subject Classification and Influence Weights for 3,000 Journals. Haddon Heights, NJ: Computer Horizons, Inc.

Pavitt, K. 1985. Patent statistics as indicators of innovative activities: Possibilities and problems. Scientometrics 7(1-2):77-99. (Pavitt cites B. Basberg, Technological change in the Norwegian whaling industry: A case study of the use of patent statistics as a technology indicator, Research Policy 11(3):163-171, 1982.)

Pinski, Gabriel. 1975. Subject Classification and Influence Weights for 2300 Journals (NSF Final Task Report). Haddon Heights, NJ: Computer Horizons, Inc.

Porter, A. L., D. E. Chubin, and Xiao-Yin Jin. 1986. Citations and scientific progress: Comparing bibliometric measures with scientist judgments. Scientometrics 13(3-4):103-124.

Price, Derek de Solla. 1961. Science Since Babylon. New Haven, Conn.: Yale University Press.

Price, Derek de Solla. 1963. Little Science, Big Science. New York: Columbia University Press.

Reskin, Barbara. 1979. Review of the literature on the relationship between age and scientific productivity. In Committee on Continuity in Academic Research Performance, Research Excellence Through the Year 2000: The Importance of Maintaining a Flow of New Faculty into Academic Research (Appendix C:189-207). Washington, D.C.: National Academy of Sciences.

Schubert, A. 1985. Quantitative studies of science: A current bibliography. Scientometrics 9(5-6):293-304.

Schubert, A. 1986. Quantitative studies of science: A current bibliography. Scientometrics 8(1-2):137-140.

Schubert, A., and T. Braun. 1986. Relative indicators and relational charts for comparative assessment of publication output and citation impact. Scientometrics 9(5-6):281-291. (See also T. Braun, W. Glanzel, and A. Schubert, One more version of the facts and figures on publication output and relative citation impact of 107 countries, 1978-1980, Scientometrics 11(1-2):9-15 and (3-4):127-140.)

Schubert, A., and W. Glanzel. 1983. Statistical reliability of comparisons based on the citation impact of scientific publications. Scientometrics 5(1):59-74.

Schubert, A., W. Glanzel, and T. Braun. 1983. Relative citation rate: A new indicator for measuring the impact of publications. In D. Tomov and L. Dimitrova (eds.), Proceedings of the First National Conference with International Participation on Scientometrics and Linguistics of Scientific Text, Varna, pp. 80-81.

Simeon, V. L., et al. 1986. Analysis of the bibliographic output from a research institution in relation to the measures of scientific policy. Scientometrics 9(5-6):223-230.

Stowe, Robert C. 1986. Annotated Bibliography of Publications Dealing with Qualitative and Quantitative Indicators of the Quality of Science (A Technical Memorandum of the Quality Indicators Project). Cambridge, MA: Harvard University.

Van Heeringen, A., and P. A. Dijkwel. 1987a. Age, mobility and productivity: I. Scientometrics 11:267-280.

Van Heeringen, A., and P. A. Dijkwel. 1987b. Age, mobility and productivity: II. Scientometrics 11:281-293.

Vinkler, P. 1986. Evaluation of some methods for the relative assessment of scientific publications. Scientometrics 10(3-4):157-177.

Vinkler, P. 1987. A quasi-quantitative citation model. Scientometrics 12(1-2):47-72. (See also B. Cronin, The Citation Process, London: Taylor Graham, 1984; D. Chubin and S. D. Moitra, Content analysis of references: Adjunct or alternative to citation counting?, Social Studies of Science 5:423, 1975; and M. J. Moravcsik and P. Murugesan, Some results on the function and quality of citation, Social Studies of Science 5:86-92, 1975.)

Weiss, C. H. 1972. Evaluation Research: Methods of Assessing Program Effectiveness. Englewood Cliffs, NJ: Prentice-Hall.

APPENDIX: SCIENCE STUDIES SOURCES

Nearly three-quarters of a century has passed since Cole and Eales in 1917 reported their international comparison of counts of books and papers in comparative anatomy published between 1543 and 1880 (Narin, 1977). In 1926 Lotka demonstrated that the distribution of publications in a discipline (physics) is widely skewed and that most scientific papers are published by a small minority of scientists (Fox, 1983). So began inquiries into the use of publications measures in the assessment of productivity and the closely related concept of eminence. Rapid advancement, however, became feasible only when computers became readily accessible and inexpensive in the 1950s.

In a landmark empirical study conducted between 1954 and 1957, a committee of the American Psychological Association conducted an extensive inquiry into the correlates of productivity of all doctorates granted in the field of psychology between 1930 and 1944 (Clark, 1957). The study was significant in employing publication and citation measures as correlates of peer assessments of productivity and in recognizing the importance of investigating differences among subdisciplines and of taking into account variations in background, social, and psychological characteristics as correlates and potential predictors of eventual professional accomplishment and status. The study was also noteworthy in its use of computer-implemented quantitative methods to describe and compare the most productive with other members of the profession. In this sense it marks the empirical beginning of what has become a worldwide effort on the part of both theoretical and empirical investigators to achieve a better understanding of how science and scientists function and thrive in the society of our time.

Comprehensive theoretical and methodological as well as empirical studies of the sociology, psychology, and economics of science and scientists did not begin to appear in large numbers until the 1960s. Derek de Solla Price (1963) is appropriately credited with sparking the present-day intellectual development of inquiry into the assessment of research quality and eminence. Since then studies have proliferated rapidly in depth, breadth, and complexity as well as in number. Hjerppe (1982) added 518 items to an over 2,000-item "Bibliography of Bibliometrics and Citation Indexing & Analysis" published in Sweden in 1980. More directly relevant to the present inquiry are bibliographies that are being developed to assist groups of interested and involved scientists in their attempts to keep up with research aimed at achieving better understanding of how science and scientists function. Although it is not feasible to attempt a comprehensive review of all bibliographies that might be helpful to those concerned with the analysis of productivity and its essential correlates, a brief description of some publications that cover a great deal of the relevant research effort to about 1980 may be useful.

*** Jonathan and Stephen Cole, Social Stratification in Science (1973): The Coles conducted several different cross-sectional studies of academic physicists in their investigation of the social stratification system in science. The Coles staunchly defended the view that science functions as a meritocracy and concluded that physics is a universalistic and rational discipline in which quality of work (as measured by citations) is the chief determinant of ultimate status. (A recent personal communication indicates that J. Cole delivered a paper at American Sociological Association meetings that partially recants earlier views on universalism.) For more up-to-date, longitudinal analyses of scientists in biochemistry that result in a different conclusion, see Long et al. (1979), Long and McGinnis (1981), and McGinnis and Long (1982). The Coles examined multivariate interrelationships among departmental rank, number and assessed prestige of honorific awards, membership status in professional societies, geographical location, and number and "quality" (citation counts) of publications in exploring the development of professional visibility and eminence. The book also contains a brief historical account of the development of research in the social science of science.

*** Francis Narin, Evaluative Bibliometrics (1976): Narin cited 140 papers in providing a brief historical account of the development of techniques of measuring publications and citations, in reviewing a number of empirical investigations of the validity of bibliometric analyses, and in presenting details of the characteristics of and differences among scientific fields and subdisciplines. (The Annual Review of Information Science and Technology published a bibliography entitled "Bibliometrics" by Narin and Moll (1977), which contains many, but not all, of the same references that are in Evaluative Bibliometrics.) The book, prepared for the National Science Foundation, contains explicit details of how several indices of journal influence are calculated and how variations within a field of science differ from variations within a subdiscipline. Three different influence measures are provided for each of the 2,250 journals in the 1973 Science Citation Index. (New influence indices have since been calculated for some 3,000 journals in the 1982 SCI; see Noma, 1986.) Some two dozen studies are cited that deal with the correspondence between literature-based and other methods of assessing the quality of scientific output.

*** NSF Division of Planning and Policy, Social Studies of Scientific Disciplines (1982): This annotated bibliography "makes accessible to the managers and practitioners of science and engineering the findings from the social studies of science in a form that will be useful to them." The bibliography covers studies conducted up to the mid 1980s and reports on the work of nearly 300 authors, most with multiple entries.

Although only one subsection is entitled "Productivity," it is not an exaggeration to estimate that at least 90 percent of entries in the work deal with material relevant to the measurement of this concept. An approximately similar percentage describe investigations that employ publications measures in their investigations of 23 identifiable but related subjects as dealt with in studies of 13 disciplines. A total of 285 studies yield nearly 500 entries in the bibliography, many studies having dealt with multiple disciplines. Subject categories in the bibliography include:

Attitudes and Values
Career Patterns
Competition
Development of Disciplines
Discipline Comparisons
Discipline Organization
Discovery Process
Education, Grad. Educ.
Funding of Research
Information Exchange
National Comparisons
Paradigm Characteristics
Performance of research
Productivity
Productivity - age
Professional Associations
Publication practices
Recognition and reward
Social stratification
Structure of the literature
Structure of literature--Specialty groups
Citation rates
Journal influence
University Ratings

*** Mary Frank Fox, "Scientists' Publication Productivity," Social Studies of Science (1983): In this critical review, Fox discusses publication productivity in relation to psychological characteristics of individuals such as motivation, ego strength, cognitive style, personality and interests, and IQ, noting the restricted range of ability among scientists and the corresponding low correlation with measures of productivity, as well as the fact that creativity does not exist in a vacuum. Citing Pelz and Andrews, she states, "Rather, social factors so affect the translation of creative ability into innovative performance that measured creativity is virtually unrelated to either the innovativeness or the productiveness of scientists' output." The importance of environmental characteristics such as institutional prestige and organizational freedom is summarized, including the important findings of Long and McGinnis, whose longitudinal studies point to the stronger effect of location on productivity than of productivity on subsequent location, as had been previously reported in studies using cross-sectional designs. An interesting discussion of the closely entwined concepts of cumulative advantage and reinforcement is also included in this review of approximately a hundred different studies.

*** A. Schubert, "Quantitative Studies of Science: A Current Bibliography," Scientometrics (1985 and 1986): Close to 100 papers are listed in each year, and the list does not include those published in Scientometrics itself. The vast majority are empirical and methodological papers on bibliometric topics. While no country exceeds the United States in number of papers listed, the total number of foreign papers, not including Canada and the United Kingdom, was nearly twice the number of United States publications.

*** Robert C. Stowe, An Annotated Bibliography of Publications Dealing with Qualitative and Quantitative Indicators of the Quality of Science (Including a Bibliography on the Access of Women to Participation in Scientific Research) (1986): In addition to a list of core books, annotated entries are made under the following headings:

I. Bibliometric indicators of the quality of scientific research
   - Citations and publications as indicators of quality
   - Critiques of citation analysis
   - Citation context analysis
II. Qualitative approaches to and more general works on research evaluation
III. Works dealing specifically with "science indicators"
IV. Forecasting and research priorities
V. Peer review
VI. Quality and quantity in the history of science and philosophy
VII. Education
VIII. Issues involving quantity and quality in particular disciplines, including papers on social indicators
IX. Sociology of science
X. Methodological papers and bibliographies
XI. Access of women to participation in scientific research