PRODUCTIVITY
Helen Hofer Gee*
INTRODUCTION
In 1986 NIH contracted with the Institute of Medicine to
organize a conference on research training. A central, though
not explicitly stated, purpose of the conference was to obtain
guidance on how to continue to meet Congressionally mandated
requirements for periodic reports on the role of and need for
research training in the biomedical and behavioral sciences.
Ostensibly, the conference was concerned with
an examination of how successfully research training
has been conducted, which program mechanisms produce
the most suitable training, and what information is
required to enable further assessments of national
needs for researchers in the decade ahead. [The
first among three categories of issues requiring
attention was] . . . measuring the productivity of
scientists in their research programs and as
reflections of their training. The issue in
productivity is how to improve the measurement of it;
simply gauging productivity by the current popular
methods is inadequate for the task at hand. (Institute
of Medicine, 1986)
Anyone who has ever been faced with the task of having to
select among individuals--for employment, advancement, funding,
awards--has dealt with the issue of assessing productivity and
has, implicitly or explicitly, weighed available evidence of
previous performance. The difficulty and complexity of these
decisions may well underlie the malaise that is apparent in the
committee report. The committee's more explicitly reported
concerns with such measures as success in obtaining research
grants, citation counts that ignore differences among and
possibly within disciplines, and studies that fail to consider
work environments suggest that the real problem lies not in the
measures of productivity per se that have been used, but in how
the measures have been used--that is, in the designs of
assessments of training support programs.
* The opinions expressed in this paper are the author's and do
not necessarily reflect those of either the Committee on
Biomedical and Behavioral Research Personnel or the National
Research Council.
Unfortunately for those who seek quick solutions, concepts
relevant to the measurement of productivity are inextricable from
those concerning almost all other domains within the social study
of science. Dealing directly with the problems of productivity
measurement therefore requires cognizance of the state of the
entire science. Any study, for example, that ignores differences
among or within disciplines ignores more than three decades of
intensive study of the entire social structure of science, not
just the study of "productivity" per se.
In a critical, scholarly essay, Gilbert (1978) noted that
there is a reciprocal relationship between the
theoretical framework which the social scientist brings
to his work and the indicators which he will find most
appropriate for his research . . . the adequacy of an
indicator can only be assessed through a detailed study
of the context in which the phenomena to be measured
are embedded, and of the validity of the measurement
theory on which it relies . . . this requirement is
equivalent to the demand that we understand the
functioning of the scientific community at a micro-
level.
A small community of scientists (static in size in the United
States since about 1980, but rapidly increasing in Western and
Eastern Europe and Japan since the late 1970s) has been making
significant progress in the direction Gilbert suggests (see
Appendix and References). The most recent burst of research
activity relevant to the assessment of productivity began when
Martin and Irvine (1983) assessed basic research activity and
programs (radio astronomy and physical sciences). Their papers
specified "partial indicators" of scientific progress and
investigated the extent to which these indicators "converge" to
produce valid and reliable estimates of the productivity of
designated groups of scientists. The work created a virtual
storm of criticism, largely because it was so far-reaching (see
Chubin, 1987). The continuing discussion has instilled new vigor
into the development of the field.
The concept of multiple partial criteria was certainly not
introduced by Irvine and Martin. Even Clark's study of the
careers of psychologists (1957) incorporated the concept in a
general sense. As noted by Jones et al. (1982), Weiss (1972)
discussed them:
At best each is a partial measure encompassing a
fraction of the large concept . . . . Moreover, each
measure [may contain] a load of irrelevant
superfluities, "extra baggage" unrelated to the
outcomes under study. By the use of a number of such
measures, each contributing a different facet of
information, we can limit the effect of irrelevancies
and develop a more rounded and truer picture of program
outcomes.
However, as Chubin (1987) concludes in his discussion of Irvine
and Martin's work, it is
. . . also politically astute, serving scholarly and
policy communities. It explicitly anticipates
criticism and sources of error, disarms skeptics, and
gets an analytical foot in the right doors--those
shielding the offices of policymakers who have come to
rely on participant scientists and their own imprecise
and self-serving devices for making decisions about who
gets and who doesn't.
Moravcsik (1986) hailed the extensive debate and critiques
of the Martin and Irvine work as a welcome sign--
since it shows that the field has reached the state of
maturity when its applications to concrete situations
are sufficiently realistic to create a heated
controversy, involving people from a variety of
professional backgrounds.
He commented further that neither critics nor Irvine and Martin,
in their response to critics, offered any specific suggestions
for improvement. Moravcsik then proposed that some suggestions
can be made and conclusions drawn concerning the need for future
activities by relating the debate to another effort in science
assessment--namely, a project organized by the United Nations
Center for Science and Technology for Development (UNCSTD) and
centered on a paper that Moravcsik was commissioned to write.
Moravcsik reported further that at a meeting held in Graz,
Austria, in May 1984 the paper was discussed:
The UNCSTD project did not result in a set recipe for
assessing science and technology. On the contrary,
the project concluded that there is no such universal
recipe, and hence that the aim should be to devise a
process which, in any particular case, yields a
methodology for an assessment.
SUGGESTED OUTLINE FOR PLANNING STUDIES
OF PRODUCTIVITY OR QUALITY
The proposed UNCSTD process serves as a useful framework in
which to present some thoughts about planning studies focusing on
the assessment of productivity. The following list draws heavily
and directly on Moravcsik's report:
1. Identify the goals of science that are to be taken into account.
Moravcsik noted,
. . . Science and technology have many different goals,
aims and justifications, and in any particular case it
must be specified which one (or which ones) of these
are taken into account, and with what weight.
Studies of National Institutes of Health research training
programs have ostensibly aimed at assessing a common goal of such
programs--to wit, the production of trained scientists who will
contribute to the advancement of the biomedical sciences. Prior
to the mid 1970s, this was interpreted by some Institutes as
including support for the clinical training of physicians in
areas where the supply of expertise was felt to be inadequate.
After it became clear that the majority of these individuals
simply entered private practice, however, those programs were for
the most part discontinued. Such discrepancies must, therefore,
be given careful attention in planning studies of program
outcomes. Since the mid '70s, NIH training programs in general
have focused specifically and exclusively on research training.
Assessments of the success of these research training programs
have, however, interpreted the terms "contribute" and
"advancement" quite narrowly. Teaching (either future
researchers or practitioners), biomedical research
administration, mentoring (i.e., guiding the graduate education
of future researchers), and conducting research that does not
seek external funding and research that cannot (because of the
interests or concerns of the power structure within which it is
conducted) be published have often been denied recognition as
goal-relevant behavior. Consideration should be given to whether
any of these professional activities should be explicitly
recognized as contributing to the advancement of the biomedical
sciences and, if so, studies should be designed to assess these
kinds of productive endeavor.
2. Recognize the multidimensionality of goals, of potential
pathways to them, and of methods of measuring outcomes;
specify which dimensions and connections of the system are
to be taken into account.
Once goals have been specified (and it is recognized that
achieving those goals can be, and is likely to be, expressed in
different ways), study designs must allow for the varieties of
pathways and outcomes that may occur. Cole and Cole (1973) set
the stage for this type of inquiry in their cross-sectional
analyses. The work of Long, McGinnis, and Allison (1979, 1981,
1982) examined many of the same "connections" as the Coles but,
by following a cohort longitudinally, revealed a different
sequence of career development. The Long and McGinnis work has
been particularly notable in its pursuit of the significance of
context, the multidimensionality of career pathways, and the
changing significance of predictors in assessing productivity at
stages in research career development. In another notable
analysis of the NIH Research Career Development Program, Carter
et al. (1987) examined both selection processes and outcomes
using multivariate techniques to assess the significance of
correlates and causal relations, as well as a sophisticated
cohort selection procedure to control for disciplinary
differences.
3. If, as is usually the case, it is not feasible to study all
aspects of a system, specify which aspects are to be
included and which will be omitted and indicate clearly the
implications of these decisions for the assessment process.
Moravcsik provided an apposite illustration of one
perspective:
If, of two cars, one has a higher top speed, and the
other a lower gasoline consumption per mile, it is
not possible to say which is the 'better' car without
ascribing some value judgment to high speeds versus
economy in the use of fuel.
Two other examples come to mind: (1) if in planning a study of
the effectiveness of a training program, it was decided that
pursuit of a research career in the private sector was a
favorable outcome but that assessing the performance of former
trainees who followed that path was not feasible, they could be
explicitly excluded from potential comparison cohorts; (2) if
research administration is deemed a favorable outcome, those
research administrators could be excluded from comparisons in
which research publications were used as indicators and included
where other measures of productivity, more suitable to their
employment, were used. The guideline simply demands precise
specification of the details of the design of an investigation.
4. Specify how the results of the assessment are to be used.
A study intended to assist program managers in their
decisionmaking will seldom have the same design requirements as a
study intended to inform policy decisions. If policy
decisionmakers are to be informed, for example, the delineation
of possible alternative indicators of productivity may be
critical, whereas meeting program management needs may require
more intensive analysis of only those that are the most direct
manifestations of program goals. The key is to consider
carefully the kinds of decisions that the study is intended to
influence.
5. Select a set of indicators that will satisfy the
requirements of each of the study design considerations.
Recognize and specify the limitations of each of the
indicators. To quote Moravcsik,
There are many types of indicators: input versus
output; quantitative versus qualitative, indicators of
activity, productivity, or progress; indicators of
quality, importance or impact; there are functional and
instrumental indicators; there are micro- and macro-
indicators; there are "data-based" and "perceptual"
indicators; and so on. Some indicators are already 'on
the shelf' and can be taken from it and used in new
situations. More likely, however, the most appropriate
indicators for a new situation need to be improvised
for that particular situation. . . . Be reconciled to
the fact that in any case, you will end up with a set
of indicator measurements which, in general, cannot be
reduced to a one-dimensional measure and hence to an
unambiguous ranking.
It is apparent that the selection and/or development of
indicators of productivity depend on the kinds of questions that
are being asked and the perceived complexity of the system
involved. An indicator that provides excellent explanatory data
for one study may be useless in another context. Every measure,
moreover, has limitations that may, under some conditions,
obviate its utility and, in other circumstances, may be totally
irrelevant. If a study plan is suitably mapped, it may not be
feasible to use the same indicators of productivity for all
individuals in a cohort. For example, if teaching undergraduate
students is judged to be an acceptable outcome of research
training, the productivity of an individual whose primary
activity is teaching will not be appropriately assessed by
counting that individual's production of research papers--but
consideration might be given to using the production of review
papers as one of several measures of performance in the
educational domain. However, for some outcomes regarded as
suitable expressions of the goals of an enterprise, no suitable
approach to assessment ("measurement") is available to evaluators.
In such cases the individuals should be removed from comparison
groups that are to be analyzed statistically rather than, as is
often the case, counted a "failure" according to indicators that
appropriately measure the productivity of other members of the
group.
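The difference between removing such individuals from a comparison group and counting them as failures can be shown with a toy cohort. In this minimal Python sketch, the record fields, the set of career paths for which paper counts are treated as an appropriate indicator, and all values are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List, Optional


@dataclass
class Trainee:
    name: str
    path: str              # hypothetical career-path label
    papers: Optional[int]  # research papers; None where not applicable


# Hypothetical: career paths for which paper counts are judged a suitable indicator.
PAPER_INDICATOR_PATHS = {"academic_research", "government_research"}


def comparison_group(cohort: Iterable[Trainee]) -> List[Trainee]:
    """Keep only trainees whose outcome the paper indicator can measure."""
    return [t for t in cohort if t.path in PAPER_INDICATOR_PATHS]


def naive_mean(cohort: Iterable[Trainee]) -> float:
    """Scores everyone, counting unmeasured outcomes as zero ("failures")."""
    return mean((t.papers or 0) for t in cohort)


def matched_mean(cohort: Iterable[Trainee]) -> float:
    """Scores only the trainees for whom the indicator is appropriate."""
    return mean(t.papers for t in comparison_group(cohort) if t.papers is not None)


cohort = [
    Trainee("A", "academic_research", 12),
    Trainee("B", "academic_research", 8),
    Trainee("C", "government_research", 10),
    Trainee("D", "undergraduate_teaching", None),
    Trainee("E", "research_administration", None),
]
print(naive_mean(cohort), matched_mean(cohort))
```

With this toy cohort, scoring the teacher and the administrator as zero pulls the mean from 10 papers to 6, although nothing about the research performance of the other trainees has changed.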
MEASURES OF PRODUCTIVITY
The above overview should make it clear that any discussion
of specific measures of productivity is necessarily superficial,
simplistic, and incomplete because outside the context of the
design of a specific study, there is not a great deal to be said
about any particular measure. In addition, since productivity in
one sense or another is the focus of most of the studies of the
social science of science, a thorough literature review would
require a few years of effort. Nonetheless, various measures
that might be used in studies of productivity are discussed
below. The discussion is intended to draw attention to
complexities, issues, and problems in the use of these measures,
knowledge of which might aid in carrying out the kind of careful
approach to study design outlined earlier.
Publication Counts
Although it is generally agreed that the principal, or most
prevalent, immediate outcome of the active research
investigator's efforts is the preparation of papers published in
professional journals, and although by 1982 a bibliography of
nearly 2,600 items on publication analysis was available (Hjerppe,
1982), counts of publications continue to be derogated. Because
the analysis of publications plays a dominant role in social
studies of science, a complex, highly sophisticated methodology
has been developed. The intellectual leader of modern-day social
studies of science was Derek J. de Solla Price (1961, 1963). The
early development of computer-based analytic methods, which have
stimulated much of the sophisticated analysis characteristic of
social analysis of science studies of the past two decades,
resulted largely from the enterprise of two individuals: Eugene
Garfield (1955) developed the Science Citation Index (SCI), on
which most publications analysis work is dependent; Francis Narin
and his associates at Computer Horizons took the lead in
exploring and developing measures to maximize the utility of the
wealth of information contained in the SCI. In 1969 (Narin,
1977) the area even acquired its own label--bibliometrics--to
describe collectively quantitative, analytical studies of written
communication.
In simplest terms, publication counts are no longer
acceptable as a measure of productivity unless at least the
following potential sources of error or misinterpretation are
controlled or accounted for:
o differences among disciplines of cohort members,
o differences among journals in terms of measured
influence (see section on journals page 131),
o differences in "quality" or "impact" as measured by
citations or peer assessment (or journal influence),
o professional age of cohort members, and
o social context of cohort members.
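As one illustration of controlling for two of the listed factors, disciplinary differences and professional age, each person's output can be expressed as papers per year since the doctorate and then divided by the mean rate in that person's own discipline. This normalization is an assumption made for the sketch, not a procedure prescribed in the text; the field names and figures are hypothetical, and journal influence, quality, and social context are not addressed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scientist:
    name: str
    discipline: str
    papers: int
    years_since_degree: int  # assumed proxy for professional age


def rate(s: Scientist) -> float:
    """Papers per professional year (a zero-year career counts as one year)."""
    return s.papers / max(s.years_since_degree, 1)


def discipline_normalized_rates(cohort: List[Scientist]) -> Dict[str, float]:
    """Each person's rate divided by the mean rate in his or her discipline.

    1.0 means "typical for the field"; scores are comparable across
    disciplines only in this relative sense.
    """
    by_field: Dict[str, List[float]] = defaultdict(list)
    for s in cohort:
        by_field[s.discipline].append(rate(s))
    field_mean = {d: sum(r) / len(r) for d, r in by_field.items()}
    return {
        s.name: rate(s) / field_mean[s.discipline] if field_mean[s.discipline] else 0.0
        for s in cohort
    }


# Hypothetical cohort: basic scientists publish more often than clinicians.
cohort = [
    Scientist("basic_1", "biochemistry", 24, 10),
    Scientist("basic_2", "biochemistry", 16, 10),
    Scientist("clin_1", "internal_medicine", 12, 12),
    Scientist("clin_2", "internal_medicine", 6, 12),
]
scores = discipline_normalized_rates(cohort)
```

In the toy cohort the clinician with 12 papers scores above the basic scientist with 16, because each is compared with the publication habits of his or her own field.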
Despite concerns about "loud noises from empty vessels,"
publication counts have been shown repeatedly to correlate
positively with assessments of quality and to contribute useful
independent variance to the assessment of productivity. Reported
correlations between quantity and quality measures vary
considerably among studies, between approximately r =.23 and
r =.80; differences may relate to disciplines, characteristics of
cohorts, or even to how quantity and quality of publications are
measured.
In a series of studies conducted in the late 1970s (see
Narin, 1983), numbers of publications by faculty and staff in
universities and hospitals were shown to be extremely highly
correlated with NIH funding (r = .90 to .95); and there were no
economies or diseconomies of scale in the funding of research
grants. Funding and publication relationships may appear to
break down, however, when small aggregates of researchers or
disciplines are assessed and especially when basic and clinical
research publications are intermixed. Publication rates of basic
scientists differ markedly from those of clinical scientists, who
publish less frequently and whose research is usually very much
more costly. When the funding and publication rates of small
aggregates of subjects are investigated, the tendency is to
ignore such disciplinary differences, thus ignoring an important
moderating variable. With small aggregates other minor sources
of error--such as idiosyncratic events that may affect the usual
patterns of behavior of part of a group for a period of time--
may also obscure an underlying relationship. When large
aggregates and adequate time spans are employed, such obfuscating
sources of error will usually cancel each other out, permitting
stable, underlying relations to be revealed.
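The effect of intermixing basic and clinical research can be illustrated with deliberately artificial figures (they are not data from the studies cited). In the Python sketch below, paper counts are exactly proportional to funding within each group, but clinical papers cost more; pooling the groups while ignoring that moderating variable produces a negative correlation even though the within-group relation is perfect. `pearson_r` is a plain implementation of Pearson's product-moment correlation written for this sketch.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical funding (arbitrary units) and paper counts.
basic_funding, basic_papers = [1, 2, 3], [10, 20, 30]     # 10 papers per unit
clinical_funding, clinical_papers = [4, 5, 6], [4, 5, 6]  # 1 paper per unit

within_basic = pearson_r(basic_funding, basic_papers)           # 1.0
within_clinical = pearson_r(clinical_funding, clinical_papers)  # 1.0
pooled = pearson_r(basic_funding + clinical_funding, basic_papers + clinical_papers)
print(round(within_basic, 3), round(within_clinical, 3), round(pooled, 3))
```

Analyzing each group separately, or including discipline as a variable, recovers the underlying relation that the pooled figure hides.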
When a quick, inexpensive estimate of productivity is
needed, large quantities of data are available, and the
comparability of cohorts is established, a simple count of
publications may well provide adequate information. Ordinarily,
however, such a single measure is useful primarily as a means of
setting the stage for a more comprehensive investigation of some
aspect of science or scientific behavior.
Weighted Counts: The use of weighted counts of papers permits
obtaining a preliminary estimate of quality without waiting for
citations to become available; it is also an inexpensive means of
obtaining an estimate of quality for large numbers of papers.
Each paper is weighted by an influence weight assigned to the
journal in which the paper appears (see footnote 1).
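A weighted count of this kind can be sketched as a lookup and a sum. The Python sketch below assumes a precomputed table of journal influence weights; the journal names and weights are hypothetical, and deriving real weights from citation data (the CHI technique) is not shown. The same inputs also give the average influence per paper, the kind of per-paper measure used by McAllister and Narin (1983).

```python
from typing import Dict, Iterable, Tuple

# Hypothetical influence weights; real weights would be derived from citation data.
JOURNAL_WEIGHT: Dict[str, float] = {
    "Journal A": 2.5,
    "Journal B": 1.0,
    "Journal C": 0.4,
}


def weighted_count(journals: Iterable[str], weights: Dict[str, float]) -> float:
    """Sum of the influence weights of the journals the papers appear in.

    Raises KeyError for a journal with no weight rather than silently
    treating it as having zero influence.
    """
    return sum(weights[j] for j in journals)


def influence_summary(
    journals: Iterable[str], weights: Dict[str, float]
) -> Tuple[int, float, float]:
    """Return (raw paper count, weighted count, average influence per paper)."""
    js = list(journals)
    total = weighted_count(js, weights)
    return len(js), total, (total / len(js) if js else 0.0)


# One hypothetical investigator's papers, listed by journal of publication.
papers = ["Journal A", "Journal B", "Journal B", "Journal C"]
print(influence_summary(papers, JOURNAL_WEIGHT))
```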
Paper Counts in the "Best" Journals: Committees charged with
evaluating group or individual scientific performance will
sometimes request that publications be counted only in a
selection of the "best" journals. Such a practice would be
seriously inequitable, since scientists do not have equal access
to journals. For example, those located in smaller institutions
are more often published in less influential journals, as are
younger, less well-established investigators; and regional
differences abound in some disciplines. McAllister and Narin
(1983) investigated these relationships in the publications of
all U.S. medical schools, using average citation influence per
paper measures: the average citation influence per paper
increased with the total number of biomedical publications, even
when institutional control (public and private), region, and
areas of research emphasis were controlled. The positive
relation between number of papers and citation influence was
shown to hold within disciplines (biochemistry and internal
medicine were analyzed in detail) and within research "level"
(i.e., along basic and clinical research dimensions).
Data Bases: NIH-supported studies that have involved counts of
published scientific papers have almost always depended on
computerized data bases derived by CHI from Medline and the SCI.
The source data bases require a great deal of preliminary
massaging to consolidate information and correct inconsistencies;
but once prepared, they make data available unobtrusively, make
accessible several different quantitative measures of publication
performance, avoid the increasingly restrictive problem of
securing clearance from the Office of Management and Budget
(involved, in studies of federal programs, in any attempt to go
directly to the scientific community for information), and are
more accurate than individual reports. An interesting departure
from exclusive use of the comprehensive data base was reported by
V. L. Simeon et al. (1986), who studied a large research
institution in Yugoslavia. In their study several forms of
publication and communication were employed in addition to SCI
journals (e.g., papers in other scientific journals and congress
proceedings, books and monographs, technical articles in
encyclopedias and popularizations, and presentations at
scientific meetings). A multivariate analysis revealed
interesting patterns of change among the several variables over
time. This rather preliminary study, which focused on changes
in publication behavior following the introduction of minimal
criteria for promotion, warranted no conclusions, but it suggested
to this writer the possibility that some measures of these types
might be useful in considering criteria suitable for assessing
the productivity of individuals whose careers, though academic,
are not directly focused on the production of original research.
1The technique developed by Computer Horizons, Inc. (CHI),
determines journal influence weights by the weighted number of
citations each journal receives over a given period of time. See
F. Narin, Evaluative Bibliometrics: The Use of Publication and
Citation Analysis in the Evaluation of Scientific Activity
(Report to the National Science Foundation), 1976; and F. Narin,
G. Pinski, and H. H. Gee, "Structure of the Biomedical
Literature," Journal of the American Society for Information
Science 27:25-45, 1976.
Activity Indexes: In recent years the utility of a new approach
to using publication counts, the "activity index," has been
demonstrated, particularly in studies conducted by CHI for NIH.
Activity indexes are ratios that make use of publication counts
in a relational context, thus allowing comparisons to be made
among groups while allowing each group to be described within its
own context (see footnote 2). Describing NIH Institutes' relative investment in
the support of research in different disciplines is a case in
point (see Gee and Narin, 1986). Journal papers are more readily
and accurately assigned to disciplines than are dollars, and a
ratio that describes an Institute's investment in a discipline
relative to both the Institute's investment in all disciplines
and the "size" of the discipline among all others in a data set
provides a great deal of information for comparison among both
disciplines and Institutes. Schubert and Braun (1986) suggest
several additional types of indexes that might be useful for
different purposes.
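The activity index described in footnote 2 (an organization's percentage of papers in a discipline divided by that discipline's percentage of all papers in the data set) can be computed directly from a table of paper counts. The Python sketch below assumes counts are held in a nested dictionary keyed by organization and discipline; the Institute and discipline labels and the counts are hypothetical.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # organization -> discipline -> paper count


def activity_indexes(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Activity index for every (organization, discipline) pair.

    index = (org papers in discipline / org papers in all disciplines)
          / (all papers in discipline / all papers in the data set)

    1.0 means the organization's activity in the discipline matches the
    level that discipline represents in the data set as a whole.
    """
    grand_total = sum(sum(row.values()) for row in counts.values())
    discipline_totals: Dict[str, int] = {}
    for row in counts.values():
        for disc, n in row.items():
            discipline_totals[disc] = discipline_totals.get(disc, 0) + n

    result: Dict[str, Dict[str, float]] = {}
    for org, row in counts.items():
        org_total = sum(row.values())
        if org_total == 0:
            continue  # no papers: the index is undefined for this organization
        result[org] = {
            disc: (row.get(disc, 0) / org_total) / (total / grand_total)
            for disc, total in discipline_totals.items()
            if total > 0
        }
    return result


papers = {
    "Institute X": {"biochemistry": 60, "cardiology": 40},
    "Institute Y": {"biochemistry": 20, "cardiology": 80},
}
ai = activity_indexes(papers)
```

Here Institute X, with 60 percent of its papers in biochemistry against 40 percent for the data set as a whole, has an index of 1.5 in that discipline, while Institute Y's index of 0.5 shows relatively low activity there.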
CITATIONS
Ever since Clark's study of psychologists (1957), citation
counts have been a favored measure for the assessment of
productivity. In most cases, citations alone or in combination
with publication counts are more closely correlated with
subjective estimates of productivity than are any other measures.
They are more universally applicable to the assessment of
scientific research activity than are other measures because (1)
publication is the most accessible means of expression available
to all scientists, and (2) being published offers a broader
audience to the scientist than any other medium.
2The percent of an organization's papers that are published
in a given discipline is divided by the percent of all papers in
the data set that represent that discipline. An index of "1.0"
indicates that the level of publication activity of this group in
this discipline is consonant with the level that discipline
represents among all disciplines.
Rather than referring to citations as measures of "quality,"
as was common in the 1970s, the current practice is to refer to
them as measures of "impact" or "utilization" or "influence."
The implication is that before citations can be referred to as
measures of the "quality" of research, the issue should be
investigated in the specific context in which quality is defined.
From an entirely different perspective, Moravcsik (1986),
Chubin (1987), Cronin (1984), Vinkler (1987), and others have
discussed and/or analyzed the functions and meaning of citations
in terms of author motivation. Vinkler, whose contribution is
most recent, has provided a concise review of the literature
concerning definitions, classification, and roles that citations
play in the scientific literature, concluding (in concert with
Cronin) that the information carrier role is the most important.
Vinkler distinguishes between "professional" (work is based on
the cited work or uses part of it) and "connectional" (e.g.,
desire of an author to establish a connection with the cited
author or work) reasons for citation. In Vinkler's study, a
group of productive investigators rated each of the references
they had listed in a selected recent paper, identifying which of
eight professional and/or nine connectional reasons had motivated
the decision to cite, and the strength of the motive. Most (81
percent) citations were made solely for professional reasons--
that is, in a literature review for "completeness" or because the
current work was based at least in part on the cited work, the
cited work confirms or supports the work in the citing paper, or
the cited work is criticized or refuted (at one of three levels).
Citations made partially for professional and partially for
connectional reasons accounted for 17 percent; only two percent
were made solely for connectional reasons. It was also found
that two to three times as many papers are reviewed as are
actually cited. Failure to cite was also investigated; the
principal reason found was that a work was not considered
important enough to the current effort to warrant citation.
Second most important was the "obliteration" phenomenon--the
origin of the work being so well known that citation was not needed. A citation
threshold model has been developed, and data confirm that the
threshold depends primarily on the professional relevance of the
work potentially citable in a given paper.
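Vinkler's three-way split of citations (solely professional, partly professional and partly connectional, solely connectional) amounts to classifying each rated reference by which kinds of reasons its author checked. The Python sketch below assumes each reference is recorded as a pair of sets of reason labels; the labels and the four sample responses are invented and are not Vinkler's instrument or data.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

# (professional reasons, connectional reasons) checked for one reference
Reference = Tuple[FrozenSet[str], FrozenSet[str]]


def motive_class(ref: Reference) -> str:
    """Classify a cited reference by the kinds of reasons given for citing it."""
    professional, connectional = ref
    if professional and connectional:
        return "mixed"
    if professional:
        return "professional only"
    if connectional:
        return "connectional only"
    return "no reason given"


def motive_shares(refs: Iterable[Reference]) -> Dict[str, float]:
    """Percentage of references falling into each motive class."""
    counts = Counter(motive_class(r) for r in refs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Invented ratings for four references in one citing paper.
sample = [
    (frozenset({"based on cited work"}), frozenset()),
    (frozenset({"confirms citing work"}), frozenset()),
    (frozenset({"review completeness"}), frozenset({"link to cited author"})),
    (frozenset(), frozenset({"link to cited author"})),
]
print(motive_shares(sample))
# {'professional only': 50.0, 'mixed': 25.0, 'connectional only': 25.0}
```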
Narin (1976) considered citations as an assessment mechanism:
Citation counts may be used directly as a measure of
the utilization or influence of a single publication or
of all the publications of an individual, a grant,
contract, department, university, funding agency or
country. Citation counts may be used to link
individuals, institutions, and programs, since they
show how one publication relates to another. . . In
addition to these evaluative uses, citations also have
[. . .]
patenting behavior within firms. Systematic sample survey data
are required on the following subjects:
o the sources of the innovative activities that
lead to patenting: in particular, the
intersectoral variance in the relative
importance of R&D, production engineering, small
firms, and other sources;
o the time distribution of patenting activities
over the life cycle of an innovation (in
particular, does patenting typically reach a
maximum at the time of commercial launch?);
o the propensity to patent the results of
innovative activities: in particular, sector-specific
factors related to the effectiveness
of patenting as a barrier to imitation,
compared to alternatives; firm-specific factors
related to perceptions of the costs and
benefits of patenting; and country-specific
factors relating to the costs and benefits of
patenting; and
o the judgment of technological peers on the
innovative performance of specific firms and
countries, and on the relative rate of
technological advance in specific fields: in
particular, the degree to which these judgments
are consistent with the patterns shown by
patent statistics.
Finally, Pavitt calls for improved classification schemes,
such that established patent classes can be matched more
effectively, on the one hand to standard industrial and trade
classifications and, on the other, to technically coherent fields
of development.
SUMMARY
There are, simply, no easy, ready-made solutions to the
problems of identifying measures that will be useful in the
assessment of productivity. There is need for the development
and application of creative approaches to improving the utility
of the kinds of information that can be obtained, for example,
through the development of indexes that may increase the
equitability of some measures. And there is need as well, in
many cases, for increased attention to detail in designing
studies and analyzing data.
The two sources of information that have the broadest
potential value in the assessment of academic scientific
performance are peer assessment and the analysis of publications,
though there are circumstances in which neither may be
appropriate. (For analyses involving the commercial sector,
patent analysis--when used as an extension of publication
analysis--should probably be added.) To the extent that the two
tend to be fairly highly correlated, each contributes somewhat to
confidence in the other; to the extent that they are not
correlated, the need for both kinds of information is greater in
the given measurement situation. Because peer
assessment is so extremely costly, time consuming, and difficult
to employ equitably, it may be necessary or worthwhile,
especially in large-scale studies, to investigate whether there
are records available about--for example, program operation,
faculty activity, support, student outcomes, and resources (in
addition to publication data)--that might be able to account for
a large proportion of the variation in peer judgments of program
quality.
On the other hand, the use of publication and citation
measures as the sole consideration in the assessment of the
individual scientist's productivity can be rejected on a purely
rational basis. As a means of confirming a positive subjective
judgment of individual performance, there is no problem, but the
opposite does not hold because there are myriad alternative
explanations for low numbers of publications and for few or no
citations. One of the more significant misjudgments that can
result is the case in which few or no citations are received by
highly significant papers that either are ahead of their time or
are published in obscure journals. No imperfect tool that may be
used to the disadvantage of the single individual (including peer
judgment) can be justified. The caution warrants repeating (and
appears fairly frequently in the bibliometric literature) that
bibliometric measures are most appropriately employed in group
comparisons in which aggregates of publications are large--just
how large depends on how closely comparison groups can be
matched. Correspondingly, peer assessments are most
appropriately employed when peers are equally informed about all
of the assessment targets and when self-serving competitive
interests are absent. Perhaps the single most important factor
in planning investigations of productivity is the need to employ
multiple measures and to apply them selectively to the
appropriate targets.
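The standard error of a mean citation rate makes concrete why bibliometric measures belong in group comparisons with large aggregates: it shrinks roughly as one over the square root of the number of papers. The following is a minimal sketch with invented citation counts. The two-standard-error rule of thumb in the hypothetical helper `clearly_different` is an illustrative assumption, not a criterion proposed in this chapter.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_se(citations):
    """Mean citations per paper and its standard error (sample SD / sqrt(n))."""
    if len(citations) < 2:
        raise ValueError("need at least two papers")
    return mean(citations), stdev(citations) / sqrt(len(citations))

def clearly_different(group_a, group_b, z=2.0):
    """Rough check: is the gap in mean citation rate larger than z combined
    standard errors? (Illustrative rule of thumb, not a formal test.)"""
    ma, sea = mean_and_se(group_a)
    mb, seb = mean_and_se(group_b)
    return abs(ma - mb) > z * sqrt(sea ** 2 + seb ** 2)

# Invented counts: the same citation pattern observed for 5 papers and for 100.
pattern_a = [0, 2, 5, 9, 24]   # mean 8 citations per paper
pattern_b = [1, 4, 8, 14, 33]  # mean 12 citations per paper
print(clearly_different(pattern_a, pattern_b))            # -> False
print(clearly_different(pattern_a * 20, pattern_b * 20))  # -> True
```

The gap between the two means is identical in both comparisons. It becomes distinguishable from noise only when each group contains many papers.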
REFERENCES
Anderson, Richard C., Francis Narin, and Paul McAllister. 1978.
Publication ratings versus peer ratings of universities.
Journal of the American Society for Information Science
29(2):91-103.
Carter, Grace M. 1974. Peer Review, Citations, and Biomedical
Research Policy: NIH Grants to Medical School Faculty (Rand
Report R-1583-HEW). Washington, D.C.: Rand Corporation.
Carter, Grace M., Clara S. Lai, and Carolyn L. Lee. 1978. A Comparison of
Large Grants and Research Project Grants Awarded by the
National Institutes of Health (Rand Report R-2228-1-NIH).
Washington, D.C.: Rand Corporation.
Carter, Grace M., John D. Winkler, and Andrea K. Biddle. 1987. An
Evaluation of the NIH Research Career Development Award.
Washington, D.C.: Rand Corporation.
Chubin, Daryl E. 1987. Research evaluation and the generation
of big science policy. Knowledge 9(2):254-277.
Chubin, Daryl E., and Soumyo D. Moitra. 1975. Content analysis of
references: Adjunct or alternative to citation counting?
Social Studies of Science 5:423-441.
Clark, Kenneth E. 1957. America's Psychologists: A Survey of a
Growing Profession. Washington, D.C.: American
Psychological Association.
Cole, Jonathan R., and Stephen Cole. 1973. Social
Stratification in Science. Chicago: University of Chicago
Press.
Committee on Science, Engineering, and Public Policy (COSEPUP).
1982. The Quality of Research in Science: Methods for
Postperformance Evaluation of Research in the National
Science Foundation. Washington, D.C.: National Academy
Press.
Cronin, B. 1984. The Citation Process. London: Taylor Graham.
Fox, Mary Frank. 1983. Scientists' publication productivity.
Social Studies of Science 13(2):298-329.
Garfield, E. 1955. Citation indexes for science. Science
167.
Garfield, E. 1972. Citation analysis as a tool in journal
evaluation. Science 178:471-479.
Gee, Helen Hofer. 1988. An Analysis of NIH Intramural Research
Publications, 1973-1984 (Report to the Committee to Study
Strategies to Strengthen the Scientific Excellence of the NIH
Intramural Research Program). Washington, D.C.: National
Academy Press.
Gee, Helen Hofer, and Francis Narin. 1986. An Analysis of Research
Publications Supported by NIH 1973-76 and 1977-80
(Publication No. 86-2777). Washington, D.C.: NIH.
Gilbert, G. Nigel. 1978. Measuring the growth of science.
Scientometrics 1(1):9-34.
Hjerppe, R. 1982. Supplement to a "Bibliography of
bibliometrics and citation indexing & analysis."
Scientometrics 4(3):241-273.
Institute for Scientific Information. 1963. Science Citation
Index. Philadelphia, PA: ISI.
Jones, Lyle V., Gardner Lindzey, and Porter Coggeshall (eds.).
1982. An Assessment of Research-Doctorate Programs in the
United States. Washington, D.C.: National Academy Press.
Lawani, Stephen M., and Alan E. Bayer. 1983. Validity of
citation criteria for assessing the influence of scientific
publications: New evidence with peer assessment. Journal of
the American Society for Information Science 34(1):59-66.
Leydesdorff, Loet. 1987. Various methods for the mapping of
science. Scientometrics 11(5-6):295-324.
Leydesdorff, Loet, and Peter van der Schaar. 1987. The use of
scientometric methods for evaluating national research
programs. Science and Technology Studies 5(1):22-31.
Long, J. Scott, Paul D. Allison, and Robert McGinnis. 1979.
Entrance into the academic career. American Sociological
Review 44(5):816-830.
Long, J. Scott, and Robert McGinnis. 1981. Organizational context and
scientific productivity. American Sociological Review
46:422-442.
Martin, Ben R., and John Irvine. 1983. Assessing basic
research. Research Policy 12:61-90.
McAllister, Paul R., and Francis Narin. 1983. Characterization
of the research papers of U.S. medical schools. Journal of
the American Society for Information Science 34(2):123-131.
McGinnis, Robert, and J. Scott Long. 1982. Postdoctoral
training in bioscience: allocation and outcomes. Social
Forces 60(3):701-722.
Moed, H. F., J. M. Burger, J. G. Frankfort, and A. F. J. Van Raan.
1985. A comparative study of bibliometric past performance
analysis and peer judgment. Scientometrics 8:3-4.
Moravcsik, Michael J. 1986. Assessing the methodology for
finding a methodology for assessment. Social Studies of
Science 16:534-539.
Narin, F. 1976. Evaluative Bibliometrics: The Use of
Publication and Citation Analysis in the Evaluation of
Scientific Activity (Report to the National Science
Foundation). Now available only through the National
Technical Information Service (NTIS no. PB 252339/AS).
Narin, F. 1983. Subjective vs. Bibliometric Assessment of
Biomedical Research Publications (NIH Program Evaluation
Report). (Unpublished report available from the NIH Office
of Program Planning and Evaluation or from the author.)
Narin, F. 1985. Measuring the Research Productivity of Higher
Education Institutions Using Bibliometric Techniques. Paper
presented at a Workshop on Science and Technology Measures in
the Higher Education Sector, OECD, Paris, France.
Narin, F. 1988. Indicators of Strength: Excellence and Linkage in
Japanese Technology and Science. Paper presented at the
National Science Foundation, June 21, 1988 (See also F. Narin
and E. Noma, Is technology becoming science?, Scientometrics
7(3):369-381, 1985.)
Narin, F., and J. K. Moll. 1977. Bibliometrics. Annual Review of
Information Science and Technology 12:32-58.
Narin, F., G. Pinski, and H. H. Gee. 1976. Structure of the
biomedical literature. Journal of the American Society for
Information Science 27:25-45.
National Science Foundation. 1974. Science Indicators.
Washington, D.C.: U.S. Government Printing Office (this
report is published annually).
National Science Foundation. 1982. Studies of Scientific Disciplines: An Annotated
Bibliography. Washington, D.C.: U.S. Government Printing
Office.
Noma, Elliot. 1986. Subject Classification and Influence
Weights for 3,000 Journals. Haddon Heights, NJ: Computer
Horizons, Inc.
Pavitt, K. 1985. Patent statistics as indicators of innovative
activities: Possibilities and problems. Scientometrics
7(1-2):77-99. (Pavitt cites B. Basberg, Technological change in
the Norwegian whaling industry: A case study of the use of
patent statistics as a technology indicator, Research Policy
11(3):163-171, 1982.)
Pinski, Gabriel. 1975. Subject Classification and Influence
Weights for 2300 Journals (NSF Final Task Report). Haddon
Heights, NJ: Computer Horizons, Inc.
Porter, A. L., D. E. Chubin, and Xiao-Yin-Jin. 1986. Citations
and Scientific Progress: Comparing Bibliometric Measures
with Scientist Judgments. Scientometrics 13(3-4):103-124.
Price, Derek de Solla. 1961. Science Since Babylon. New Haven,
Conn.: Yale University Press.
Price, Derek de Solla. 1963. Little Science, Big Science. New York: Columbia
University Press.
Reskin, Barbara. 1979. Review of the Literature on the
Relationship Between Age and Scientific Productivity. In
Committee on Continuity in Academic Research Performance,
Research Excellence Through the Year 2000: The Importance of
Maintaining a Flow of New Faculty into Academic Research
(Appendix C:189-207). Washington, D.C.: National Academy
of Sciences.
Schubert, A. 1985. Quantitative studies of science: A current
bibliography. Scientometrics 9(5-6):293-304.
Schubert, A. 1986. Quantitative studies of science: A current
bibliography. Scientometrics 8(1-2):137-140.
Schubert, A., and T. Braun. 1986. Relative indicators and relational
charts for comparative assessment of publication output and
citation impact. Scientometrics 9(5-6):281-291. (See also
T. Braun, W. Glanzel, and A. Schubert, One more version of the
facts and figures on publication output and relative citation
impact of 107 countries, 1978-1980, Scientometrics 11(1-2):9-15
and 11(3-4):127-140.)
Schubert, A., and W. Glanzel. 1983. Statistical reliability of
comparisons based on the citation impact of scientific
publications. Scientometrics 5(1):59-74.
Schubert, A., W. Glanzel, and T. Braun. 1983. Relative citation rate:
A new indicator for measuring the impact of publications. In
D. Tomov and L. Dimitrova (eds.), Proceedings of the First
National Conference with International Participation on
Scientometrics and Linguistics of Scientific Text, Varna,
pp. 80-81.
Simeon, V. L., et al. 1986. Analysis of the bibliographic
output from a research institution in relation to the
measures of scientific policy. Scientometrics 9(5-6):223-230.
Stowe, Robert C. 1986. Annotated Bibliography of Publications
Dealing with Qualitative and Quantitative Indicators of the
Quality of Science (A technical memorandum of the quality
indicators project). Cambridge, MA: Harvard University.
Van Heeringen, A., and P. A. Dijkwel. 1987a. Age, mobility and
productivity: I. Scientometrics 11:267-280.
Van Heeringen, A., and P. A. Dijkwel. 1987b. Age, mobility and productivity: II.
Scientometrics 11:281-293.
Vinkler, P. 1986. Evaluation of some methods for the relative
assessment of scientific publications. Scientometrics
10(3-4):157-177.
Vinkler, P. 1987. A quasi-quantitative citation model.
Scientometrics 12(1-2):47-72. (See also B. Cronin, The
Citation Process, London: Taylor Graham, 1984; D. Chubin and
S. D. Moitra, Content analysis of references: Adjunct or
alternative to citation counting? Social Studies of Science
5:423, 1975; and M. J. Moravcsik and P. Murugesan, Some
results on the function and quality of citation, Social
Studies of Science 5:86-92, 1975.)
Weiss, C. H. 1972. Evaluation Research: Methods of Assessing
Program Effectiveness. Englewood Cliffs, NJ: Prentice-Hall
Inc.
APPENDIX: SCIENCE STUDIES SOURCES
Nearly three-quarters of a century has passed since Cole and
Eales in 1917 reported their international comparison of counts
of books and papers in comparative anatomy published between 1543
and 1880 (Narin, 1977). In 1926 Lotka demonstrated that the
distribution of publications in a discipline (physics) is highly
skewed and that most scientific papers are published by a small
minority of scientists (Fox, 1983). So began inquiries into the
use of publication measures in the assessment of productivity
and the closely related concept of eminence. Rapid advancement,
however, became feasible only when computers became readily
accessible and inexpensive in the 1950s.
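Lotka's finding is usually summarized as an inverse-square law: the number of authors who publish n papers is roughly proportional to 1/n^2. The following is a minimal sketch of what that implies for the skew described above. It assumes the pure inverse-square form and an arbitrary cap of 100 papers per author; both are simplifications, and exponents fitted to real data vary by field and data set. The function name `lotka_shares` and its parameters are hypothetical.

```python
def lotka_shares(max_papers=100, min_papers_top=6, exponent=2.0):
    """Under a Lotka-type law (authors with n papers proportional to
    1/n**exponent, n = 1..max_papers), return:
      - share of authors with exactly one paper,
      - share of authors with >= min_papers_top papers,
      - share of all papers written by those prolific authors."""
    ns = range(1, max_papers + 1)
    authors = {n: 1.0 / n ** exponent for n in ns}  # relative number of authors
    papers = {n: n * authors[n] for n in ns}        # papers those authors write
    total_authors = sum(authors.values())
    total_papers = sum(papers.values())
    top = [n for n in ns if n >= min_papers_top]
    return (authors[1] / total_authors,
            sum(authors[n] for n in top) / total_authors,
            sum(papers[n] for n in top) / total_papers)

single, top_authors, top_papers = lotka_shares()
print(f"one-paper authors: {single:.0%}")
print(f"authors with 6+ papers: {top_authors:.0%}, "
      f"who write {top_papers:.0%} of all papers")
```

Under these assumptions, about 61 percent of authors publish a single paper. The roughly 10 percent who publish six or more account for about 56 percent of all papers.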
In a landmark empirical study carried out between 1954 and
1957, a committee of the American Psychological Association
made an extensive inquiry into the correlates of
productivity of all persons who received doctorates in psychology
between 1930 and 1944 (Clark, 1957). The study was significant
in employing publication and citation measures as correlates of
peer assessments of productivity and in recognizing the
importance of investigating differences among subdisciplines and
of taking into account variations in background, social, and
psychological characteristics as correlates and potential
predictors of eventual professional accomplishment and status.
The study was also noteworthy in its use of computer-implemented
quantitative methods to describe and compare the most productive
with other members of the profession. In this sense it marks the
empirical beginning of what has become a worldwide effort on the
part of both theoretical and empirical investigators to achieve a
better understanding of how science and scientists function and
thrive in the society of our time.
Comprehensive theoretical and methodological as well as
empirical studies of the sociology, psychology, and economics of
science and scientists did not begin to appear in large numbers
until the 1960s. Derek de Solla Price (1963) is appropriately
credited with sparking the present-day intellectual development
of inquiry into the assessment of research quality and eminence.
Since then studies have proliferated rapidly in depth, breadth,
and complexity as well as in number. Hjerppe (1982) added 518
items to an over 2,000-item "Bibliography of Bibliometrics and
Citation Indexing & Analysis" published in Sweden in 1980. More
directly relevant to the present inquiry are bibliographies that
are being developed to assist groups of interested and involved
scientists in their attempts to keep up with research aimed at
achieving better understanding of how science and scientists
function. Although it is not feasible to attempt a comprehensive
review of all bibliographies that might be helpful to those
concerned with the analysis of productivity and its essential
correlates, a brief description of some publications that cover a
great deal of the relevant research effort to about 1980 may be
useful.
*** Jonathan and Stephen Cole, Social Stratification in Science
(1973): The Coles conducted several different cross-sectional
studies of academic physicists in their investigation of the
social stratification system in science. The Coles staunchly
defended the view that science functions as a meritocracy and
concluded that physics is a universalistic and rational
discipline in which quality of work (as measured by citations) is
the chief determinant of ultimate status. (A recent personal
communication indicates that J. Cole delivered a paper at
American Sociological Association meetings that partially recants
earlier views on universalism.) For more up-to-date,
longitudinal analyses of scientists in biochemistry that result
in a different conclusion, see Long et al. (1979), Long and
McGinnis (1981), and McGinnis and Long (1982). The Coles
examined multivariate interrelationships among departmental rank,
number and assessed prestige of honorific awards, membership
status in professional societies, geographical location, number
and "quality" (citation counts) of publications in exploring the
development of professional visibility and eminence. The book
also contains a brief historical account of the development of
research in the social science of science.
*** Francis Narin, Evaluative Bibliometrics (1976): Narin cited
140 papers in providing a brief historical account of the
development of techniques of measuring publications and
citations, in reviewing a number of empirical investigations of
the validity of bibliometric analyses, and in presenting details
of the characteristics of and differences among scientific fields
and subdisciplines. (The Annual Review of Information Science
and Technology published a bibliography entitled "Bibliometrics"
by Narin and Moll (1977), which contains many, but not all, of the
same references that are in Evaluative Bibliometrics.) The book,
prepared for the National Science Foundation, contains explicit
details of how several indices of journal influence are
calculated and how variations within a field of science differ
from variations within a subdiscipline. Three different
influence measures are provided for each of the 2,250 journals in
the 1973 Science Citation Index. [New influence indices have
since been calculated for some 3,000 journals in the 1982 SCI (see
Noma, 1986).] Some two dozen studies are cited that deal with
the correspondence between literature-based and other methods of
assessing the quality of scientific output.
*** NSF Division of Planning and Policy, Social Studies of
Scientific Disciplines (1982): This annotated bibliography
"makes accessible to the managers and practitioners of science
and engineering the findings from the social studies of science
in a form that will be useful to them." The bibliography covers
studies conducted up to the mid 1980s and reports on the work of
nearly 300 authors, most with multiple entries. Although only
one subsection is entitled "Productivity," it is not an
exaggeration to estimate that at least 90 percent of entries in
the work deal with material relevant to the measurement of this
concept. A roughly similar percentage describe
investigations that employ publication measures in their
study of 23 identifiable but related subjects as dealt
with in studies of 13 disciplines. A total of 285 studies yield
nearly 500 entries in the bibliography, many studies having dealt
with multiple disciplines. Subject categories in the
bibliography include:
Attitudes and Values
Career Patterns
Competition
Development of Disciplines
Discipline Comparisons
Discipline Organization
Discovery Process
Education, Grad. Educ.
Funding of Research
Information Exchange
National Comparisons
Paradigm Characteristics
Performance of research
Productivity
Productivity - age
Professional Associations
Publication practices
Recognition and reward
Social stratification
Structure of the literature
Structure of literature--
   Specialty groups
   Citation rates
   Journal influence
University Ratings
*** Mary Frank Fox, "Scientists' Publication Productivity,"
Social Studies of Science (1983): In this critical review, Fox
discusses publication productivity in relation to psychological
characteristics of individuals such as motivation, ego strength,
cognitive style, personality and interests, and IQ, noting the
restricted range of ability among scientists and the
corresponding low correlation with measures of productivity as
well as the fact that creativity does not exist in a vacuum.
Citing Pelz and Andrews, she states, "Rather, social factors so
affect the translation of creative ability into innovative
performance that measured creativity is virtually unrelated to
either the innovativeness or the productiveness of scientists'
output." The importance of environmental characteristics such as
institutional prestige and organizational freedom are summarized,
including the important findings of Long and McGinnis, whose
longitudinal studies point to a stronger effect of location on
productivity than of productivity on subsequent location,
contrary to the conclusions of earlier studies using
cross-sectional designs. An interesting discussion of the closely entwined
concepts of cumulative advantage and reinforcement is also
included in this review of approximately a hundred different
studies.
*** A. Schubert, "Quantitative Studies of Science: A Current
Bibliography," Scientometrics (1985 and 1986). Close to 100
papers are listed in each year, and the list does not include
those published in Scientometrics itself. The vast majority are
empirical and methodological papers on bibliometric topics.
While no country exceeds the United States in number of papers
listed, the total number of foreign papers, not including Canada
and the United Kingdom, was nearly twice the number of United
States publications.
*** Robert C. Stowe, An Annotated Bibliography of Publications
Dealing with Qualitative and Quantitative Indicators of the
Quality of Science (Including a bibliography on the access of
women to participation in scientific research) (1986). In
addition to a list of core books, annotated entries are made
under the following headings:
I. Bibliometric indicators of the quality of scientific
research
-Citations and publications as indicators of quality
-Critiques of citation analysis
-Citation Context Analysis
II. Qualitative approaches to and more general works on
research evaluation
III. Works dealing specifically with "science indicators"
IV. Forecasting and research priorities
V. Peer review
VI. Quality and quantity in the history of science and
philosophy
VII. Education
VIII. Issues involving quantity and quality in particular
disciplines, including papers on social indicators
IX. Sociology of science
X. Methodological papers and bibliographies
XI. Access of women to participation in scientific research