Copyright © 2009. National Academy of Sciences. All rights reserved.
Methodology
Quality . . . you know what it is, yet you don't know
what it is. But that's self-contradictory. But some
things are better than others, that is, they have more
quality. But when you try to say what the quality is,
apart from the things that have it, it all goes poof!
There's nothing to talk about. But if you can't say
what Quality is, how do you know what it is, or how do
you know that it even exists? If no one knows what it
is, then for all practical purposes it doesn't exist
at all. But for all practical purposes it really does
exist. What else are the grades based on? Why else
would people pay fortunes for some things and throw
others in the trash pile? Obviously some things are
better than others . . . but what's the "betterness"?
. . . So round and round you go, spinning mental
wheels and nowhere finding anyplace to get traction.
What the hell is Quality? What is it?
Robert M. Pirsig
Zen and the Art of Motorcycle Maintenance
Both the planning committee and our own study committee have given
careful consideration to the types of measures to be employed in the
assessment of research-doctorate programs.1 The committees
recognized that any of the measures that might be used is open to
criticism and that no single measure could be expected to provide an
entirely satisfactory index of the quality of graduate education.
With respect to the use of multiple criteria in educational
assessment, one critic has commented:
1 A description of the measures considered may be found in the third
chapter of the planning committee's report, along with a discussion of
the relative merits of each measure.
At best each is a partial measure encompassing a
fraction of the large concept. On occasion its link
to the real [world] is problematic and tenuous.
Moreover, each measure [may contain] a load of
irrelevant superfluities, "extra baggage" unrelated to
the outcomes under study. By the use of a number of
such measures, each contributing a different facet of
information, we can limit the effect of irrelevancies
and develop a more rounded and truer picture of
program outcomes.2
Although the use of multiple measures alleviates the criticisms
directed at a single dimension or measure, it certainly will not
satisfy those who believe that the quality of graduate programs cannot
be represented by quantitative estimates no matter how many dimensions
they may be intended to represent. Furthermore, the usefulness of the
assessment is dependent on the validity and reliability of the
criteria on which programs are evaluated. The decision concerning
which measures to adopt in the study was made primarily on the basis
of two factors:
(1) the extent to which a measure was judged to be
related to the quality of research-doctorate
programs, and
(2) the feasibility of compiling reliable data for
making national comparisons of programs in
particular disciplines.
Only measures that were applicable to a majority of the disciplines to
be covered were considered. In reaching a final decision the study
committee found the ETS study,3 in which 27 separate variables were
examined, especially helpful, even though it was recognized that many
of the measures feasible in institutional self-studies would not be
available in a national study. The committee was aided by the many
suggestions received from university administrators and others within
the academic community.
Although the initial design called for an assessment based on
approximately six measures, the committee concluded that it would be
highly desirable to expand this effort. A total of 16 measures
(listed in Table 2.1) have been utilized in the assessment of
research-doctorate programs in chemistry, computer sciences, geosciences,
mathematics, and physics; 15 of these were used in evaluating
programs in statistics/biostatistics. (Data on research expenditures
are unavailable in the latter discipline.) For nine of the measures
2 C. H. Weiss, Evaluation Research: Methods of Assessing Program
Effectiveness, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1972,
p. 56.
3 See M. J. Clark et al. (1976) for a description of these variables.
TABLE 2.1 Measures Compiled on Individual Research-Doctorate Programs
Program Size1
01 Reported number of faculty members in the program, December 1980.
02 Reported number of program graduates in last five years (July 1975
through June 1980).
03 Reported total number of full-time and part-time graduate students
enrolled in the program who intend to earn doctorates, December 1980.
Characteristics of Graduates2
04 Fraction of FY1975-79 program graduates who had received some national
fellowship or training grant support during their graduate education.
05 Median number of years from first enrollment in graduate school to
receipt of the doctorate--FY1975-79 program graduates.3
06 Fraction of FY1975-79 program graduates who at the time they completed
requirements for the doctorate reported that they had made definite
commitments for postgraduation employment.
07 Fraction of FY1975-79 program graduates who at the time they completed
requirements for the doctorate reported that they had made definite commitments
for postgraduation employment in Ph.D.-granting universities.
Reputational Survey Results4
08 Mean rating of the scholarly quality of program faculty.
09 Mean rating of the effectiveness of the program in educating research
scholars/scientists.
10 Mean rating of the improvement in program quality in the last five years.
11 Mean rating of the evaluators' familiarity with the work of program
faculty.
University Library Size5
12 Composite index describing the library size in the university in
which the program is located, 1979-80.
Research Support
13 Fraction of program faculty members holding research grants from the
National Science Foundation; National Institutes of Health; or the
Alcohol, Drug Abuse, and Mental Health Administration at any time
during the FY1978-80 period.6
14 Total expenditures (in thousands of dollars) reported by the university
for research and development activities in a specified field, FY1979.7
Publication Records8
15 Number of published articles attributed to the program, 1978-79.
16 Estimated "overall influence" of published articles attributed to the
program, 1978-79.
1 Based on information provided to the committee by the participating
universities.
2 Based on data compiled in the NRC's Survey of Earned Doctorates.
3 In reporting standardized scores and correlations with other variables, a
shorter time-to-Ph.D. is assigned a higher score.
4 Based on responses to the committee's survey conducted in April 1981.
5 Based on data compiled by the Association of Research Libraries.
6 Based on matching faculty names provided by institutional coordinators with
the names of research grant awardees from the three federal agencies.
7 Based on data provided to the National Science Foundation by universities.
8 Based on data compiled by the Institute for Scientific Information and
developed by Computer Horizons, Inc.
data are available describing most, if not all, of the mathematical
and physical science programs included in the assessment. For seven
measures the coverage is less complete but encompasses at least a
majority of the programs in every discipline. The actual number of
programs evaluated on every measure is reported in the second table in
each of the next six chapters.
The 16 measures describe a variety of aspects important to the
operation and function of research-doctorate programs--and thus are
relevant to the quality and effectiveness of programs in educating
scientists for careers in research. However, not all of the measures
may be viewed as "global indices of quality." Some, such as those
relating to program size, are best characterized as "program
descriptors" which, although not dimensions of quality per se, are
thought to have a significant influence on the effectiveness of
programs. Other measures, such as those relating to university
library size and support for research and training, describe some of
the resources generally recognized as being important in maintaining a
vibrant program in graduate education. Measures derived from surveys
of faculty peers or from the publication records of faculty members,
on the other hand, have traditionally been regarded as indices of the
overall quality of graduate programs. Yet these too are not true
measures of quality.
We often settle for an easy-to-gather statistic,
perfectly legitimate for its own limited purposes, and
then forget that we haven't measured what we want to
talk about. Consider, for instance, the reputation
approach of ranking graduate departments: We ask a
sample of physics professors (say) which the best
physics departments are and then tabulate and report
the results. The "best" departments are those that
our respondents say are the best. Clearly it is
useful to know which are the highly regarded departments
in a given field, but prestige (which is what we
are measuring here) isn't exactly the same as
quality.4
To be sure, each of the 16 measures reported in this assessment has
its own set of limitations. In the sections that follow an
explanation is provided of how each measure has been derived and its
particular limitations as a descriptor of research-doctorate programs.
PROGRAM SIZE
Information was collected from the study coordinators at each
university on the names and ranks of program faculty, doctoral student
4 John Shelton Reed, "How Not to Measure What a University Does," The
Chronicle of Higher Education, Vol 22, No. 12, May 11, 1981, p. 56.
enrollment, and number of Ph.D. graduates in each of the past five
years (FY1976-80). Each coordinator was instructed to include on the
faculty list those individuals who, as of December 1, 1980, held
academic appointments (typically at the rank of assistant, associate,
and full professor) and who participated significantly in doctoral
education. Emeritus and adjunct members generally were not to be
included. Measure 01 represents the number of faculty identified in a
program. Measure 02 is the reported number of graduates who earned
Ph.D. or equivalent research doctorates in a program during the period
from July 1, 1975, through June 30, 1980. Measure 03 represents the
total number of full-time and part-time students reported to be
enrolled in a program in the fall of 1980, who intended to earn
research doctorates. All three of these measures describe different
aspects of program size. In previous studies program size has been
shown to be highly correlated with the reputational ratings of a
program, and this relationship is examined in detail in this report.
It should be noted that since the information was provided by the
institutions participating in the study, the data may be influenced by
the subjective decisions made by the individuals completing the
forms. For example, some institutional coordinators may be far less
restrictive than others in deciding who should be included on the list
of program faculty. To minimize variation in interpretation, detailed
instructions were provided to those filling out the forms.5 Measure
03 is of particular concern in this regard since the coordinators at
some institutions may not have known how many of the students currently
enrolled in graduate study intended to earn doctoral degrees.
CHARACTERISTICS OF GRADUATES
One of the most meaningful measures of the success of a research-
doctorate program is the performance of its graduates. How many go on
to lead productive careers in research and/or teaching? Unfortunately,
reliable information on the subsequent employment and career
achievements of the graduates of individual programs is not available.
In the absence of this directly relevant information, the committee
has relied on four indirect measures derived from data compiled in the
NRC's Survey of Earned Doctorates.6 Although each measure has
serious limitations (described below), the committee believes it more
desirable to include this information than not to include data about
program graduates.
In identifying program graduates who had received their doctorates
in the previous five years (FY1975-79),7 the faculty lists furnished
5A copy of the survey form and instructions sent to study
coordinators is included in Appendix A.
6 A copy of the questionnaire used in this survey is found in
Appendix B.
7 Survey data for the FY1980 Ph.D. recipients had not yet been
compiled at the time this assessment was undertaken.
by the study coordinators at universities were compared with the names
of dissertation advisers (available from the NRC survey). The latter
source contains records for virtually all individuals who have earned
research doctorates from U.S. universities since 1920. The institution,
year, and specialty field of Ph.D. recipients were also used in
determining the identity of program graduates. It is estimated that
this matching process provided information on the graduate training and
employment plans of more than 90 percent of the FY1975-79 graduates
from the mathematical and physical science programs. In the calculation
of each of the four measures derived from the NRC survey, program
data are reported only if the survey information is available on at
least 10 graduates. Consequently, in the disciplines with smaller
programs--computer sciences and statistics/biostatistics--only slightly
more than half the programs are included in these measures, whereas
more than 90 percent of the chemistry and physics programs are
included.
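The reporting rule described above can be sketched in a few lines of Python. This is an illustration only: the function name fraction_measure, the list-of-flags layout, and the sample data are hypothetical; the one value taken from the text is the threshold of 10 graduates.

```python
from typing import Optional, Sequence

MIN_GRADUATES = 10  # threshold stated in the text


def fraction_measure(flags: Sequence[bool]) -> Optional[float]:
    """Fraction of matched graduates with the flag set, or None when
    survey information is available on fewer than MIN_GRADUATES graduates."""
    if len(flags) < MIN_GRADUATES:
        return None  # program is not reported on this measure
    return sum(flags) / len(flags)


# Hypothetical data: True means the graduate reported national fellowship support.
large_program = [True] * 6 + [False] * 18  # 24 matched graduates
small_program = [True] * 3 + [False] * 4   # 7 matched graduates

print(fraction_measure(large_program))  # 0.25
print(fraction_measure(small_program))  # None
```

The same function would serve for any of the fraction-type measures (04, 06, and 07), with only the meaning of the flag changing.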
Measure 04 constitutes the fraction of FY1975-79 graduates of a
program who had received at least some national fellowship support,
including National Institutes of Health (NIH) fellowships or
traineeships, National Science Foundation (NSF) fellowships, other
federal fellowships, Woodrow Wilson fellowships, or fellowships/
traineeships from other U.S. national organizations. One might expect
the more selective programs to have a greater proportion of students
with national fellowship support--especially "portable fellowships."
Although the committee considered alternative measures of student
ability (e.g., Graduate Record Examination scores, undergraduate grade
point averages), reliable information of this sort was unavailable for
a national assessment. It should be noted that the relevance of the
fellowship measure varies considerably among disciplines. In the
biomedical sciences a substantial fraction of the graduate students
are supported by training grants and fellowships; in the mathematical
and physical sciences the majority are supported by research
assistantships and teaching assistantships.
Measure 05 is the median number of years elapsed from the time
program graduates first enrolled in graduate school to the time they
received their doctoral degrees. For purposes of analysis the
committee has adopted the conventional wisdom that the most talented
students are likely to earn their doctoral degrees in the shortest
periods of time--hence, the shorter the median time-to-Ph.D., the
higher the standardized score that is assigned. Although this measure
has frequently been employed in social science research as a proxy for
student ability, one must regard its use here with some skepticism.
It is quite possible that the length of time it takes a student to
complete requirements for a doctorate may be significantly affected by
the explicit or implicit policies of a university or department. For
example, in certain cases a short time-to-Ph.D. may be indicative of
less stringent requirements for the degree. Furthermore, previous
studies have demonstrated that women and members of minority groups,
for reasons having nothing to do with their abilities, are more likely
than male Caucasians to interrupt their graduate education or to be
enrolled on a part-time basis.8 As a consequence, the median
time-to-Ph.D. may be longer for programs with larger fractions of
women and minority students.
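As a rough illustration of how measure 05 and its reversed scoring might be computed, consider the Python sketch below. The text states only that a shorter median time-to-Ph.D. receives a higher standardized score; the plain z-score with its sign flipped used here is an assumption, not the committee's documented formula, and all program names and values are hypothetical.

```python
from statistics import mean, median, pstdev


def reversed_z_scores(medians):
    """z-scores with the sign flipped, so shorter medians score higher.
    (Assumed standardization; the text does not give the formula.)"""
    values = list(medians.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all programs have the same median; z-scores undefined")
    return {name: -(m - mu) / sigma for name, m in medians.items()}


# Hypothetical years from first graduate enrollment to doctorate, 11 graduates.
years_program_a = [4.0, 4.5, 5.0, 5.0, 5.5, 5.5, 6.0, 6.0, 6.5, 7.0, 8.0]

# Hypothetical medians for three programs in one discipline.
medians = {
    "Program A": median(years_program_a),  # 5.5
    "Program B": 6.5,
    "Program C": 7.5,
}

scores = reversed_z_scores(medians)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 2))
```

Program A, with the shortest median, comes out with the highest score, which is the direction of scoring the text describes.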
Measure 06 represents the fraction of FY1975-79 program graduates
who reported at the time they had completed requirements for the
doctorate that they had signed contracts or made firm commitments for
postgraduation employment (including postdoctoral appointments as well
as other positions in the academic or nonacademic sectors) and who
provided the names of their prospective employers. Although this
measure is likely to vary by discipline according to the availability
of employment opportunities, a program's standing relative to other
programs in the same discipline should not be affected by this
variation. In theory, the graduates with the greatest promise should
have the easiest time finding jobs. However, the measure is also
influenced by a variety of other factors, such as personal job
preferences and restrictions in geographic mobility, that are
unrelated to the ability of the individual. It also should be noted
parenthetically that unemployment rates for doctoral recipients are
quite low and that nearly all of the graduates seeking jobs find
positions soon after completing their doctoral programs.9
Furthermore, first employment after graduation is by no means a
measure of career achievement, which is what one would like to have if
reliable data were available.
Measure 07, a variant of measure 06, constitutes the fraction of
FY1975-79 program graduates who indicated that they had made firm
commitments for employment in Ph.D.-granting universities and who
provided the names of their prospective employers. This measure may
be presumed to be an indication of the fraction of graduates likely to
pursue careers in academic research, although there is no evidence
concerning how many of them remain in academic research in the long
term. In many science disciplines the path from Ph.D. to postdoctoral
apprenticeship to junior faculty has traditionally been regarded as
the road of success for the growth and development of research
talent. The committee is well aware, of course, that other paths,
such as employment in the major laboratories of industry and
government, provide equally attractive opportunities for growth.
Indeed, in recent years increasing numbers of graduates are entering
the nonacademic sectors. Unfortunately, the data compiled from the
NRC's Survey of Earned Doctorates do not enable one to distinguish
between employment in the top-flight laboratories of industry and
8 For a detailed analysis of this subject, see Dorothy M. Gilford and
Joan Snyder, Women and Minority Ph.D.'s in the 1970's: A Data Book,
National Academy of Sciences, Washington, D.C., 1977.
9 For new Ph.D. recipients in science and engineering the unemployment
rate has been less than 2 percent (see National Research Council,
Postdoctoral Appointments and Disappointments, National Academy Press,
Washington, D.C., 1981, p. 313).
TABLE 2.2 Percentage of FY1975-79 Doctoral Recipients with Definite
Commitments for Employment Outside the Academic Sector*
Chemistry                    45
Computer Sciences            38
Geosciences                  53
Mathematics                  17
Physics                      42
Statistics/Biostatistics     29
*Percentages are based on responses to the NRC's Survey of Earned
Doctorates by those who indicated that they had made firm commitments
for postgraduation employment and who provided the names of their
prospective employers. These percentages may be considered lower-bound
estimates of the actual percentages of doctoral recipients employed
outside the academic sector.
government and employment in other areas of the nonacademic sectors.
Accordingly, the committee has relied on a measure that reflects only
the academic side and views this measure as a useful and interesting
program characteristic rather than a dimension of quality. In
disciplines such as geosciences, chemistry, physics, and computer
sciences, in which more than one-third of the graduates take jobs
outside the academic environs (see Table 2.2), this limitation is of
particular concern.
The inclusion of measures 06 and 07 in this assessment has been an
issue much debated by members of the committee; the strenuous
objections of three committee members regarding the use of these
measures are expressed in the Minority Statement that follows Chapter
IX.
REPUTATIONAL SURVEY RESULTS
In April 1981, survey forms were mailed to a total of 1,788
faculty members in chemistry, computer sciences, geosciences,
mathematics, physics, and statistics/biostatistics. The evaluators
were selected from the faculty lists furnished by the study coordinators
at the 228 universities covered in the assessment. These
evaluators constituted approximately 13 percent of the total faculty
population--13,661 faculty members--in the mathematical and physical
science programs being evaluated (see Table 2.3). The survey sample
was chosen on the basis of the number of faculty in a particular
program and the number of doctorates awarded in the previous five
years (FY1976-80)--with the stipulation that at least one evaluator
was selected from every program covered in the assessment. In
selecting the sample each faculty rank was represented in proportion
to the total number of individuals holding that rank, and preference
was given to those faculty members whom the study coordinators had
nominated to serve as evaluators. As shown in Table 2.3, 1,461
individuals, 82 percent of the survey sample in the mathematical and
physical sciences, had been recommended by study coordinators.10
Each evaluator was asked to consider a stratified random sample of
50 research-doctorate programs in his or her discipline--with programs
stratified by the number of faculty members associated with each
program. Every program was included on 150 survey forms. The 50
programs to be evaluated appeared on each survey form in random
sequence, preceded by an alphabetized list of all programs in that
discipline that were being included in the study. No evaluator was
asked to consider a program at his or her own institution. Ninety
percent of the survey sample group were provided the names of faculty
members in each of the 50 programs to be evaluated, along with data on
the total number of doctorates awarded in the last five years.11
The inclusion of this information represents a significant departure
from the procedures used in earlier reputational assessments. For
purposes of comparison with previous studies, 10 percent (randomly
selected in each discipline) were not furnished any information other
than the names of the programs.
The survey items were adapted from the form used in the Roose-
Andersen study. Prior to mailing, the instrument was pretested using
a small sample of faculty members in chemistry and psychology. As a
result, two significant improvements were made in the original survey
design. A question was added on the extent to which the evaluator was
familiar with the work of the faculty in each program. Responses to
this question, reported as measure 11, provide some insight into the
relationship between faculty recognition and the reputational standing
of a program.12 Also added was a question on the evaluator's field
of specialization--thereby making it possible to compare program evaluations
in different specialty areas within a particular discipline.
A total of 1,155 faculty members in the mathematical and physical
sciences--65 percent of those asked to participate--completed and
returned survey forms (see Table 2.3). Two factors probably have
contributed to this response rate being approximately 14 percentage
points below the rates reported in the Cartter and Roose-Andersen
studies. First, because of the considerable expense of printing
individualized survey forms (each 25-30 pages), second copies were not
sent to sample members not responding to the first mailing13--as was
10 A detailed analysis of the survey participants in each discipline
is given in subsequent chapters.
11 This information was furnished to the committee by the study
coordinators at the universities participating in the study.
12 Evidence of the strength of this relationship is provided by
correlations presented in Chapters III-VIII, and an analysis of the
relationship is provided in Chapter IX.
13 A follow-up letter was sent to those not responding to the first
mailing, and a second copy was distributed to those few evaluators who
specifically requested another form.
done in the Cartter and Roose-Andersen efforts. Second, it is quite
apparent that within the academic community there has been a growing
dissatisfaction in recent years with educational assessments based on
reputational measures. Indeed, this dissatisfaction was an important
factor in the Conference Board's decision to undertake a multidimensional
assessment, and some faculty members included in the sample made known to
the committee their strong objections to the reputational survey.
TABLE 2.3 Survey Response by Discipline and Characteristics
of Evaluator
                              Total Program   Survey     Total Respondents
                              Faculty         Sample
                                 N               N           N       %
Discipline of Evaluator
  Chemistry                     3,339           435         301      69
  Computer Sciences               923           174         108      62
  Geosciences                   1,419           273         177      65
  Mathematics                   3,784           348         223      64
  Physics                       3,399           369         211      57
  Statistics/Biostatistics        797           189         135      71
Faculty Rank
  Professor                     8,133         1,090         711      65
  Associate Professor           3,225           471         293      62
  Assistant Professor           2,120           216         143      66
  Other                           183            11           8      73
Evaluator Selection
  Nominated by Institution      3,751         1,461         971      66
  Other                         9,910           327         184      56
Survey Form
  With Faculty Names             N/A*         1,609       1,033      64
  Without Names                  N/A            179         122      68
Total All Fields               13,661         1,788       1,155      65
*Not applicable.
As can be seen in Table 2.3, there is some variation in the
response rates in the six mathematical and physical science disciplines.
Of particular interest is the relatively high rate of
response from chemists and the low rate from physicists--a result
consistent with the findings in the Cartter and Roose-Andersen
surveys.14 It is not surprising to find that the evaluators
nominated by study coordinators responded more often than did those
who had been selected at random. No appreciable differences were
found among the response rates of assistant, associate, and full
professors or between the rates of those evaluators who were furnished
the abbreviated survey form (without lists of program faculty) and
those who were given the longer version.
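The response rates in Table 2.3 can be rechecked from the sample and respondent counts. The short Python sketch below does this for the six disciplines; the counts are copied from the table, while the variable and function names are illustrative.

```python
# (survey sample N, respondents N) by discipline, copied from Table 2.3
table_2_3 = {
    "Chemistry": (435, 301),
    "Computer Sciences": (174, 108),
    "Geosciences": (273, 177),
    "Mathematics": (348, 223),
    "Physics": (369, 211),
    "Statistics/Biostatistics": (189, 135),
}


def response_rate(sample: int, respondents: int) -> int:
    """Response rate as a whole-number percentage."""
    return round(100 * respondents / sample)


rates = {d: response_rate(s, r) for d, (s, r) in table_2_3.items()}
total_sample = sum(s for s, _ in table_2_3.values())
total_respondents = sum(r for _, r in table_2_3.values())

for discipline, rate in rates.items():
    print(f"{discipline:<26}{rate:>3}")
print(f"{'Total All Fields':<26}{response_rate(total_sample, total_respondents):>3}")
```

The discipline counts sum to the 1,788 forms mailed and the 1,155 forms returned, and the computed percentages agree with the printed column, including the overall rate of 65 percent.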
Each program was considered by an average of approximately 90
survey respondents from other programs in the same discipline. The
evaluators were asked to judge programs in terms of scholarly quality
of program faculty, effectiveness of the program in educating research
scholars/scientists, and change in program quality in the last five
years. The mean ratings of a program on these three survey items
constitute measures 08, 09, and 10. Evaluators were also asked to
indicate the extent to which they were familiar with the work of the
program faculty. The average of responses to this item constitutes
measure 11.
In making judgments about the quality of faculty, evaluators were
instructed to consider the scholarly competence and achievements of
the individuals. The ratings were furnished on the following scale:
5 Distinguished
4 Strong
3 Good
2 Adequate
1 Marginal
0 Not sufficient for doctoral education
X Don't know well enough to evaluate
In assessing the effectiveness of a program, evaluators were asked to
consider the accessibility of faculty, the curricula, the instructional
and research facilities, the quality of the graduate students, the
performance of graduates, and other factors that contribute to a
program's effectiveness. This measure was rated accordingly:
3 Extremely effective
2 Reasonably effective
1 Minimally effective
0 Not effective
X Don't know well enough to evaluate
14 To compare the response rates obtained in the earlier surveys, see
Roose and Andersen, Table 28, p. 29.
15 A copy of the survey instrument and accompanying instructions are
included in Appendix C.
Evaluators were instructed to assess change in program quality on the
basis of whether there was an improvement in the last five years in
both the scholarly quality of the faculty and the effectiveness in
educating research scholars/scientists. The following alternatives
were provided:
2 Better than five years ago
1 Little or no change in last five years
0 Poorer than five years ago
X Don't know well enough to evaluate
Evaluators were asked to indicate their familiarity with the work of
the program faculty according to the following scale:
2 Considerable familiarity
1 Some familiarity
0 Little or no familiarity
In the computation of mean ratings on measures 08, 09, and 10, the
"don't know" responses were ignored. An average program rating based
on fewer than 15 responses (excluding the "don't know" responses) is
not reported.
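The rule for computing a mean rating can be illustrated with a short Python sketch. The "X" code and the cutoff of 15 responses come from the text; the function name, the sample responses, and the use of the conventional standard-error formula (sample standard deviation divided by the square root of the number of responses) are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple, Union

MIN_RESPONSES = 15  # cutoff stated in the text
DONT_KNOW = "X"     # "Don't know well enough to evaluate"


def mean_rating(
    responses: Sequence[Union[int, str]],
) -> Optional[Tuple[float, float]]:
    """Return (mean rating, standard error), ignoring "don't know" responses.
    Returns None when fewer than MIN_RESPONSES usable ratings remain."""
    scores = [r for r in responses if r != DONT_KNOW]
    if len(scores) < MIN_RESPONSES:
        return None  # rating is not reported
    # Assumed formula: sample standard deviation over sqrt(n).
    return mean(scores), stdev(scores) / sqrt(len(scores))


# Hypothetical responses for one program on the 0-5 faculty quality scale.
well_known = [5] * 4 + [4] * 10 + [3] * 6 + ["X"] * 10  # 20 usable ratings
little_known = [4] * 8 + ["X"] * 30                     # 8 usable ratings

print(mean_rating(well_known))
print(mean_rating(little_known))  # None
```

Because the "don't know" responses are dropped before averaging, a program that few evaluators know is rated by a smaller group and carries a larger standard error, or is not reported at all.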
Measures 08, 09, and 10 are subject to many of the same criticisms
that have been directed at previous reputational surveys. Although
care has been taken to improve the sampling design and to provide
evaluators with some essential information about each program, the
survey results merely reflect a consensus of faculty opinions. As
discussed in Chapter I, these opinions may well be based on
out-of-date information or be influenced by a variety of factors
unrelated to the quality of the program. In Chapter IX a number of
factors that may possibly affect the survey results are examined. In
addition to these limitations, it should be pointed out that the
evaluators, on the average, were unfamiliar with almost one-third of
the programs they were asked to consider.16 As might be expected,
the smaller and less prestigious programs were not as well known, and
for this reason one might have less confidence in the average ratings
of these programs. For all four survey measures, standard errors of
the mean ratings are reported; they tend to be larger for the lesser
known programs. The frequency of response to each of the survey items
is discussed in Chapter IX.
Two additional comments should be made regarding the survey
activity. First, it should be emphasized that the ratings derived
from the survey reflect a program's standing relative to other
programs in the same discipline and provide no basis for making
cross-disciplinary comparisons. For example, the fact that a much
larger number of chemistry programs received "distinguished" ratings
on measure 08 than did computer science programs indicates nothing
16 See Table 9.6 in Chapter IX.
about the relative quality of faculty in these two disciplines. It
may depend, in part, on the total numbers of programs evaluated in
these disciplines; in the survey instructions it was suggested to
evaluators that no more than 10 percent of the programs listed be
designated as "distinguished." Nor is it advisable to compare the
rating of a program in one discipline with that of a program in
another discipline because the ratings are based on the opinions of
different groups of evaluators who were asked to judge entirely
different sets of programs. Second, early in the committee's
deliberations a decision was made to supplement the ratings obtained
from faculty members with ratings from evaluators who hold research-
oriented positions in institutions outside the academic sector. These
institutions include industrial research laboratories, government
research laboratories, and a variety of other research establishments.
Over the past 10 years increasing numbers of doctoral recipients have
taken positions outside the academic setting. The extensive
involvement of these graduates in nonacademic employment is reflected
in the percentages reported in Table 2.2: An average of 40 percent of
the recent graduates in the mathematical and physical science
disciplines who had definite employment plans indicated that they
planned to take positions in nonacademic settings. Data from another
NRC survey suggest that the actual fraction of scientists employed
outside academia may be significantly higher. The committee
recognized that the inclusion of nonacademic evaluators would furnish
information valuable for assessing nontraditional dimensions of
doctoral education and would provide an important new measure not
assessed in earlier studies. Results from a survey of this group
would provide an interesting comparison with the results obtained from
the survey of faculty members. A concentrated effort was made to
obtain supplemental funding for adding nonacademic evaluators in
selected disciplines to the survey sample, but this effort was
unsuccessful. The committee nevertheless remains convinced of the
importance of including evaluators from nonacademic research
institutions. These institutions are likely to employ increasing
fractions of graduates in many disciplines, and it is urged that this
group not be overlooked in future assessments of graduate programs.
UNIVERSITY LIBRARY SIZE
The university library holdings are generally regarded as an
important resource for students in graduate (and undergraduate)
education. The Association of Research Libraries (ARL) has compiled
data from its academic member institutions and developed a composite
measure of a university library's size relative to those of other ARL
members. The ARL Library Index, as it is called, is based on 10
characteristics: volumes held, volumes added (gross), microform units
held, current serials received, expenditures for library materials,
expenditures for binding, total salary and wage expenditures, other
operating expenditures, number of professional staff, and number of
nonprofessional staff. 7 The 1979-80 index, which constitutes
measure 12, is available for 89 of the 228 universities included in
the assessment. (These 89 tend to be among the largest institutions.)
The limited coverage of this measure is a major shortcoming. It
should be noted that the ARL index is a composite description of
library size and not a qualitative evaluation of the collections,
services, or operations of the library. Also, it is a measure of
aggregate size and does not take into account the library holdings in
a particular department or discipline. Finally, although universities
with more than one campus were instructed to include figures for the
main campus only, some in fact may have reported library size for the
entire university system. Whether this misreporting occurred is not
known.
RESEARCH SUPPORT
Using computerized data files18 provided by the National Science
Foundation (NSF) and the National Institutes of Health (NIH), it was
possible to identify which faculty members in each program had been
awarded research grants during the FY1978-80 period by either of these
agencies or by the Alcohol, Drug Abuse, and Mental Health Administration
(ADAMHA). The fraction of faculty members in a program who had
received any research grants from these agencies during this three-year
period constitutes measure 13. Since these awards have been made on
the basis of peer judgment, this measure is considered to reflect the
perceived research competence of program faculty. However, it should
be noted that significant amounts of support for research in the
mathematical and physical sciences come from other federal agencies as
well, but it was not feasible to compile data from these other
sources. It is estimated that 55 percent of the university
faculty members in these disciplines who received federal R&D funding
obtained their support from NSF and another 19 percent from NIH. The
remaining 26 percent received support from the Department of Energy,
Department of Defense, National Aeronautics and Space Administration,
and other federal agencies. It also should be pointed out that only
those faculty members who served as principal investigators or
co-investigators are counted in the computation of this measure.
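
As an illustration only, the computation of measure 13 described above can be sketched as follows. The function name, the faculty roster, and the list of investigators are hypothetical; the sketch assumes that a faculty member is counted once, however many awards were received during the period.

```python
def grant_fraction(program_faculty, investigators):
    """Fraction of program faculty named as PI or co-investigator on any grant."""
    faculty = set(program_faculty)
    if not faculty:
        raise ValueError("program has no listed faculty")
    # Investigators who are not on the program roster are ignored.
    return len(faculty & set(investigators)) / len(faculty)


# Hypothetical data, not taken from the assessment.
program_faculty = ["Adams", "Baker", "Chen", "Diaz"]
investigators = ["Baker", "Diaz", "Diaz", "Evans"]  # Evans is outside the program

measure_13 = grant_fraction(program_faculty, investigators)
print(measure_13)  # 0.5
```

Because the result is a fraction of faculty, it is already scaled to program size, unlike a total expenditure figure.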
Measure 14 describes the total FY1979 expenditures by a university
for R&D in a particular discipline. These data have been furnished to
the NSF20 by universities and include expenditures of funds from
both federal and nonfederal sources. If an institution has more than
one program being evaluated in the same discipline, the aggregate
university expenditures for research in that discipline are reported
7 See Appendix D for a description of the calculation of this index.
18 A description of these files is provided in Appendix E.
19 Based on special tabulations of data from the NRC's Survey of
Doctorate Recipients, 1979.
20 A copy of the survey instrument used to collect these data appears
in Appendix E.
for each of the programs. In each discipline data are recorded for
the 100 universities with the largest R&D expenditures. As already
mentioned, these data are not available for statistics and
biostatistics programs.
This measure has several limitations related to the procedures by
which the data have been collected. The committee notes that there is
evidence within the source documents that universities employ
varying practices for categorizing and reporting expenditures.
Apparently, institutional support of research, industrial support of
research, and expenditure of indirect costs are reported by different
institutions in different categories (or not reported at all). Since
measure 14 is based on total expenditures from all sources, the data
used here are perturbed only when these types of expenditures are not
subsumed under any reporting category. Also, it should be noted that
the data being attributed to geosciences programs include university
expenditures in all areas of the environmental sciences (geological
sciences, atmospheric sciences, and oceanography), and the data for
mathematics programs include expenditures in statistics as well as
mathematics. In contrast with measure 13, measure 14 is not reported
on a scale relative to the number of faculty members and thus reflects
the overall level of research activity at an institution in a
particular discipline. Although research grants in the sciences and
engineering provide some support for graduate students as well, these
measures should not be confused with measure 04, which pertains to
fellowships and training grants.
PUBLICATION RECORDS
Data from the 1978 and the 1979 Science Citation Index have been
compiled22 on published articles associated with research-doctorate
programs. Publication counts were associated with programs on the
basis of the discipline of the journal in which an article appeared
and the institution with which the author was affiliated. Coauthored
articles were proportionately attributed to the institutions of the
individual authors. Articles appearing in multidisciplinary journals
(e.g., Science, Nature) were apportioned according to the
characteristic mix of subject matter in those journals. For the
purposes of assigning publication counts, this mix can be estimated
with reasonable accuracy.23
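
The apportionment just described can be sketched as follows. This is a minimal illustration, not the procedure used by the contractor: it assumes that each author of an article carries an equal share of credit and that a journal's subject-matter mix is given as weights summing to one. The institution names, the journal mix, and the function name are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def apportion(articles):
    """Return {(institution, discipline): fractional article count}.

    Each article is a pair (discipline_mix, author_institutions), where
    discipline_mix maps a discipline to its weight for that journal and
    author_institutions lists one institution per author.
    """
    counts = defaultdict(Fraction)
    for discipline_mix, author_institutions in articles:
        # Assumption: every author carries an equal share of the article.
        author_share = Fraction(1, len(author_institutions))
        for institution in author_institutions:
            for discipline, weight in discipline_mix.items():
                counts[(institution, discipline)] += author_share * weight
    return dict(counts)


# Hypothetical articles, not taken from the assessment.
articles = [
    # Single-discipline journal, three authors (two at Univ A, one at Univ B).
    ({"chemistry": Fraction(1)}, ["Univ A", "Univ A", "Univ B"]),
    # Multidisciplinary journal with an assumed 1:3 chemistry/physics mix.
    ({"chemistry": Fraction(1, 4), "physics": Fraction(3, 4)}, ["Univ B"]),
]

counts = apportion(articles)
for key in sorted(counts):
    print(key, counts[key])
```

Under these assumptions the fractional credits always sum to the number of articles, so no article is counted more than once across institutions and disciplines.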
21 National Science Foundation, Academic Science: R and D Funds,
Fiscal Year 1979, U.S. Government Printing Office, Washington, D.C.,
NSF 81-301, 1981.
22 The publication data have been generated for the committee's use
by Computer Horizons, Inc., using source files provided by the
Institute for Scientific Information.
23 Francis Narin, Evaluative Bibliometrics: The Use of Publication
and Citation Analysis in the Evaluation of Scientific Activity,
Report to the National Science Foundation, March 1976, p. 203.
Two measures have been derived from the publication records:
measure 15--the total number of articles published in the 1978-79
period that have been associated with a research-doctorate program and
measure 16--an estimation of the "influence" of these articles. The
latter is a product of the number of articles attributed to a program
and the estimated influence of the journals in which these articles
appeared. The influence of a journal is determined from the weighted
number of times, on the average, an article in that journal is
cited--with references from frequently cited journals counting more
heavily. A more detailed explanation of the derivation of these
measures is given in Appendix F. Neither measure 15 nor measure 16 is
based on actual counts of articles written only by program faculty.
However, extensive analysis of the "influence" index in the fields of
physics, chemistry, and biochemistry has demonstrated the stability of
this index and the reliability associated with its use.24 Of
course, this does not imply that the measure captures subtle aspects
of publication "influence." It is of interest to note that indices
similar to measures 15 and 16 have been shown to be highly correlated
with the peer ratings of graduate departments compiled in the
Roose-Andersen study.25
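
The relationship between the two measures can be illustrated with a short sketch. It assumes that the product is taken journal by journal and then summed, which is one natural reading of the description above; the journal names and influence weights are hypothetical, and the derivation of the weights themselves (Appendix F) is not reproduced here.

```python
def measures_15_and_16(article_counts, journal_influence):
    """Return (total articles, influence-weighted total) for one program.

    article_counts maps a journal to the (possibly fractional) number of
    articles attributed to the program; journal_influence maps a journal
    to its estimated influence per article.
    """
    measure_15 = sum(article_counts.values())
    # Assumption: weight each journal's articles by that journal's
    # influence and add the results across journals.
    measure_16 = sum(
        count * journal_influence[journal]
        for journal, count in article_counts.items()
    )
    return measure_15, measure_16


# Hypothetical inputs, not taken from the assessment.
article_counts = {"Journal X": 10, "Journal Y": 4}
journal_influence = {"Journal X": 1.5, "Journal Y": 5.0}

m15, m16 = measures_15_and_16(article_counts, journal_influence)
print(m15, m16)  # 14 35.0
```

In this example the four articles in the more influential journal contribute more to measure 16 than the ten articles in the other journal, although they contribute less to measure 15.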
It must be emphasized that these measures encompass articles
(published in selected journals) by all authors affiliated with a
given university. Included therefore are articles by program faculty
members, students and research personnel, and even members of other
departments in that university who publish in those journals.
Moreover, these measures do not take into account the differing sizes of
programs, and the measures clearly do depend on faculty size.
Although consideration was given to reporting the number of published
articles per faculty member, the committee concluded that since the
measure included articles by other individuals besides program faculty
members, the aggregate number of articles would be a more reliable
measure of overall program quality. It should be noted that if a
university had more than one program being evaluated in the same
discipline, it is not possible to distinguish the relative
contribution of each program. In such cases the aggregate university
data in that discipline were assigned to each program.
Since the data are confined to 1978-79, they do not take into
account institutional mobility of authors after that period. Thus,
articles by authors who have moved from one institution to another
since 1979 are credited to the former institution. Also, the
publication counts fail to include the contributions of faculty
members' publications in journals outside their primary discipline.
24 Narin, pp. 283-307.
25 Richard C. Anderson, Francis Narin, and Paul McAllister,
"Publication Ratings Versus Peer Ratings of Universities," Journal of
the American Society for Information Science, March 1978, pp. 91-103,
and Lyle V. Jones, "The Assessment of Scholarship," New Directions for
Program Evaluation, No. 6, 1980, pp. 1-20.
This point may be especially important for those programs with faculty
members whose research is at the intersection of several different
disciplines.
The reader should be aware of two additional caveats with regard
to the interpretation of measures 15 and 16. First, both measures are
based on counts of published articles and do not include books. Since
in the mathematical and physical sciences most scholarly contributions
are published as journal articles, this may not be a serious
limitation. Second, the "influence" measure should not be interpreted
as an indicator of the impact of articles by individual authors.
Rather it is a measure of the impact of the journals in which articles
associated with a particular program have been published. Citation
counts, with all their difficulties, would have been preferable since
they are attributable to individual authors and they register the
impact of books as well as journal articles. However, the difficulty
and cost of assembling reliable counts of articles by individual
author made their use infeasible.
ANALYSIS AND PRESENTATION OF THE DATA
The next six chapters present all of the information that has been
compiled on individual research-doctorate programs in chemistry,
computer sciences, geosciences, mathematics, physics, and statistics/
biostatistics. Each chapter follows a similar format, designed to
assist the reader in the interpretation of program data. The first
table in each chapter provides a list of the programs evaluated in a
discipline--including the names of the universities and departments or
academic units in which programs reside--along with the full set of
data compiled for individual programs. Programs are listed
alphabetically according to name of institution, and both raw and standardized
values are given for all but one measure.26 For the reader's
convenience an insert of information from Table 2.1 is provided that
identifies each of the 16 measures reported in the table and indicates
the raw scale used in reporting values for a particular measure.
Standardized values, converted from raw values to have a mean of 50
and a standard deviation of 10, are computed for every measure so that
comparisons can easily be made of a program's relative standing on
different measures. Thus, a standardized value of 30 corresponds with
a raw value that is two standard deviations below the mean for that
measure, and a standardized value of 70 represents a raw value two
standard deviations above the mean. While the reporting of values in
standardized form is convenient for comparing a particular program's
standing on different measures, it may be misleading in interpreting
actual differences in the values reported for two or more programs--
26 Since the scale used to compute measure 16--the estimated
"influence" of published articles--is entirely arbitrary, only
standardized values are reported for this measure.
especially when the distribution of the measure being examined is
highly skewed. For example, the numbers of published articles
(measure 15) associated with four chemistry programs are reported in
Table 3.1 as follows:
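Program    Standardized Value
A          37
B          38
C          41
D          43
[Raw Value column only partly legible in the source: the values 22 and 38 survive, but their row assignment cannot be recovered.]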
Although programs C and D have many times the number of articles as
programs A and B, the differences reported on a standardized scale
appear to be small. Thus, the reader is urged to take note of the raw
values before attempting to interpret differences in the standardized
values given for two or more programs.
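
The conversion to standardized values, and the compression it can produce for a skewed measure, can be sketched as follows. The raw counts are hypothetical and are not those of Table 3.1; the sketch also assumes that the standard deviation is computed over all programs in the discipline (a population standard deviation), which the text does not specify.

```python
import statistics


def standardize(raw_values, target_mean=50.0, target_sd=10.0):
    """Linearly rescale raw values to the given mean and standard deviation."""
    mean = statistics.mean(raw_values)
    # Assumption: population standard deviation over all listed programs.
    sd = statistics.pstdev(raw_values)
    return [target_mean + target_sd * (x - mean) / sd for x in raw_values]


# Hypothetical, highly skewed article counts for five programs.
raw = [5.0, 10.0, 60.0, 90.0, 900.0]
standardized = standardize(raw)

for r, s in zip(raw, standardized):
    print(f"raw {r:6.0f}   standardized {s:5.1f}")
```

In this example the third program has twelve times as many articles as the first, yet their standardized values differ by less than two points, because the single very large program dominates the standard deviation.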
The initial table in each chapter also presents estimated standard
errors of mean ratings derived from the four survey items (measures
08-11). A standard error is an estimated standard deviation of the
sample mean rating and may be used to assess the stability of a mean
rating reported for a particular program.27 For example, one may
assert (with .95 confidence) that the population mean rating would lie
within two standard errors of the sample mean rating reported in this
assessment.
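
A minimal sketch of this calculation follows. The standard error is taken as the standard deviation of a program's ratings divided by the square root of the number of ratings; the use of the sample (n - 1) standard deviation is an assumption, and the ratings shown are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(ratings):
    """Return the mean rating and its estimated standard error."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    se = statistics.stdev(ratings) / math.sqrt(n)
    return mean, se


# Hypothetical ratings of one program by nine evaluators.
ratings = [3, 4, 4, 5, 3, 4, 4, 5, 4]
mean, se = mean_and_standard_error(ratings)

# Interval of two standard errors on either side of the sample mean.
lower, upper = mean - 2 * se, mean + 2 * se
print(f"mean {mean:.2f}, standard error {se:.3f}, interval ({lower:.2f}, {upper:.2f})")
```

A program rated by fewer evaluators, or with more disagreement among them, yields a larger standard error and hence a wider interval.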
No attempt has been made to establish a composite ranking of
programs in a discipline. Indeed, the committee is convinced that no
single measure adequately reflects the quality of a research-doctorate
program and wishes to emphasize the importance of viewing individual
programs from the perspective of multiple indices or dimensions.
The second table in each chapter presents summary statistics
(i.e., number of programs evaluated, mean, standard deviation, and
decile values) for each of the program measures.28 The reader
should find these statistics helpful in interpreting the data reported
on individual programs. Next is a table of the intercorrelations
among the various measures for that discipline. This table should be
of particular interest to those desiring information about the
interrelations of the various measures.
27 The standard error estimate has been computed by dividing the
standard deviation of a program's ratings by the square root of the
number of ratings. For a more extensive discussion of this topic, see
Fred N. Kerlinger, Foundations of Behavioral Research, Holt, Rinehart,
and Winston, Inc., New York, 1973, Chapter 12. Readers should note
that the estimate is a measure of the variation in response and by no
means includes all possible sources of error.
28 Standardized scores have been computed from precise values of the
mean and standard deviation of each measure and not the rounded values
reported in the second table of each chapter.
The remainder of each chapter is devoted to an examination of
results from the reputational survey. Included are an analysis of the
characteristics of survey participants and graphical portrayals of the
relationship of mean rating of scholarly quality of faculty (measure
08) with number of faculty (measure 01) and the relationship of mean
rating of program effectiveness (measure 09) with number of graduates
(measure 02). A frequently mentioned criticism of the Roose-Andersen
and Cartter studies is that small but distinguished programs have been
penalized in the reputational ratings because they are not as highly
visible as larger programs of comparable quality. The comparisons of
survey ratings with measures of program size are presented as the
first two figures in each chapter and provide evidence about the
number of small programs in each discipline that have received high
reputational ratings. Since in each case the reputational rating is
more highly correlated with the square root of program size than with
the size measure itself, measures 01 and 02 are plotted on a square
root scale.29 To assist the reader in interpreting results of the
survey evaluations, each chapter concludes with a graphical
presentation of the mean rating for every program of the scholarly
quality of faculty (measure 08) and an associated "confidence
interval" of 1.5 standard errors. In comparing the mean ratings of
two programs, if their reported confidence intervals of 1.5 standard
errors do not overlap, one may safely conclude that the program
ratings are significantly different (at the .05 level of
significance)--i.e., the observed difference in mean ratings is too
large to be plausibly attributable to sampling error.30
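
The significance level implied by the nonoverlap rule can be checked with a short calculation. The sketch assumes that the two mean ratings are independent and approximately normally distributed; the function names are illustrative. When two intervals of 1.5 standard errors just touch, the difference in means equals 1.5 times the sum of the two standard errors, and this difference is compared with the standard error of the difference.

```python
import math


def intervals_overlap(mean_a, se_a, mean_b, se_b, half_width=1.5):
    """True if the two intervals of +/- half_width standard errors overlap."""
    return abs(mean_a - mean_b) <= half_width * (se_a + se_b)


def touching_significance(se_ratio, half_width=1.5):
    """Two-sided significance level when the two intervals just touch.

    se_ratio is the larger standard error divided by the smaller one.
    Assumes independent, normally distributed mean ratings.
    """
    # Difference in means, in units of the standard error of the difference.
    z = half_width * (1.0 + se_ratio) / math.sqrt(1.0 + se_ratio ** 2)
    # Two-sided normal tail probability: 2 * (1 - Phi(z)) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


p_equal = touching_significance(1.0)    # equal standard errors
p_limit = touching_significance(2.41)   # largest ratio for which the rule holds

print(f"ratio 1.00: p = {p_equal:.3f}")
print(f"ratio 2.41: p = {p_limit:.3f}")
```

Under these assumptions the two values agree with the range of .034 to .050 quoted for this rule; for larger ratios of standard errors the level at the point of contact rises above .05, so nonoverlap alone no longer guarantees significance at that level.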
The final chapter of this report gives an overview of the
evaluation process in the six mathematical and physical science
disciplines and includes a summary of general findings. Particular
attention is given to some of the extraneous factors that may
influence program ratings of individual evaluators and thereby distort
the survey results. The chapter concludes with a number of specific
suggestions for improving future assessments of research-doctorate
programs.
29 For a general discussion of transforming variables to achieve
linear fits, see John W. Tukey, Exploratory Data Analysis,
Addison-Wesley, Reading, Massachusetts, 1977.
30 This rule for comparing nonoverlapping intervals is valid as long
as the ratio of the two estimated standard errors does not exceed
2.41. (The exact statistical significance of this criterion then lies
between .050 and .034.) Inspection of the standard errors reported in
each discipline shows that for programs with mean ratings differing by
less than 1.0 (on measure 08), the standard error of one mean very
rarely exceeds twice the standard error of another.