Copyright © 2009. National Academy of Sciences. All rights reserved.
Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
IX
Summary and Discussion
In the six preceding chapters results are presented of the
assessment of 596 research-doctorate programs in chemistry, computer
sciences, geosciences, mathematics, physics, and statistics/bio-
statistics. Included in each chapter are summary data describing the
means and intercorrelations of the program measures in a particular
discipline. In this chapter a comparison is made of the summary data
reported for the six disciplines. Also presented are an analysis of
the reliability (consistency) of the reputational survey ratings and
an examination of some factors that might possibly have influenced the
survey results. The chapter concludes with suggestions for improving
studies of this kind--with particular attention given to the types of
measures one would like to have available for an assessment of
research-doctorate programs.
This chapter necessarily involves a detailed discussion of various
statistics (means, standard deviations, correlation coefficients)
describing the measures. Throughout, the reader should bear in mind
that all these statistics and measures are necessarily imperfect
attempts to describe the real quality of research-doctorate programs.
Quality and some differences in quality are real, but these differ-
ences cannot be subsumed completely under any one quantitative measure.
For example, no single numerical ranking--by measure 08 or by any
weighted average of measures--can rank the quality of different
programs with precision.
However, the evidence for reliability indicates considerable
stability in the assessment of quality. For instance, a program that
comes out in the first decile of a ranking is quite unlikely to
"really" belong in the third decile, or vice versa. If numerical
ranks of programs were replaced by groupings (distinguished, strong,
etc.), these groupings again would not fully capture actual
differences in quality since there would likely be substantial
ambiguity about the borderline between adjacent groups. Furthermore,
any attempt at linear ordering (best, next best, . . .) also may be
inaccurate. Programs of roughly comparable quality may be better in
different ways, so that there simply is no one best program--as will
also be indicated in some of the numerical analyses. However, these
difficulties of formulating ranks should not hide the underlying
reality of differences in quality or the importance of high quality
for effective doctoral education.
SUMMARY OF THE RESULTS
Displayed in Table 9.1 are the numbers of programs evaluated
(bottom line) and the mean values for each measure in the six mathe-
matical and physical science disciplines. As can be seen, the mean
values reported for individual measures vary considerably among
disciplines. The pattern of means on each measure is summarized
below, but the reader interested in a detailed comparison of the
distribution of a measure should refer to the second table in each of
the preceding six chapters.2
Program Size (Measures 01-03). Based on the information provided to
the committee by the study coordinator at each university, mathematics
programs had, on the average, the largest number of faculty members
(33 in December 1980), followed by physics (28) and chemistry (23).
Chemistry programs graduated the most students (51 Ph.D. recipients in
the FY1975-79 period) and had the largest enrollment (75 doctoral
students in December 1980). In contrast, statistics and biostatistics
programs were reported to have an average of only 12 faculty members,
15 graduates, and 22 doctoral students.
Program Graduates (Measures 04-07). The mean fraction of FY1975-79
doctoral recipients who as graduate students had received some
national fellowship or training grant support (measure 04) ranges from
.17 for graduates of computer science programs to .32 for graduates in
statistics/biostatistics. (The relatively high figure for the latter
group may be explained by the availability of National Institutes of
Health (NIH) training grant support for students in biostatistics.)
With respect to the median number of years from first enrollment in a
graduate program to receipt of the doctorate (measure 05), chemistry
graduates typically earned their degrees more than half a year sooner
than graduates in any of the other disciplines. Graduates in physics
and geosciences report the longest median times to the Ph.D. In terms
of employment status at graduation (measure 06), an average of 80
percent of the Ph.D. recipients from computer science programs
reported that they had made firm job commitments by the time they had
completed the requirements for their degrees, contrasted with 61
percent of the program graduates in mathematics. A mean of 43 percent
of the statistics/biostatistics graduates reported that they had made
1 Means for measure 16, "influence" of publication, are omitted since
arbitrary scaling of this measure prevents meaningful comparisons
across disciplines.
2 The second table in each of the six preceding chapters presents the
standard deviation and decile values for each measure.
TABLE 9.1 Mean Values for Each Program Measure, by Discipline

                                Computer                                        Statistics/
                    Chemistry   Sciences    Geosciences Math        Physics     Biostat.

Program Size
  01                23          16          16          33          28          12
  02                51          20          19          24          35          15
  03                75          41          25          35          56          22
Program Graduates
  04                .23         .17         .26         .25         .26         .32
  05                5.9         6.5         7.0         6.6         7.1         6.7
  06                .76         .80         .77         .61         .66         .78
  07                .33         .38         .22         .25         .26         .43
Survey Results
  08                2.5         2.5         2.9         2.7         2.7         2.8
  09                1.6         1.5         1.8         1.6         1.7         1.6
  10                1.1         1.1         1.1         1.2         1.1         1.1
  11                .9          .9          .9          .8          .7          .9
University Library
  12                .1          .4          .4          .1          .1          .5
Research Support
  13                .48         .36         .47         .32         .36         .25
  14 ($ thousands)  1788        1171        3996        616         2943        NA
Publication Records
  15                78          34          44          39          106         12

Total Programs      145         58          91          115         123         64
firm commitments to take positions in Ph.D.-granting institutions
(measure 07), while only 22 percent of those in the geosciences had
made such plans. This difference may be due, to a great extent, to
the availability of employment opportunities for geoscientists outside
the academic sector.
Survey Results (Measures 08-11). Differences in the mean ratings
derived from the reputational survey are small. In all six
disciplines the mean rating of scholarly quality of program faculty
(measure 08) is slightly below 3.0 ("good"), and programs were judged
to be, on the average, a bit below "moderately effective" (2.0) in
educating research scholars/scientists (measure 09). In the opinions
of the survey respondents, there has been "little or no change"
(approximately 1.0 on measure 10) in the last five years in the
overall average quality of programs. The mean rating of an
evaluator's familiarity with the work of program faculty (measure 11)
is close to 1.0 ("some familiarity") in every discipline--about which
more will be said later in this chapter.
University Library (Measure 12). Measure 12, based on a composite
index of the size3 of the library at the university in which a
program resides, is calculated on a scale from -2.0 to 3.0, with means
ranging from .1 in chemistry, mathematics, and physics to .4 in
computer sciences and geosciences, and .5 in statistics/biostatistics.
These differences may be explained, in large part, by the number of
programs evaluated in each discipline. In the disciplines with the
fewest doctoral programs (statistics/biostatistics, computer sciences,
and geosciences), programs included are typically found in the larger
institutions, which are likely to have high scores on the library size
index. Ph.D. programs in chemistry, physics, and mathematics are
found in a much broader spectrum of universities that includes the
smaller institutions as well as the larger ones.
Research Support (Measures 13-14). Measure 13, the proportion of
program faculty who had received NSF, NIH, or ADAMHA4 research grant
awards during the FY1978-80 period, has mean values ranging from as
high as .48 and .47 in chemistry and geosciences, respectively, to .25
in statistics/biostatistics. It should be emphasized that this
measure does not take into account research support that faculty
members have received from sources other than these three federal
3 The index, derived by the Association of Research Libraries,
reflects a number of different measures, including number of volumes,
fiscal expenditures, and other factors relevant to the size of a
university library. See the description of this measure presented in
Appendix D.
4 Very few faculty members in mathematical and physical science
programs received any research support from the Alcohol, Drug Abuse,
and Mental Health Administration.
agencies. In terms of total university expenditures for R&D in a
particular discipline (measure 14), the mean values are reported to
range from $616,000 in mathematics to $3,996,000 in the geosciences.
(R&D expenditure data are not available for statistics/biostatistics.)
The large differences in reported expenditures are likely to be related
to three factors: the differential availability of research support in
the six disciplines, the differential average cost of doing research,
and the differing numbers of individuals involved in a research effort.
Publication Records (Measures 15 and 16). Considerable diversity is
found in the mean number of articles associated with a research-
doctorate program (measure 15). An average of 106 articles published
in the 1978-79 period is reported for programs in physics and 75
articles for programs in chemistry; in each of the other four
disciplines the mean number of articles is fewer than 40. These large
differences reflect both the program size in a particular discipline
(i.e., the total number of faculty and other staff members involved in
research) and the frequency with which scientists in that discipline
publish; they may also depend on the length of a typical paper in a
discipline. Mean scores are not reported on measure 16, the estimated
"overall influence" of the articles attributed to a program. Since
this measure is calculated from an average of journal influence
weights,5 normalized for the journals covered in a particular
discipline, mean differences among disciplines are uninterpretable.
Correlations with Measure 02. Relations among the program measures
are of intrinsic interest and are relevant to the issue of validity of
the measures as indices of the quality of a research-doctorate
program. Measures that are logically related to program quality are
expected to be related to each other. To the extent that they are, a
stronger case might be made for the validity of each as a quality
measure.
A reasonable index of the relationship between any two measures is
the Pearson product-moment correlation coefficient. A table of
correlation coefficients between all possible pairs of measures has
been presented in each of the six preceding chapters. In this chapter
selected correlations are presented to determine the extent to which
coefficients are comparable in the six disciplines. Special
attention is given to the correlations involving the number of
FY1975-79 program graduates (measure 02), the survey rating of the
scholarly quality of program faculty (measure 08), university R&D
expenditures in a particular discipline (measure 14), and the
influence-weighted number of publications (measure 16).
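
As an illustration of the coefficient used throughout this chapter, the following is a minimal Python sketch of the Pearson product-moment correlation. The function name and the five program values are hypothetical and are not taken from the study; the sketch only shows how a coefficient of this kind is obtained from paired program measures.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each observation from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical values for five programs (assumed for illustration only).
graduates = [12, 25, 40, 58, 90]             # stand-in for measure 02
faculty_rating = [1.9, 2.4, 2.8, 3.5, 4.1]   # stand-in for measure 08
r = pearson_r(graduates, faculty_rating)
print(f"r = {r:.2f}")
```

A coefficient near 1.0 or -1.0 indicates a nearly linear relation between the two measures; a coefficient near zero indicates little linear relation.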
Table 9.2 presents the correlations of measure 02 with each of the
other measures used in the assessment. As might be expected,
correlations of this measure with the other two measures of program
size--number of faculty (01) and doctoral student enrollment (03)--are
5 See Appendix F for a description of the derivation of this measure.
TABLE 9.2 Correlations of the Number of Program Graduates (Measure 02) with Other
Measures, by Discipline

                                Computer                                        Statistics/
                    Chemistry   Sciences    Geosciences Math        Physics     Biostat.

Program Size
  01                .68         .62         .42         .50         .77         .53
  03                .92         .52         .72         .85         .92         .48
Program Graduates
  04                .02         .05         -.01        .08         -.02        .00
  05                .38         -.07        .29         .31         .32         .04
  06                .23         .12         .05         .18         .40         .00
  07                .13         -.05        .36         .46         .41         -.03
Survey Results
  08                .83         .66         .64         .70         .76         .55
  09                .81         .68         .67         .68         .73         .63
  10                .23         -.02        .06         .01         -.17        .17
  11                .83         .61         .67         .72         .78         .59
University Library
  12                .61         .44         .43         .45         .47         .11
Research Support
  13                .57         .34         .40         .35         .13         .06
  14                .72         .58         .25         .41         .66         N/A
Publication Records
  15                .83         .85         .73         .75         .85         .52
  16                .86         .84         .74         .81         .86         .48
quite high in all six disciplines. Of greater interest are the strong
positive correlations between measure 02 and measures derived from
either reputational survey ratings or publication records. The
coefficients describing the relationship of measure 02 with measures
15 and 16 are greater than .70 in all disciplines except statistics/
biostatistics. This result is not surprising, of course, since both
of the publication measures reflect total productivity and have not
been adjusted for program size. The correlations of measure 02 with
measures 08, 09, and 11 are almost as strong. It is quite apparent
that the programs that received high survey ratings and with which
evaluators were more likely to be familiar were also ones that had
larger numbers of graduates. Although the committee gave serious
consideration to presenting an alternative set of survey measures that
were adjusted for program size, a satisfactory algorithm for making
such an adjustment was not found. In attempting such an adjustment on
the basis of the regression of survey ratings on measures of program
size, it was found that some exceptionally large programs appeared to
be unfairly penalized and that some very small programs received
unjustifiably high adjusted scores.
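
The committee's adjustment procedure is not described in detail here. As a hedged illustration only, the sketch below shows one simple form such an adjustment could take: a least-squares regression of survey rating on program size, with the residual treated as the "size-adjusted" score. The function names and the five programs are hypothetical, and the linear form is an assumption rather than the committee's method.

```python
def fit_line(sizes, ratings):
    """Ordinary least-squares fit of ratings on sizes; returns (slope, intercept)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, ratings))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def size_adjusted(sizes, ratings):
    """Residuals of the fit: each rating minus the rating predicted from size."""
    slope, intercept = fit_line(sizes, ratings)
    return [y - (intercept + slope * x) for x, y in zip(sizes, ratings)]


# Hypothetical programs (not study data): number of graduates and mean rating.
sizes = [5, 10, 20, 40, 150]
ratings = [2.0, 2.2, 2.8, 3.6, 4.6]
adjusted = size_adjusted(sizes, ratings)

for size, rating, score in zip(sizes, ratings, adjusted):
    print(f"graduates={size:4d}  rating={rating:.1f}  adjusted={score:+.2f}")
```

In this invented example the largest program has the highest raw rating but a negative adjusted score, because ratings on a bounded scale cannot keep pace with a straight line fitted across a very wide range of program sizes. This is the kind of penalty for exceptionally large programs that the text describes.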
Measure 02 also has positive correlations in most disciplines with
measure 12, an index of university library size, and with measures 13
and 14, which pertain to the level of support for research in a
program. Of particular note are the moderately large coefficients--in
disciplines other than statistics/biostatistics and physics--for
measure 13, the fraction of faculty members receiving federal research
grants. Unlike measure 14, this measure has been adjusted for the
number of program faculty. The correlations of measure 02 with
measures 05, 06, and 07 are smaller but still positive in most of the
disciplines. From this analysis it is apparent that the number of
program graduates tends to be positively correlated with all other
variables except measure 04--the fraction of students with national
fellowship support. It is also apparent that the relationship of
measure 02 with the other variables tends to be weakest for programs
in statistics/biostatistics.
Correlations with Measure 08. Table 9.3 shows the correlation
coefficients for measure 08, the mean rating of the scholarly quality
of program faculty, with each of the other variables. The
correlations of measure 08 with measures of program size (01, 02, and
03) are .40 or greater for all six disciplines. Not surprisingly, the
larger the program, the more likely its faculty is to be rated high in
quality. However, it is interesting to note that in all disciplines
except statistics/biostatistics the correlation with the number of
program graduates (measure 02) is larger than that with the number of
faculty or the number of enrolled students.
Correlations of measure 08 with measure 04, the fraction of
students with national fellowship awards, are positive but close to
zero in all disciplines except computer sciences and mathematics. For
programs in the biological and social sciences, the corresponding
coefficients (not reported in this volume) are found to be greater,
typically in the range of .40 to .70. Perhaps in the mathematical and
TABLE 9.3 Correlations of the Survey Ratings of Scholarly Quality of Program Faculty
(Measure 08) with Other Measures, by Discipline

                                Computer                                        Statistics/
                    Chemistry   Sciences    Geosciences Math        Physics     Biostat.

Program Size
  01                .64         .54         .45         .48         .68         .63
  02                .83         .66         .64         .70         .76         .55
  03                .81         .50         .61         .64         .75         .40
Program Graduates
  04                .11         .35         .08         .30         .15         .19
  05                .47         .14         .50         .57         .42         .32
  06                .28         .21         .24         .19         .42         .15
  07                .30         .17         .58         .63         .58         .25
Survey Results
  09                .98         .98         .97         .98         .96         .95
  10                .35         .29         .29         -.01        -.15        .30
  11                .96         .97         .87         .96         .96         .93
University Library
  12                .66         .58         .58         .65         .67         .53
Research Support
  13                .77         .59         .72         .70         .24         .53
  14                .79         .63         .27         .42         .61         N/A
Publication Records
  15                .80         .70         .75         .75         .85         .70
  16                .86         .77         .77         .83         .86         .67
physical sciences, the departments with highly regarded faculty are
more likely to provide support to doctoral students as teaching
assistants or research assistants on faculty research grants--thereby
reducing dependency on national fellowships. (The low correlation of
rated faculty quality with the fraction of students with national
fellowships is not, of course, inconsistent with the thesis that
programs with large numbers of students are programs with large
numbers of fellowship holders.)
Correlations of rated faculty quality with measure 05, shortness
of time from matriculation in graduate school to award of the
doctorate, are notably high for programs in mathematics, geosciences,
and chemistry and still sizeable for physics and statistics/bio-
statistics programs. Thus, those programs producing graduates in
shorter periods of time tended to receive higher survey ratings. This
finding is surprising in view of the smaller correlations in these
disciplines between measures of program size and shortness of
time-to-Ph.D. It seems there is a tendency for programs that produce
doctoral graduates in a shorter time to have more highly rated faculty,
and this tendency is relatively independent of the number of faculty
members.
Correlations of ratings of faculty quality with measure 06, the
fraction of program graduates with definite employment plans, are
moderately high in physics and somewhat lower, but still positive, in
the other disciplines. In every discipline except computer sciences
the correlation of measure 08 is higher with measure 07, the fraction
of graduates having agreed to employment at a Ph.D.-granting
institution. These coefficients are greater than .50 in mathematics,
geosciences, and physics.
The correlations of measure 08 with measure 09, the rated
effectiveness of doctoral education, are uniformly very high, at or
above .95 in every discipline. This finding is consistent with results
from the Cartter and Roose-Andersen studies.6 The coefficients
describing the relationship between measure 08 and measure 11,
familiarity with the work of program faculty, are also very high,
ranging from .87 to .97. In general, evaluators were more likely to
have high regard for the quality of faculty in those programs with
which they were most familiar. That the correlation coefficients are
as large as observed may simply reflect the fact that "known" programs
tend to be those that have earned strong reputations.
Correlations of ratings of faculty quality with measure 10, the
ratings of perceived improvement in program quality, are near zero for
mathematics and physics programs and range from .29 to .35 in other
disciplines. One might have expected that a program judged to have
improved in quality would have been somewhat more likely to receive
high ratings on measure 08 than would a program judged to have
declined--thereby imposing a small positive correlation between these
two variables.
6 Roose and Andersen, p. 19.
Moderately high correlations are observed in most disciplines
between measure 08 and university library size (measure 12), support
for research (measures 13 and 14), and publication records (measures
15 and 16). With few exceptions these coefficients are .50 or greater
in all disciplines. Of particular note are the strong correlations
with the two publication measures--ranging from .70 to .86. In all
disciplines except statistics/biostatistics the correlations with
measure 16 are higher than those with measure 15; the "weighted
influence" of journals in which articles are published yields an index
that tends to relate more closely to faculty reputation than does an
unadjusted count of the number of articles published. Although the
observed differences between the coefficients for measures 15 and 16
are not large, this result is consistent with earlier findings of
Anderson et al.7
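
The comparison of the two publication measures can be checked directly against the coefficients printed in Table 9.3. The short Python sketch below does so; the variable names are illustrative assumptions, while the values are the measure 15 and measure 16 rows of that table.

```python
# Correlations of measure 08 with the two publication measures (Table 9.3).
DISCIPLINES = [
    "Chemistry",
    "Computer Sciences",
    "Geosciences",
    "Mathematics",
    "Physics",
    "Statistics/Biostatistics",
]
R_MEASURE_15 = [0.80, 0.70, 0.75, 0.75, 0.85, 0.70]  # unadjusted article count
R_MEASURE_16 = [0.86, 0.77, 0.77, 0.83, 0.86, 0.67]  # influence-weighted count

# Disciplines in which the influence-weighted measure is NOT the more
# closely related of the two.
exceptions = [
    name
    for name, r15, r16 in zip(DISCIPLINES, R_MEASURE_15, R_MEASURE_16)
    if r16 <= r15
]

# Largest advantage of measure 16 over measure 15 in any discipline.
largest_gap = max(
    round(r16 - r15, 2) for r15, r16 in zip(R_MEASURE_15, R_MEASURE_16)
)

print("exceptions:", exceptions)
print("largest gap:", largest_gap)
```

The only exception is statistics/biostatistics, and the largest gap is .08, which agrees with the observation that the differences are consistent in direction but not large.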
Correlations with Measure 14. Correlations of measure 14, reported
dollars of support for R&D, with other measures are shown in Table
9.4. (Data on research expenditures in statistics/biostatistics are
not available.) The pattern of relations is quite similar for
programs in chemistry, computer sciences, and physics: moderately
high correlations with measures of program size and somewhat higher
correlations with both reputational survey results (except measure 10)
and publication measures. For programs in mathematics many of these
relations are positive but not as strong. For geoscience programs,
measure 14 is related more closely to faculty size (measure 01) than
to any other measure, and the correlations with rated quality of
faculty and program effectiveness are lower than in any other
discipline. In interpreting these relationships one must keep in mind
the fact that the research expenditure data have not been adjusted for
the number of faculty and other staff members involved in research in
a program.
Correlations with Measure 16. Measure 16 is the number of published
articles attributed to a program and adjusted for the "average
influence" of the journals in which the articles appear. The
correlations of this measure with all others appear in Table 9.5. Of
particular interest are the high correlations with all three measures
of program size and with the reputational survey results (excluding
measure 10). Most of those coefficients exceed .70, although for
programs in statistics/biostatistics they are below this level.
Moderately high correlations are also observed between measure 16 and
measures 12, 13, and 14. With the exception of computer science
programs, the correlations between the adjusted publication measure
and measure 05, time-to-Ph.D., range from .31 to .41. It should be
pointed out that the exceptionally large coefficients reported for
measure 15 result from the fact that the two publication measures are
empirically as well as logically interdependent.
7 Anderson et al., p. 95.
TABLE 9.4 Correlations of the University Research Expenditures in a Discipline
(Measure 14) with Other Measures, by Discipline

                                Computer                                        Statistics/
                    Chemistry   Sciences    Geosciences Math        Physics     Biostat.

Program Size
  01                .43         .44         .61         .18         .54         N/A
  02                .72         .58         .25         .41         .66         N/A
  03                .66         .43         .28         .44         .68         N/A
Program Graduates
  04                .18         .22         .22         .29         .04         N/A
  05                .35         -.21        -.05        .17         .31         N/A
  06                .31         -.03        -.04        .23         .25         N/A
  07                .20         -.16        .06         .22         .31         N/A
Survey Results
  08                .79         .63         .27         .42         .61         N/A
  09                .74         .61         .25         .42         .61         N/A
  10                .14         -.02        .13         -.12        -.08        N/A
  11                .77         .64         .18         .43         .58         N/A
University Library
  12                .45         .16         .33         .33         .33         N/A
Research Support
  13                .55         .10         .20         .18         .07         N/A
Publication Records
  15                .70         .66         .42         .35         .80         N/A
  16                .78         .73         .35         .42         .80         N/A
TABLE 9.14 Mean Ratings of Scholarly Quality of Program Faculty,
by Evaluator's Institution of Highest Degree

                                              NUMBER OF PROGRAMS
                        MEAN RATINGS          WITH ALUMNI RATINGS
                        Alumni    Nonalumni   N

Chemistry               3.88      3.60        37
Computer Sciences       3.56      3.02        26
Geosciences             3.83      3.51        34
Mathematics             3.73      3.41        37
Physics                 4.11      3.87        27
Statistics/Biostat.     3.90      3.32        35

NOTE: The pairs of means reported in each discipline are computed
for a subset of programs with a rating from at least one alumnus
and are substantially greater than the mean ratings for the full set
of programs in each discipline.
maters. Information collected in the survey on each evaluator's
institution of highest degree enables us to investigate this concern.
The findings presented in Table 9.14 support the hypothesis that
alumni provided generous ratings--with differences in the mean ratings
(for measure 08) of alumni and nonalumni ranging from .24 to .58 in
the six disciplines. It is interesting to note that the largest
differences are found in statistics/biostatistics and computer
sciences, the disciplines with the fewest programs. Given the
appreciable differences between the ratings furnished by program
alumni and other evaluators, one might ask how much effect this has
had on the overall results of the survey. The answer is "very
little." As shown in the table, in chemistry and physics only one
program in every four received ratings from any alumnus; in
statistics/biostatistics slightly more than half of the programs were
evaluated by one or more alumni.14 Even in the latter discipline,
however, the fraction of alumni providing ratings of a program is
always quite small and should have had minimal impact on the overall
mean rating of any program. To be certain that this was the case,
mean ratings of the scholarly quality of faculty were recalculated for
every mathematical and physical science program--with the evaluations
provided by alumni excluded. The results were compared with the mean
scores based on a full set of evaluations. Out of the 592 mathematical
and physical science programs evaluated in the survey, only 1
14 Because of the small number of alumni ratings in every discipline,
the mean ratings for this group are unstable and therefore the correlations
between alumni and nonalumni mean ratings are not reported.
program (in geosciences) had an observed difference as large as 0.2,
and for 562 programs (95 percent) their mean ratings remain unchanged
(to the nearest tenth of a unit). On the basis of these findings the
committee saw no reason to exclude alumni ratings in the calculation
of program means.
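
The figures cited in this discussion can be reproduced from the tables. The Python sketch below uses the means and program counts in Table 9.14 and the program totals in Table 9.1 to compute the alumni-nonalumni difference and the share of programs rated by at least one alumnus in each discipline. The dictionary layout and variable names are assumptions made for illustration.

```python
# (alumni mean, nonalumni mean, programs with alumni ratings) from Table 9.14.
TABLE_9_14 = {
    "Chemistry": (3.88, 3.60, 37),
    "Computer Sciences": (3.56, 3.02, 26),
    "Geosciences": (3.83, 3.51, 34),
    "Mathematics": (3.73, 3.41, 37),
    "Physics": (4.11, 3.87, 27),
    "Statistics/Biostat.": (3.90, 3.32, 35),
}

# Total programs evaluated in each discipline, from Table 9.1.
TOTAL_PROGRAMS = {
    "Chemistry": 145,
    "Computer Sciences": 58,
    "Geosciences": 91,
    "Mathematics": 115,
    "Physics": 123,
    "Statistics/Biostat.": 64,
}

# Difference between alumni and nonalumni mean ratings (measure 08).
differences = {
    name: round(alumni - nonalumni, 2)
    for name, (alumni, nonalumni, _) in TABLE_9_14.items()
}

# Share of all programs in a discipline rated by at least one alumnus.
coverage = {
    name: n_rated / TOTAL_PROGRAMS[name]
    for name, (_, _, n_rated) in TABLE_9_14.items()
}

for name in TABLE_9_14:
    print(f"{name:22s} diff={differences[name]:.2f} share={coverage[name]:.2f}")
```

The differences run from .24 (physics) to .58 (statistics/biostatistics). The shares are about one program in four for chemistry and physics and slightly more than half for statistics/biostatistics, as stated in the text.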
Another concern that some critics have is that a survey evaluation
may be affected by the interaction of the research interests of the
evaluator and the area(s) of focus of the research-doctorate program
to be rated. It is said, for example, that some narrowly focused
programs may be strong in a particular area of research but that this
strength may not be recognized by a large fraction of evaluators who
happen to be unknowledgeable in this area. This is a concern more
difficult to address than those discussed in the preceding pages since
little or no information is available about the areas of focus of the
programs being evaluated (although in certain disciplines the title of
a department or academic unit may provide a clue). To obtain a better
understanding of the extent to which an evaluator's field of specialty
may have influenced the ratings he or she has provided, evaluators in
physics and in statistics/biostatistics were separated into groups
according to their specialty fields (as reported on the survey
questionnaire). In physics, Group A includes those specializing in
elementary particles and nuclear structure, and Group B is made up of
those in all other areas of physics. In statistics/biostatistics,
Group A consists of evaluators who designated biostatistics or
biomathematics as their specialty and Group B of those in all other
specialty areas of statistics. The mean ratings of the two groups in
each discipline are reported in Table 9.15. The program ratings
TABLE 9.15 Mean Ratings of Scholarly Quality of Program Faculty,
by Evaluator's Field of Specialty Within Physics
or Statistics/Biostatistics

PHYSICS: Group A includes evaluators in elementary particles
and nuclear structure; Group B includes those in atomic/
molecular, solid state, and other fields of physics.
STATISTICS/BIOSTATISTICS: Group A includes evaluators in bio-
statistics, biometrics, and epidemiology; Group B includes
those in all other fields of statistics.

                        MEAN RATINGS          CORRELATION
                        Group A   Group B     r       N

Physics                 2.58      2.68        .95     122
Statistics/Biostat.     3.13      2.73        .93     63

NOTE: N reported in last column represents the number of programs
with a rating from at least one evaluator in each of the two groups.
supplied by evaluators in elementary particles and nuclear structure
are, on the average, slightly below those provided by other
physicists. The mean ratings of the biostatistics group are typically
higher than those of other statisticians. Despite these differences
there is a high degree of correlation in the mean ratings provided by
the two groups in each discipline. Although the differences in the
mean ratings of biostatisticians (Group A) and other statisticians
(Group B) are comparatively large, a detailed inspection of the
individual ratings reveals that biomedical evaluators rated programs
appreciably higher regardless of whether a program was located in a
department of biostatistics (and related fields) or in a department
outside the biomedical area. Although one cannot conclude from these
findings that an evaluator's specialty field has no bearing on how he
or she rates a program, these findings do suggest that the relative
standings of programs in physics and statistics/biostatistics would
not be greatly altered if the ratings by either group were discarded.
INTERPRETATION OF REPUTATIONAL SURVEY RATINGS
It is not hard to foresee that results from this survey will
receive considerable attention through enthusiastic and uncritical
reporting in some quarters and sharp castigation in others. The study
committee understands the grounds for both sides of this polarized
response but finds that both tend to be excessive. It is important to
make clear how we view these ratings as fitting into the larger study
of which they are a part.
The reputational results are likely to receive a disproportionate
degree of attention for several reasons, including the fact that they
reflect the opinions of a large group of faculty colleagues and that
they form a bridge with earlier studies of graduate programs. But the
results will also receive emphasis because they alone, among all of
the measures, seem to address quality in an overall or global
fashion. While most recognize that "objective" program characteristics
(i.e., publication productivity, research funding, or library size)
have some bearing on program quality, probably no one would contend
that a single one of these measures encompasses all that need be known
about the quality of research-doctorate programs. Each is obviously
no more than an indicator of some aspect of program quality. In
contrast, the reputational ratings are global from the start because
the respondents are asked to take into account many objective
characteristics and to arrive at a general assessment of the quality
of the faculty and effectiveness of the program. This generality has
self-evident appeal.
On the other hand, it is wise to keep in mind that these
reputational ratings are measures of perceived program quality rather
than of "quality" in some ideal or absolute sense. What this means is
that, just as for all of the more objective measures, the reputational
ratings represent only a partial view of what most of us would con-
sider quality to be; hence, they must be kept in careful perspective.
Some critics may argue that such ratings are positively misleading
because of a variety of methodological artifacts or because they are
supplied by "judges" who often know very little about the programs
they are rating. The committee has conducted the survey in a way that
permits the empirical examination of a number of the alleged artifacts
and, although our analysis is by no means exhaustive, the general
conclusion is that their effects are slight.
Among the criticisms of reputational ratings from prior studies
are some that represent a perspective that may be misguided. This
perspective assumes that one asks for ratings in order to find out
what quality really is and that to the degree that the ratings miss
the mark of "quintessential quality," they are unreal, although the
quality that they attempt to measure is real. What this perspective
misses is the reality of quality and the fact that impressions of
quality, if widely shared, have an imposing reality of their own and
therefore are worth knowing about in their own right. After all,
these perceptions govern a large-scale system of traffic around the
nation's graduate institutions--for example, when undergraduate
students seek the advice of professors concerning graduate programs
that they might attend. It is possible that some professors put in
this position disqualify themselves on grounds that they are not well
informed about the relative merits of the programs being considered.
Most faculty members, however, surely attempt to be helpful on the
basis of impressions gleaned from their professional experience, and
these assessments are likely to have major impact on student
decision-making. In short, the impressions are real and have very
real effects not only on students shopping for graduate schools but
also on other flows, such as job-seeking young faculty and the
distribution of research resources. At the very least, the survey
results provide a snapshot of these impressions from discipline to
discipline. Although these impressions may be far from ideally
informed, they certainly show a strong degree of consensus within each
discipline, and it seems safe to assume that they are more than
passingly related to what a majority of keen observers might agree
program quality is all about.
COMPARISON WITH RESULTS OF THE ROOSE-ANDERSEN STUDY
An analysis of the response to the committee's survey would not be
complete without comparing the results with those obtained in the
survey by Roose and Andersen 12 years earlier. Although there are
obvious similarities in the two surveys, there are also some important
differences that should be kept in mind in examining individual program
ratings of the scholarly quality of faculty. Already mentioned in
this chapter is the inclusion, on the form sent to 90 percent of the
sample members in the committee's survey, of the names and academic
ranks of faculty and the numbers of doctoral graduates in the previous
five years. Other significant changes in the committee's form are the
identification of the university department or academic unit in which
each program may be found, the restriction that each evaluator be asked to
judge no more than 50 research-doctorate programs in
their discipline, and the presentation of these programs in random
sequence on each survey form. The sampling frames used in the two
surveys also differ. The sample selected in the earlier study
included only individuals who had been nominated by the participating
universities, while more than one-fourth of the sample in the
committee's survey were chosen at random from full faculty lists.
(Except for this difference the samples were quite similar--i.e., in
terms of number of evaluators in each discipline and the fraction of
senior scholars.15)
Several dissimilarities in the coverage of the Roose-Andersen and
this committee's reputational assessments should be mentioned. The
former included a total of 130 institutions that had awarded at least
100 doctoral degrees in two or more disciplines during the FY1958-67
period. The institutional coverage in the committee's assessment was
based on the number of doctorates awarded in each discipline (as
described in Chapter I) and covered a total population of 228
universities. Most of the universities represented in the present
study but not the earlier one are institutions that offered
research-doctorate programs in a limited set of disciplines. In the
Roose-Andersen study, programs in five mathematical and physical
science disciplines were rated: astronomy, chemistry, geology,
mathematics, and physics. In the committee's assessment, two
disciplines were added to this list--computer sciences and
statistics/biostatistics--and programs in astronomy were not evaluated
(for reasons explained in Chapter I). Finally, in the Roose-Andersen
study only one set of ratings was compiled from each institution
represented in a discipline, whereas in the committee's survey,
separate ratings were requested if a university offered more than one
research-doctorate program in a given discipline. The consequences of
these differences in survey coverage are quite apparent: in the
committee's survey, evaluations were requested for a total of 593
research-doctorate programs in the mathematical and physical sciences,
compared with 444 programs in the Roose-Andersen study.
Figures 9.1-9.4 plot the mean ratings of scholarly quality of
faculty in programs included in both surveys; sets of ratings are
graphed for 103 programs in chemistry, 57 in geosciences, 86 in
mathematics, and 90 in physics. Since in the Roose-Andersen study
programs were identified by institution and discipline (but not by
department), the matching of results from this survey with those from
15 For a description of the sample group used in the earlier study,
see Roose and Andersen, pp. 28-31.
16 It should be emphasized that the committee's assessment of
geoscience programs encompasses--in addition to geology--geochemistry,
geophysics, and other earth sciences.
[Figure 9.1: scatter plot. Vertical axis: Measure 08 (0.0 to 5.0); horizontal axis: Roose-Andersen Rating (1970) (1.0 to 5.0).]
FIGURE 9.1 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the
Roose-Andersen study--103 programs in chemistry.
[Figure 9.2: scatter plot, r = .85. Vertical axis: Measure 08 (0.0 to 5.0); horizontal axis: Roose-Andersen Rating (1970) (1.0 to 5.0).]
FIGURE 9.2 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the
Roose-Andersen study--57 programs in geosciences.
[Figure 9.3: scatter plot, r = .94. Vertical axis: Measure 08 (0.0 to 5.0); horizontal axis: Roose-Andersen Rating (1970) (1.0 to 5.0).]
FIGURE 9.3 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the
Roose-Andersen study--86 programs in mathematics.
[Figure 9.4: scatter plot, r = .96. Vertical axis: Measure 08 (0.0 to 5.0); horizontal axis: Roose-Andersen Rating (1970) (1.0 to 5.0).]
FIGURE 9.4 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the
Roose-Andersen study--90 programs in physics.
the committee's survey is not precise. For universities represented
in the latter survey by more than one program in a particular
discipline, the mean rating for the program with the largest number of
graduates (measure 02) is the only one plotted here. Although the
results of both surveys are reported on identical scales, some caution
must be taken in interpreting differences in the mean ratings a
program received in the two evaluations. It is impossible to estimate
what effect all of the differences described above may have had on the
results of the two surveys. Furthermore, one must remember that the
reported scores are based on the opinions of different groups of
faculty members and were provided at different time periods. In 1969,
when the Roose-Andersen survey was conducted, graduate departments in
most universities were still expanding and not facing the enrollment
and budget reductions that many departments have had to deal with in
recent years. Consequently, a comparison of the overall findings from
the two surveys reveals nothing about how much the quality of graduate
education has improved (or declined) in the past decade. Nor should
the reader place much stock in any small differences in the mean
ratings that a particular program may have received in the two
surveys. On the other hand, it is of particular interest to note the
high correlations between the results of the evaluations. For
programs in chemistry, mathematics, and physics the correlation
coefficients range between .93 and .96; in the geosciences the
coefficient is .85. The lower coefficient in geosciences may be
explained, in part, by the difference, described in footnote 16, in
the field coverage of the two surveys. The extraordinarily high
correlations found in chemistry, mathematics, and physics may suggest
to some readers that reputational standings of programs in these
disciplines have changed very little in the last decade. However,
differences are apparent for some institutions. Also, one must keep
in mind that the correlations are based on the reputational ratings of
only three-fourths of the programs evaluated in this assessment in
these disciplines and do not take into account the emergence of many
new programs that did not exist or were too small to be rated in the
Roose-Andersen study.
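The comparison described above can be illustrated with a minimal sketch in Python. It is not the committee's actual procedure or data: the function names, the record layout (university, number of graduates, mean rating), and any ratings passed in are hypothetical, chosen only to show the two steps the text describes. First, for a university with more than one program in a discipline, the rating of the program with the largest number of graduates is kept; second, a Pearson correlation coefficient is computed over the universities rated in both surveys.

```python
from math import sqrt


def pick_largest_program(programs):
    """Keep one rating per university: that of the program with the most graduates.

    `programs` is a list of (university, graduates, mean_rating) tuples.
    This record layout is an assumption made for illustration.
    """
    best = {}
    for university, graduates, rating in programs:
        if university not in best or graduates > best[university][0]:
            best[university] = (graduates, rating)
    return {university: rating for university, (_, rating) in best.items()}


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / sqrt(sxx * syy)


def compare_surveys(earlier_ratings, current_programs):
    """Correlate ratings for universities that appear in both surveys.

    `earlier_ratings` maps university -> mean rating in the earlier survey.
    Universities present in only one survey are left out, which is why the
    coefficient says nothing about new or previously unrated programs.
    """
    current_ratings = pick_largest_program(current_programs)
    shared = sorted(set(earlier_ratings) & set(current_ratings))
    r = pearson_r(
        [earlier_ratings[u] for u in shared],
        [current_ratings[u] for u in shared],
    )
    return r, shared
```

Because only programs present in both inputs enter the calculation, a high coefficient from such a comparison describes the stability of the matched programs alone, which is the caution the paragraph above raises.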
FUTURE STUDIES
One of the most important objectives in undertaking this
assessment was to test new measures not used extensively in past
evaluations of graduate programs. Although the committee believes
that it has been successful in this effort, much more needs to be
done. First and foremost, studies of this kind should be extended to
cover other types of programs and other disciplines not included in
this effort. As a consequence of budgeting limitations, the committee
had to restrict its study to 32 disciplines, selected on the basis of
the number of doctorates awarded in each. Among those omitted were
programs in astronomy, which was included in the Roose-Andersen study;
a multidimensional assessment of research-doctorate programs in this
and many other important disciplines would be of value. Consideration
should also be given to embarking on evaluations of programs offering
other types of graduate and professional degrees. As a matter of
fact, plans for including master's-degree programs in this assessment
were originally contemplated, but because of a lack of available
information about the resources and graduates of programs at the
master's level, it was decided to focus on programs leading to the
research doctorate.
Perhaps the most debated issue the committee has had to address
concerned which measures should be reported in this assessment. In
fact, there is still disagreement among some of its members about the
relative merits of certain measures, and the committee fully
recognizes a need for more reliable and valid indices of the quality
of graduate programs. First on a list of needs is more precise and
meaningful information about the product of research-doctorate
programs--the graduates. For example, what fraction of the program
graduates have gone on to be productive investigators--either in the
academic setting or in government and industrial laboratories? What
fraction have gone on to become outstanding investigators--as measured
by receipt of major prizes, membership in academies, and other such
distinctions? How do program graduates compare with regard to their
publication records? Also desired might be measures of the quality of
the students applying for admittance to a graduate program (e.g.,
Graduate Record Examination scores, undergraduate grade point
averages). If reliable data of this sort were made available, they
might provide a useful index of the reputational standings of programs,
from the perspective of graduate students.
A number of alternative measures relevant to the quality of
program faculty were considered by the committee but not included in
the assessment because of the associated difficulties and costs of
compiling the necessary data. For example, what fraction of the
program faculty were invited to present papers at national meetings?
What fraction had been elected to prestigious organizations/groups in
their field? What fraction had received senior fellowships and other
awards of distinction? In addition, it would be highly desirable to
supplement the data presented on NSF, NIH, and ADAMHA research grant
awards (measure 13) with data on awards from other federal agencies
(e.g., Department of Defense, Department of Energy, National
Aeronautics and Space Administration) as well as from major private
foundations.
As described in the preceding pages, the committee was able to
make several changes in the survey design and procedures, but further
improvements could be made. Of highest priority in this regard is the
expansion of the survey sample to include evaluators from outside the
academic setting (in particular, those in government and industrial
laboratories who regularly employ graduates of the programs to be
evaluated). To add evaluators from these sectors would require a
major effort in identifying the survey population from which a sample
could be selected. Although such an effort is likely to involve
considerable costs in both time and financial resources, the committee
believes that the addition of evaluators from the government and
industrial settings would be of value in providing a different
perspective to the reputational assessment and that comparisons
between the ratings supplied by academic and nonacademic evaluators
would be of particular interest.