Methodology
Quality . . . you know what it is, yet you don't know
what it is. But that's self-contradictory. But some
things are better than others, that is, they have more
quality. But when you try to say what the quality is,
apart from the things that have it, it all goes poof!
There's nothing to talk about. But if you can't say
what Quality is, how do you know what it is, or how do
you know that it even exists? If no one knows what it
is, then for all practical purposes it doesn't exist
at all. But for all practical purposes it really does
exist. What else are the grades based on? Why else
would people pay fortunes for some things and throw
others in the trash pile? Obviously some things are
better than others . . . but what's the "betterness"?
. . . So round and round you go, spinning mental
wheels and nowhere finding anyplace to get traction.
What the hell is Quality? What is it?
Robert M. Pirsig
Zen and the Art of
Motorcycle Maintenance
Both the planning committee and our own study committee have given
careful consideration to the types of measures to be employed in the
assessment of research-doctorate programs.1 The committees recog-
nized that any of the measures that might be used is open to criticism
and that no single measure could be expected to provide an entirely
satisfactory index of the quality of graduate education. With respect
to the use of multiple criteria in educational assessment, one critic
has commented:
1 A description of the measures considered may be found in the third
chapter of the planning committee's report, along with a discussion of
the relative merits of each measure.
At best each is a partial measure encompassing a frac-
tion of the large concept. On occasion its link to the
real [world] is problematic and tenuous. Moreover,
each measure [may contain] a load of irrelevant super-
fluities, "extra baggage" unrelated to the outcomes
under study. By the use of a number of such measures,
each contributing a different facet of information, we
can limit the effect of irrelevancies and develop a
more rounded and truer picture of program outcomes.2
Although the use of multiple measures alleviates the criticisms
directed at a single dimension or measure, it certainly will not
satisfy those who believe that the quality of graduate programs cannot
be represented by quantitative estimates no matter how many dimensions
they may be intended to represent. Furthermore, the usefulness of the
assessment is dependent on the validity and reliability of the
criteria on which programs are evaluated. The decision concerning
which measures to adopt in the study was made primarily on the basis
of two factors:
(1) the extent to which a measure was judged to be
related to the quality of research-doctorate
programs and
(2) the feasibility of compiling reliable data for
making national comparisons of programs in
particular disciplines.
Only measures that were applicable to a majority of the disciplines to
be covered were considered. In reaching a final decision the study
committee found the ETS study,3 in which 27 separate variables were
examined, especially helpful, even though it was recognized that many
of the measures feasible in institutional self-studies would not be
available in a national study. The committee was aided by the many
suggestions received from university administrators and others within
the academic community.
Although the initial design called for an assessment based on
approximately six measures, the committee concluded that it would be
highly desirable to expand this effort. A total of 12 measures (listed
in Table 2.1) have been utilized in the assessment of research-doctor-
ate programs in art history, classics, English language and literature,
French language and literature, German language and literature, lin-
guistics, music, philosophy, and Spanish language and literature. For
seven of the measures data are available describing most, if not all,
2 C. H. Weiss, Evaluation Research: Methods of Assessing Program
Effectiveness, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1972,
p. 56.
3 See M. J. Clark et al. (1976) for a description of these variables.
TABLE 2.1 Measures Compiled on Individual Research-Doctorate Programs
Program Size1
01 Reported number of faculty members in the program, December 1980.
02 Reported number of program graduates in last five years (July 1975
through June 1980).
03 Reported total number of full-time and part-time graduate students
enrolled in the program who intend to earn doctorates, December
1980.
Characteristics of Graduates2
04 Fraction of FY1975-79 program graduates who had received some na-
tional fellowship or training grant support during their graduate
education.
05 Median number of years from first enrollment in graduate school to
receipt of the doctorate--FY1975-79 program graduates.3
06 Fraction of FY1975-79 program graduates who at the time they com-
pleted requirements for the doctorate reported that they had made
definite commitments for postgraduation employment.
07 Fraction of FY1975-79 program graduates who at the time they com-
pleted requirements for the doctorate reported that they had made
definite commitments for postgraduation employment in Ph.D.-grant-
ing universities.
Reputational Survey Results4
08 Mean rating of the scholarly quality of program faculty.
09 Mean rating of the effectiveness of the program in educating re-
search scholars/scientists.
10 Mean rating of the improvement in program quality in the last five
years.
11 Mean rating of the evaluators' familiarity with the work of the
program's faculty.
University Library Size5
12 Composite index describing the library size in the university in
which the program is located, 1979-80.
1 Based on information provided to the committee by the participating
universities.
2 Based on data compiled in the NRC's Survey of Earned Doctorates.
3 In reporting standardized scores and correlations with other vari-
ables, a shorter time-to-Ph.D. is assigned a higher score.
4 Based on responses to the committee's survey conducted in April
1981.
5 Based on data compiled by the Association of Research Libraries.
of the humanities programs included in the assessment. For five mea-
sures the coverage is less complete but encompasses at least a
majority of the programs in all but two disciplines. The actual
number of programs evaluated on every measure is reported in the
second table in each of the next nine chapters.
The 12 measures describe a variety of aspects important to the
operation and function of research-doctorate programs--and thus are
relevant to the quality and effectiveness of programs in educating
humanists for careers in research. However, not all of the measures
may be viewed as "global indices of quality." Some, such as those re-
lating to program size, are best characterized as "program descriptors"
that, although not dimensions of quality per se, are thought to have a
significant influence on the effectiveness of programs. Other mea-
sures, such as those relating to university library size and support
for graduate training, describe some of the resources generally recog-
nized as being important in maintaining a vibrant program in graduate
education. Measures derived from surveys of faculty peers, on the
other hand, have traditionally been regarded as indices of the overall
quality of graduate programs. Yet these too are not true measures of
quality.
We often settle for an easy-to-gather statistic, per-
fectly legitimate for its own limited purposes, and
then forget that we haven't measured what we want to
talk about. Consider, for instance, the reputation
approach of ranking graduate departments: We ask a
sample of physics professors (say) which the best
physics departments are and then tabulate and report
the results. The "best" departments are those that
our respondents say are the best. Clearly it is useful
to know which are the highly regarded departments in a
given field, but prestige (which is what we are mea-
suring here) isn't exactly the same as quality.4
To be sure, each of the 12 measures reported in this assessment has
its own set of limitations. In the sections that follow an explanation
is provided of how each measure has been derived and its particular
limitations as a descriptor of research-doctorate programs.
PROGRAM SIZE
Information was collected from the study coordinators at each
university on the names and ranks of program faculty, doctoral student
enrollment, and number of Ph.D. graduates in each of the past five
years (FY1976-80). Each coordinator was instructed to include on the
4John Shelton Reed, "How Not To Measure What a University Does," The
Chronicle of Higher Education, Vol. 22, No. 12, May 11, 1981, p. 56.
faculty list those individuals who, as of December 1, 1980, held
academic appointments (typically at the rank of assistant, associate,
and full professor) and who participated significantly in doctoral
education. Emeritus and adjunct members generally were not to be
included. Measure 01 represents the number of faculty identified in a
program. Measure 02 is the reported number of graduates who earned
Ph.D. or equivalent research doctorates in a program during the period
from July 1, 1975, through June 30, 1980. Measure 03 represents the
total number of full-time and part-time students reported to be
enrolled in a program in the fall of 1980, who intended to earn
research doctorates. All three of these measures describe different
aspects of program size. In previous studies program size has been
shown to be highly correlated with the reputational ratings of a
program, and this relationship is examined in detail in this report.
It should be noted that since the information was provided by the
institutions participating in the study, the data may be influenced by
the subjective decisions made by the individuals completing the forms.
For example, some institutional coordinators may be far less restric-
tive than others in deciding who should be included on the list of
program faculty. To minimize variation in interpretation, detailed
instructions were provided to those filling out the forms.5 Measure
03 is of particular concern in this regard since the coordinators at
some institutions may not have known how many of the students currently
enrolled in graduate study intended to earn doctoral degrees.
CHARACTERISTICS OF GRADUATES
One of the most meaningful measures of the success of a research-
doctorate program is the performance of its graduates. How many go on
to lead productive careers in research and/or teaching? Unfortunately,
reliable information on the subsequent employment and career achieve-
ments of the graduates of individual programs is not available. In the
absence of this directly relevant information, the committee has relied
on four indirect measures derived from data compiled in the NRC's Sur-
vey of Earned Doctorates.6 Although each measure has serious limita-
tions (described below), the committee believes it more desirable to
include this information than not to include data about program grad-
uates.
In identifying program graduates who had received their doctorates
in the previous five years (FY1975-79),7 the faculty lists furnished
by the study coordinators at universities were compared with the names
of dissertation advisers (available from the NRC survey). The latter
5 A copy of the survey form and instructions sent to study coordina-
tors is included in Appendix A.
6A copy of the questionnaire used in this survey is found in Appen-
dix B.
7 Survey data for the FY1980 Ph.D. recipients had not yet been com-
piled at the time this assessment was undertaken.
source contains records for virtually all individuals who have earned
research doctorates from U.S. universities since 1920. The institu-
tion, year, and specialty field of Ph.D. recipients were also used in
determining the identity of program graduates. It is estimated that
this matching process provided information on the graduate training and
employment plans of more than 90 percent of the FY1975-79 graduates
from the humanities programs. In the calculation of each of the four
measures derived from the NRC survey, program data are reported only
if the survey information is available on at least 10 graduates.
Consequently, in the disciplines with smaller programs--art history
and classics--only about half the programs are included in these
measures, whereas more than 97 percent of the English programs are
included.
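The reporting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the committee's actual procedure: the function name `fraction_measure`, the one-boolean-per-graduate layout, and the sample programs are hypothetical, and only the threshold of 10 graduates comes from the text.

```python
from typing import Optional, Sequence

MIN_GRADUATES = 10  # minimum number of matched graduates stated in the text


def fraction_measure(flags: Sequence[bool]) -> Optional[float]:
    """Return the fraction of matched graduates for whom the flag is true.

    `flags` holds one boolean per matched graduate (hypothetical layout),
    for example whether the graduate held national fellowship support.
    Returns None when survey information covers fewer than 10 graduates,
    mirroring the rule that such programs are not reported.
    """
    if len(flags) < MIN_GRADUATES:
        return None
    return sum(flags) / len(flags)


# Hypothetical program with 12 matched graduates, 3 of them with fellowships.
large_program = [True] * 3 + [False] * 9
# Hypothetical program with only 8 matched graduates.
small_program = [True] * 4 + [False] * 4

print(fraction_measure(large_program))  # 0.25
print(fraction_measure(small_program))  # None
```

The same function would apply to any of the fraction-type measures drawn from the NRC survey, since each is a share of matched graduates subject to the same minimum.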
Measure 04 constitutes the fraction of FY1975-79 graduates of a
program who had received at least some national fellowship support,
including federal fellowships and traineeships, Woodrow Wilson
fellowships, or fellowships/traineeships from other U.S. national
organizations. One might expect the more selective programs to have a
greater proportion of students with national fellowship support--
especially "portable fellowships." Although the committee considered
alternative measures of student ability (e.g., Graduate Record
Examination scores, undergraduate grade point averages), reliable
information of this sort was unavailable for a national assessment.
It should be noted that the relevance of the fellowship measure varies
considerably among disciplines. In the biomedical sciences a substan-
tial fraction of the graduate students are supported by training
grants and fellowships; in the humanities the majority are supported
by teaching assistantships and their own resources.
Measure 05 is the median number of years elapsed from the time
program graduates first enrolled in graduate school to the time they
received their doctoral degrees. For purposes of analysis the
committee has adopted the conventional wisdom that the most talented
students are likely to earn their doctoral degrees in the shortest
periods of time--hence, the shorter the median time-to-Ph.D., the
higher the standardized score that is assigned. Although this measure
has frequently been employed in social science research as a proxy for
student ability, one must regard its use here with some skepticism.
It is quite possible that the length of time it takes a student to
complete requirements for a doctorate may be significantly affected by
the explicit or implicit policies of a university or department. For
example, in certain cases a short time-to-Ph.D. may be indicative of
less stringent requirements for the degree. Furthermore, previous
studies have demonstrated that women and members of minority groups,
for reasons having nothing to do with their abilities, are more likely
than male Caucasians to interrupt their graduate education or to be
8 For a detailed analysis of this subject, see Dorothy M. Gilford and
Joan Snyder, Women and Minority Ph.D.'s in the 1970's: A Data Book,
National Academy of Sciences, Washington, D.C., 1977.
enrolled on a part-time basis. As a consequence, the median
time-to-Ph.D. may be longer for programs with larger fractions of
women and minority students.
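A short Python sketch may help make the scoring convention for measure 05 concrete. It assumes a plain z-score computed across programs and then negated, so that a shorter median time-to-Ph.D. receives a higher score; the exact scaling used in the report is not given in this passage, and the function names and sample data are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, List


def median_years(years_to_phd: List[float]) -> float:
    """Median years from first graduate enrollment to the doctorate."""
    return median(years_to_phd)


def reversed_standard_scores(medians: Dict[str, float]) -> Dict[str, float]:
    """Standardize program medians and flip the sign.

    Assumption: an ordinary z-score using the population standard
    deviation across programs. The sign is reversed so that a shorter
    median time-to-Ph.D. yields a higher score, as the text describes.
    """
    values = list(medians.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All programs share the same median, so no program stands out.
        return {name: 0.0 for name in medians}
    return {name: -(m - mu) / sigma for name, m in medians.items()}


# Hypothetical years-to-degree for the graduates of three programs.
programs = {
    "A": [5.0, 6.0, 7.0],
    "B": [8.0, 9.0, 10.0],
    "C": [11.0, 12.0, 13.0],
}
medians = {name: median_years(years) for name, years in programs.items()}
scores = reversed_standard_scores(medians)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, medians[name], round(scores[name], 2))
```

Program A, with the shortest median, is listed first. Nothing in such a score corrects for the policy and enrollment-pattern effects discussed above, which is why the measure calls for skepticism.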
Measure 06 represents the fraction of FY1975-79 program graduates
who reported at the time they had completed requirements for the doc-
torate that they had signed contracts or made firm commitments for
postgraduation employment (including postdoctoral appointments as well
as other positions in the academic or nonacademic sectors) and who
provided the names of their prospective employers. Although this
measure is likely to vary discipline by discipline according to the
availability of employment opportunities, a program's standing relative
to other programs in the same discipline should not be affected by this
variation. In theory, the graduates with the greatest promise should also
have the easiest time in finding jobs. However, the measure is
influenced by a variety of other factors, such as personal job prefer-
ences and restrictions in geographic mobility, that are unrelated to
the ability of the individual. It also should be noted parenthetically
that unemployment rates for doctoral recipients are quite low and that
nearly all of the graduates seeking jobs find positions soon after
completing their doctoral programs.9 Furthermore, first employment
after graduation is by no means a measure of career achievement, which
is what one would like to have if reliable data were available.
Measure 07, a variant of measure 06, constitutes the fraction of
FY1975-79 program graduates who indicated that they had made firm
commitments for employment in Ph.D.-granting universities and who
provided the names of their prospective employers. This measure may
be presumed to be an indication of the fraction of graduates likely to
pursue careers in academic research, although there is no evidence
concerning how many of them remain in academic research in the long
term. In many humanities disciplines the path from Ph.D. to junior
faculty has traditionally been regarded as the road of success for the
growth and development of research talent. The committee is well
aware, of course, that in recent years increasing numbers of graduates
are entering the nonacademic sectors but has relied on a measure that
reflects only the academic side. In the engineering and physical
science disciplines, this limitation is of greater concern than it is
in the humanities disciplines--in which only about 1 of every 10
graduates with definite employment plans intends to take a job outside
the academic environs (see Table 2.2).
The inclusion of measures 06 and 07 in this assessment has been an
issue much debated by members of the committee; the strenuous objec-
tions by three committee members regarding the use of these measures
are expressed in the Minority Statement, which follows Chapter XII.
9For new Ph.D. recipients in science and engineering, the unemploy-
ment rate has been less than 2 percent (see National Research Council,
Postdoctoral Appointments and Disappointments, National Academy Press,
Washington, D.C., 1981, p. 313).
TABLE 2.2 Percentage of FY1975-79 Doctoral Recipients with Definite
Commitments for Employment Outside the Academic Sector*
Art History 13
Classics 10
English Language & Literature 11
French Language & Literature 13
German Language & Literature 13
Linguistics 18
Music 10
Philosophy 8
Spanish Language & Literature 7
*Percentages are based on respondents to the NRC's Survey of
Earned Doctorates who indicated that they had made firm commit-
ments for postgraduation employment and who provided the names
of their prospective employers. These percentages may be
considered to be lower-bound estimates of the actual percentages
of doctoral recipients employed outside the academic sector.
REPUTATIONAL SURVEY RESULTS
In April 1981, survey forms were mailed to a total of 1,689 faculty
members in art history, classics, English language and literature,
French language and literature, German language and literature, lin-
guistics, music, philosophy, and Spanish language and literature. The
evaluators were selected from the faculty lists furnished by the study
coordinators at the 228 universities covered in the assessment. These
evaluators constituted approximately 20 percent of the total faculty
population--8,593 faculty members--in the humanities programs being
evaluated (see Table 2.3). The survey sample was chosen on the basis
of the number of faculty in a particular program and the number of
doctorates awarded in the previous five years (FY1976-80)--with the
stipulation that at least one evaluator was selected from every program
covered in the assessment. In selecting the sample each faculty rank
was represented in proportion to the total number of individuals hold-
ing that rank, and preference was given to those faculty members whom
the study coordinators had nominated to serve as evaluators. As shown
in Table 2.3, 1,385 individuals, 82 percent of the survey sample in the
humanities, had been recommended by study coordinators.10
Each evaluator was asked to consider a stratified random sample of
10 A detailed analysis of the survey participants in each discipline
is given in subsequent chapters.
TABLE 2.3 Survey Response by Discipline and Characteristics
of Evaluator
                                  Total
                                  Program     Survey Sample
                                  Faculty     Total    Respondents
                                        N         N        N      %
Discipline of Evaluator
  Art History                         520       150       94     63
  Classics                            373       150      100     67
  English Language & Literature     3,280       318      198     62
  French Language & Literature        613       174      110     63
  German Language & Literature        445       150       95     63
  Linguistics                         501       150      105     70
  Music                             1,080       159       69     43
  Philosophy                        1,087       231      157     68
  Spanish Language & Literature       694       207      136     66
Faculty Rank
  Professor                         4,330       880      582     66
  Associate Professor               2,611       552      337     61
  Assistant Professor               1,480       240      139     58
  Other                               172        17        6     35
Evaluator Selection
  Nominated by Institution          2,797     1,385      905     65
  Other                             5,796       304      159     52
Survey Form
  With Faculty Names                 N/A*     1,518      964     64
  Without Names                      N/A*       171      100     58
Total All Fields                    8,593     1,689    1,064     63
*Not applicable.
no more than 50 research-doctorate programs in his or her discipline--
with programs stratified by the number of faculty members associated
with each program. Every program was included on 150 survey forms.
The set of programs to be evaluated appeared on each survey form in
random sequence, preceded by an alphabetized list of all programs in
that discipline that were being included in the study. No evaluator
was asked to consider a program at his or her own institution. Ninety
percent of the survey sample group were provided the names of faculty
members in each of the programs to be evaluated, along with data on
the total number of doctorates awarded in the last five years.11
The inclusion of this information represents a significant departure
from the procedures used in earlier reputational assessments. For
purposes of comparison with previous studies, 10 percent (randomly
selected in each discipline) were not furnished any information other
than the names of the programs.
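The form-assembly rules described above (no more than 50 programs per evaluator, programs stratified by faculty size, the evaluator's own institution excluded, and a random presentation order) can be sketched as follows. This is an illustrative approximation only: the `Program` record, the use of five equal-width strata, and the equal quota per stratum are assumptions, and the sketch does not attempt to reproduce the requirement that every program appear on exactly 150 forms.

```python
import random
from typing import List, NamedTuple


class Program(NamedTuple):
    institution: str
    faculty_count: int


def build_form(
    programs: List[Program],
    evaluator_institution: str,
    rng: random.Random,
    max_programs: int = 50,  # upper limit stated in the text
    n_strata: int = 5,       # assumption: number of size strata
) -> List[Program]:
    """Draw a stratified random set of programs for one survey form."""
    # No evaluator considers a program at his or her own institution.
    eligible = [p for p in programs if p.institution != evaluator_institution]
    ranked = sorted(eligible, key=lambda p: p.faculty_count)
    n = len(ranked)
    if n <= max_programs:
        chosen = list(ranked)
    else:
        chosen = []
        for i in range(n_strata):
            # Contiguous slice of the size ranking forms one stratum.
            stratum = ranked[i * n // n_strata:(i + 1) * n // n_strata]
            quota = max_programs // n_strata
            if i < max_programs % n_strata:
                quota += 1
            chosen.extend(rng.sample(stratum, min(quota, len(stratum))))
    # Programs appear on the form in random sequence.
    rng.shuffle(chosen)
    return chosen


# Hypothetical discipline with 120 programs of varying faculty size.
all_programs = [Program(f"U{i}", 5 + i) for i in range(120)]
form = build_form(all_programs, "U7", random.Random(0))
print(len(form))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which would matter if individualized forms had to be regenerated.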
The survey items were adapted from the form used in the Roose-
Andersen study. Prior to mailing, the instrument was pretested using
a small sample of faculty members in chemistry and psychology. As a
result, two significant improvements were made in the original survey
design. A question was added on the extent to which the evaluator was
familiar with the work of the faculty in each program. Responses to
this question, reported as measure 11, provide some insight into the
relationship between faculty recognition and the reputational standing
of a program.12 Also added was a question on the evaluator's field
of specialization--thereby making it possible to compare program evalu-
ations in different specialty areas within a particular discipline.
A total of 1,064 faculty members in the humanities--63 percent of
those asked to participate--completed and returned survey forms (see
Table 2.3). Two factors probably have contributed to this response
rate being approximately 12 percentage points below the rates reported
in the Cartter and Roose-Andersen studies.13 First, because of the
considerable expense of printing individualized survey forms (each
25-30 pages), second copies were not sent to sample members not
responding to the first mailing14--as was done in the Cartter and
Roose-Andersen efforts. Second, it is quite apparent that within the
academic community there has been a growing dissatisfaction in recent
years with educational assessments based on reputational measures.
Indeed, this dissatisfaction was an important factor in the Conference
11 This information was furnished to the committee by the study coor-
dinators at the universities participating in the study.
12 Evidence of the strength of the relationship is provided by corre-
lations presented in Chapters III-XI, and an analysis of the
relationship is provided in Chapter XII.
13 To compare the response rates obtained in the earlier surveys, see
Roose and Andersen, Table 28, p. 29.
14 A follow-up letter was sent to those not responding to the first
mailing and a second copy was distributed to those few evaluators who
specifically requested another form.
Board's decision to undertake a multidimensional assessment, and some
faculty members included in the sample made known to the committee
their strong objections to the reputational survey.
As can be seen in Table 2.3, there is some variation in the re-
sponse rates in the nine humanities disciplines. Of particular inter-
est is the relatively high rate of response from linguists and the low
rate from those in music--the latter is undoubtedly related to the
difficulties encountered in identifying research-doctorate programs in
music and in compiling comparable lists of faculty members involved in
these programs. It is not surprising to find that the evaluators nom-
inated by study coordinators responded more often than did those who
had been selected at random. Also, those furnished the lists of pro-
gram faculty and numbers of recent graduates completed the survey more
often than did evaluators who were given the abbreviated form. Only
small differences were found among the response rates of assistant,
associate, and full professors.
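The response rates discussed above follow directly from the counts in Table 2.3. The short Python check below recomputes them by discipline; the dictionary layout and the function name are illustrative choices, while the counts are those printed in the table.

```python
# (survey sample, respondents) by discipline, taken from Table 2.3
TABLE_2_3 = {
    "Art History": (150, 94),
    "Classics": (150, 100),
    "English Language & Literature": (318, 198),
    "French Language & Literature": (174, 110),
    "German Language & Literature": (150, 95),
    "Linguistics": (150, 105),
    "Music": (159, 69),
    "Philosophy": (231, 157),
    "Spanish Language & Literature": (207, 136),
}
TOTAL_FACULTY = 8593  # total program faculty, from Table 2.3


def response_rate(sample: int, respondents: int) -> int:
    """Respondents as a whole-number percentage of the survey sample."""
    return round(100 * respondents / sample)


for discipline, (sample, respondents) in TABLE_2_3.items():
    print(f"{discipline}: {response_rate(sample, respondents)}%")

total_sample = sum(sample for sample, _ in TABLE_2_3.values())
total_respondents = sum(resp for _, resp in TABLE_2_3.values())
print("All fields:", response_rate(total_sample, total_respondents), "%")
print("Sample as share of faculty:", round(100 * total_sample / TOTAL_FACULTY), "%")
```

The discipline rows sum to the 1,689 sampled and 1,064 responding faculty members reported for all fields, and the sample amounts to roughly 20 percent of the 8,593 program faculty.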
Each program was considered by an average of approximately 90
survey respondents from other programs in the same discipline. The
evaluators were asked to judge programs in terms of scholarly quality
of program faculty, effectiveness of the program in educating research
scholars/scientists, and change in program quality in the last five
years.15 The mean ratings of a program on these three survey items
constitute measures 08, 09, and 10. Evaluators were also asked to
indicate the extent to which they were familiar with the work of the
program faculty. The average of responses to this item constitutes
measure 11.
In making judgments about the quality of faculty, evaluators were
instructed to consider the scholarly competence and achievements of
the individuals. The ratings were furnished on the following scale:
5 Distinguished
4 Strong
3 Good
2 Adequate
1 Marginal
0 Not sufficient for doctoral education
X Don't know well enough to evaluate
In assessing the effectiveness of a program, evaluators were asked to
consider the accessibility of faculty, the curricula, the instructional
and research facilities, the quality of the graduate students, the per-
formance of graduates, and other factors that contribute to a program's
effectiveness. This measure was rated accordingly:
3 Extremely effective
2 Reasonably effective
15 A copy of the survey instrument and accompanying instructions are
included in Appendix C.
1 Minimally effective
0 Not effective
X Don't know well enough to evaluate
Evaluators were instructed to assess change in program quality on the
basis of whether there has been improvement in the last five years in
both the scholarly quality of faculty and the effectiveness in educat-
ing research scholars/scientists. The following alternatives were
provided:
2 Better than five years ago
1 Little or no change in last five years
0 Poorer than five years ago
X Don't know well enough to evaluate
Evaluators were asked to indicate their familiarity with the work of
the program faculty according to the following scale:
2 Considerable familiarity
1 Some familiarity
0 Little or no familiarity
In the computation of mean ratings on measures 08, 09, and 10, the
"don't know" responses were ignored. An average program rating based
on fewer than 15 responses (excluding "don't know") is not reported.
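The computation of a mean rating under these two rules can be sketched in Python. The function name and the encoding of a "don't know" response as the string "X" are illustrative assumptions; the exclusion of "don't know" responses and the minimum of 15 usable responses are taken from the text.

```python
from typing import List, Optional, Union

DONT_KNOW = "X"     # code for "Don't know well enough to evaluate"
MIN_RESPONSES = 15  # minimum usable responses stated in the text


def mean_rating(responses: List[Union[int, str]]) -> Optional[float]:
    """Mean of the numeric ratings, ignoring "don't know" responses.

    Returns None when fewer than 15 numeric responses remain, in which
    case the average rating for the program is not reported.
    """
    scored = [r for r in responses if r != DONT_KNOW]
    if len(scored) < MIN_RESPONSES:
        return None
    return sum(scored) / len(scored)


# Hypothetical responses on the 0-5 scale used for measure 08.
well_known = [5] * 8 + [4] * 8 + ["X"] * 4
little_known = [3] * 10 + ["X"] * 20

print(mean_rating(well_known))    # 4.5
print(mean_rating(little_known))  # None
```

In the second case 30 evaluators were asked about the program, but only 10 felt able to rate it, so no average is reported.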
Measures 08, 09, and 10 are subject to many of the same criticisms
that have been directed at previous reputational surveys. Although
care has been taken to improve the sampling design and to provide eval-
uators with some essential information about each program, the survey
results merely reflect a consensus of faculty opinions. As discussed
in Chapter I, these opinions may well be based on out-of-date informa-
tion or be influenced by a variety of factors unrelated to the quality
of the program. In Chapter XII a number of factors that may possibly
affect the survey results are examined. In addition to these limita-
tions, it should be pointed out that evaluators, on the average, were
unfamiliar with almost one-fifth of the programs they were asked to
consider.16 As might be expected, the smaller and less prestigious
programs were not as well known, and for this reason one might have
less confidence in the average ratings of these programs. For all four
survey measures standard errors of the mean ratings are reported; they
tend to be larger for the lesser known programs. The frequency of
response to each of the survey items is discussed in Chapter XII.
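To illustrate why standard errors tend to be larger for lesser known programs, the sketch below computes the standard error of a mean rating as the sample standard deviation divided by the square root of the number of ratings. That formula is the conventional one and is assumed here; the passage does not state how the report's standard errors were calculated, and the rating lists are hypothetical.

```python
from math import sqrt
from statistics import stdev
from typing import List


def standard_error(ratings: List[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n).

    Assumption: the conventional formula with the n - 1 sample standard
    deviation. At least two ratings are needed to estimate the spread.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are required")
    return stdev(ratings) / sqrt(n)


# Two hypothetical programs with the same spread of opinion but
# different numbers of evaluators able to rate them.
well_known = [4, 5] * 45    # 90 usable ratings
little_known = [4, 5] * 10  # 20 usable ratings

print(round(standard_error(well_known), 3))
print(round(standard_error(little_known), 3))
```

With the same disagreement among raters, the program rated by fewer evaluators has the larger standard error, so its mean rating is the less precise of the two.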
One additional comment should be made regarding the survey activ-
ity. It should be emphasized that the ratings derived from the survey
reflect a program's standing relative to other programs in the same dis-
cipline and provide no basis for making cross-disciplinary comparisons.
16 See Table 12.4 in Chapter XII.
For example, the fact that a much larger number of English programs
received "distinguished" ratings on measure 08 than did classics pro-
grams indicates nothing about the relative quality of faculty in these
two disciplines. It may depend, in part, on the total numbers of pro-
grams evaluated in these disciplines; in the survey instructions it was
suggested to evaluators that no more than 10 percent of the programs
listed be designated as "distinguished." Nor is it advisable to com-
pare the ratings of a program in one discipline with that of a program
in another discipline because the ratings are based on the opinions of
different groups of evaluators who were asked to judge entirely dif-
ferent sets of programs.
UNIVERSITY LIBRARY SIZE
University library holdings are generally regarded as an important
resource for students in graduate (and undergraduate) education. The
Association of Research Libraries (ARL) has compiled data from its
academic member institutions and developed a composite measure of a
university library's size relative to those of other ARL members. The
ARL Library Index, as it is called, is based on 10 characteristics:
volumes held, volumes added (gross), microform units held, current se-
rials received, expenditures for library materials, expenditures for
binding, total salary and wage expenditures, other operating expendi-
tures, number of professional staff, and number of nonprofessional
staff.17 The 1979-80 index, which constitutes measure 12, is avail-
able for 89 of the 228 universities included in the assessment. (These
89 tend to be among the largest institutions.) The limited coverage
of this measure is a major shortcoming. It should be noted that the
ARL index is a composite description of library size and not a quali-
tative evaluation of the collections, services, or operations of the
library. Also, it is a measure of aggregate size and does not take
into account the library holdings in a particular department or disci-
pline. Finally, although universities with more than one campus were
instructed to include figures for the main campus only, some in fact
may have reported library size for the entire university system.
Whether this misreporting occurred is not known.
MEASURES OF RESEARCH SUPPORT AND PUBLICATION RECORDS
The committee's other four reports dealing with research-doctorate
programs in the biological sciences, engineering, mathematical and
physical sciences, and social sciences all present two additional
measures pertaining to research support in individual programs and two
measures pertaining to the publication records of program faculty and
other staff. Comparable information for humanities programs is
7 See Appendix D for a description of the calculation of this index.
either unavailable or, in the committee's judgment, not relevant to an
assessment of humanities programs, and consequently such information
is not presented in this report. For example, data on the fraction of
program faculty holding research grants from the National Science
Foundation, National Institutes of Health, and Alcohol, Drug Abuse,
and Mental Health Administration would not be meaningful in the
humanities disciplines since very few faculty members receive support
from any of these three sources (it was not feasible to compile
information on research grant awards by other federal agencies). Data
compiled by the National Science Foundation on total university
expenditures for research and development in particular disciplines
are not collected for any of the nine humanities disciplines. Finally,
although counts could have been obtained on the numbers of recent
articles authored by program faculty members in the humanities, the
committee believes that such information would be misleading since it
would not include the books or chapters of books authored by these
faculty members. In the humanities disciplines books represent a
major part of the publication effort, but reliable information on the
authorship of books is not readily available.
ANALYSIS AND PRESENTATION OF THE DATA
The next nine chapters present all of the information that has been
compiled on individual research-doctorate programs in art history,
classics, English language and literature, French language and litera-
ture, German language and literature, linguistics, music, philosophy,
and Spanish language and literature. Each chapter follows a similar
format, designed to assist the reader in the interpretation of program
data. The first table in a chapter provides a list of the programs
evaluated in a discipline--including the names of the universities and
departments or academic units in which programs reside--along with the
full set of data compiled for individual programs. Programs are listed
alphabetically according to name of institution, and both raw and
standardized values are given for all measures. For the reader's con-
venience an insert of information from Table 2.1 is provided which
identifies each of the 12 measures reported in the table and indicates
the raw scale used in reporting values for a particular measure.
Standardized values, converted from raw values to have a mean of 50
and a standard deviation of 10,18 are computed for every measure so
that comparisons can easily be made of a program's relative standing
on different measures. Thus, a standardized value of 30 corresponds
with a raw value that is two standard deviations below the mean for
that measure, and a standardized value of 70 represents a raw value
18 The conversion was made from the precise raw value rather than
from the rounded value reported for each program. Thus, two programs
may have the same reported raw value for a particular measure but
different standardized values.
two standard deviations above the mean. While the reporting of values
in standardized form is convenient for comparing a particular program's
standing on different measures, it may be misleading in interpreting
actual differences in the values reported for two or more programs--
especially when the distribution of the measure being examined is
highly skewed. For example, the numbers of FY1976-80 program graduates
(measure 02) from four English programs are reported in Table 5.1 as
follows:
Program    Raw Value      Standardized Value
A          [illegible]    38
B          11             39
C          20             42
D          30             45
Although programs C and D have many times the number of graduates as
have programs A and B, the differences reported on a standardized
scale appear to be small. Thus, the reader is urged to take note of
the raw values before attempting to interpret differences in the
standardized values given for two or more programs.
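The compression described above can be illustrated with a minimal Python sketch. It assumes the usual linear conversion, 50 + 10 * (raw - mean) / SD, and uses the population standard deviation; the report does not state which form of the standard deviation was used, and the graduate counts below are hypothetical values chosen only to produce a skewed distribution.

```python
from statistics import mean, pstdev


def standardize(raw_values):
    """Rescale raw values to have a mean of 50 and a standard deviation of 10."""
    mu = mean(raw_values)
    # Assumption: population SD; the report does not specify n or n - 1.
    sigma = pstdev(raw_values)
    return [50 + 10 * (x - mu) / sigma for x in raw_values]


# Hypothetical, highly skewed graduate counts (not data from the report).
raw = [5, 8, 11, 20, 30, 45, 60, 250]
std = standardize(raw)

for r, s in zip(raw, std):
    print(f"raw {r:4d} -> standardized {s:5.1f}")
```

In this sketch one very large program inflates the standard deviation, so programs whose raw counts differ by a factor of two or three end up only a few standardized points apart.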
The initial table in each chapter also presents estimated standard
errors of mean ratings derived from the four survey items (measures
08-11). A standard error is an estimated standard deviation of the
sample mean rating and may be used to assess the stability of a mean
rating reported for a particular program.19 For example, one may
assert (with .95 confidence) that the population mean rating would lie
within two standard errors of the sample mean rating reported in this
assessment.
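As a rough illustration of this calculation, the following Python sketch computes a mean rating, its standard error (the standard deviation of the ratings divided by the square root of their number), and the interval of two standard errors on either side of the mean. The ratings are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the report does not say which form was used.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(ratings):
    """Return the mean rating and its estimated standard error."""
    n = len(ratings)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return mean(ratings), stdev(ratings) / sqrt(n)


# Hypothetical ratings of one program by ten evaluators (not survey data).
ratings = [4, 3, 5, 4, 4, 3, 5, 4, 2, 4]
m, se = mean_and_standard_error(ratings)

# Approximate .95 interval: two standard errors on either side of the mean.
low, high = m - 2 * se, m + 2 * se
print(f"mean = {m:.2f}, standard error = {se:.2f}")
print(f"approximate .95 interval: {low:.2f} to {high:.2f}")
```

Because the standard error shrinks with the square root of the number of ratings, a program rated by few evaluators will have a wider interval than one rated by many, other things being equal.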
No attempt has been made to establish a composite ranking of pro-
grams in a discipline. Indeed, the committee is convinced that no
single measure adequately reflects the quality of a research-doctorate
program and wishes to emphasize the importance of viewing individual
programs from the perspective of multiple indices or dimensions.
The second table in each chapter presents summary statistics
(i.e., number of programs evaluated, mean, standard deviation, and
decile values) for each of the program measures.20 The reader should
find these statistics helpful in interpreting the data reported on in-
19 The standard error estimate has been computed by dividing the
standard deviation of a program's ratings by the square root of the
number of ratings. For a more extensive discussion of this topic the
reader may want to refer to Fred N. Kerlinger, Foundations of
Behavioral Research, Holt, Rinehart, and Winston, Inc., New York, 1973,
Chapter 12. Readers should note that the estimate is a measure of the
variation in response and by no means includes all possible sources of
error.
20 Standardized scores have been computed from precise values of the
mean and standard deviation of each measure and not the rounded values
reported in the second table of a chapter.
dividual programs. Next is a table of the intercorrelations among the
various measures for that discipline. This table should be of parti-
cular interest to those desiring information about the interrelations
of the various measures.
The remainder of each chapter is devoted to an examination of
results from the reputational survey. Included are an analysis of the
characteristics of survey participants and graphical portrayals of the
relationship of mean rating of scholarly quality of faculty (measure
08) with the number of faculty (measure 01) and the relationship of
mean rating of program effectiveness (measure 09) with the number of
graduates (measure 02). A frequently mentioned criticism of the
Roose-Andersen and Cartter studies is that small but distinguished
programs have been penalized in the reputational ratings because they
are not as highly visible as larger programs of comparable quality.
The comparisons of survey ratings with measures of program size are
presented as the first two figures in each chapter, and provide
evidence about the number of small programs in each discipline that
have received high reputational ratings. Since in each case the
reputational rating is more highly correlated with the square root of
program size than with the size measure itself, measures 01 and 02 are
plotted on a square root scale.21 To assist the reader in
interpreting results of the survey evaluations, each chapter concludes with
a graphical presentation of the mean rating for every program of the
scholarly quality of faculty (measure 08) and an associated "confi-
dence interval" of 1.5 standard errors. In comparing the mean ratings
of two programs, if their reported confidence intervals of 1.5
standard errors do not overlap, one may safely conclude that the
program ratings are significantly different (at the .05 level of
significance)--i.e., the observed difference in mean ratings is too
large to be plausibly attributable to sampling error.22
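The nonoverlap rule can be checked with a short Python sketch. It assumes that the two mean ratings are independent and approximately normally distributed; these are assumptions of the sketch, not statements from the report. Under them, two intervals of 1.5 standard errors that just touch correspond to a difference of 1.5(s1 + s2), whose standard error is the square root of (s1 squared + s2 squared). The function names are illustrative.

```python
from math import erfc, sqrt


def intervals_overlap(mean_a, se_a, mean_b, se_b, k=1.5):
    """True if the two intervals of k standard errors overlap."""
    return abs(mean_a - mean_b) <= k * (se_a + se_b)


def boundary_significance(ratio, k=1.5):
    """Two-sided significance level when two k-standard-error intervals
    just touch, where ratio is the larger standard error divided by the
    smaller. Assumes independent, approximately normal means."""
    z = k * (1 + ratio) / sqrt(1 + ratio ** 2)
    # Two-sided normal tail probability P(|Z| > z).
    return erfc(z / sqrt(2))


for ratio in (1.0, 2.0, 2.41):
    print(f"ratio {ratio:4.2f}: significance {boundary_significance(ratio):.3f}")
```

With equal standard errors the boundary case corresponds to a significance level of roughly .034, and at a ratio of 2.41 it rises to roughly .050, which agrees with the range the report cites for this rule.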
The final chapter of this report gives an overview of the evalua-
tion process in the nine humanities disciplines and includes a summary
of general findings. Particular attention is given to some of the ex-
traneous factors that may influence program ratings of individual
evaluators and thereby distort the survey results. The chapter con-
cludes with a number of specific suggestions for improving future
assessments of research-doctorate programs.
21 For a general discussion of transforming variables to achieve
linear fits, see John W. Tukey, Exploratory Data Analysis,
Addison-Wesley, Reading, Massachusetts, 1977.
22 This rule for comparing nonoverlapping intervals is valid as long
as the ratio of the two estimated standard errors does not exceed
2.41. (The exact statistical significance of this criterion then lies
between .050 and .034.) Inspection of the standard errors reported in
each discipline shows that for programs with mean ratings differing by
less than 1.0 (on measure 08), the standard error of one mean very
rarely exceeds twice the standard error of another.