Copyright © 2009. National Academy of Sciences. All rights reserved.
Origins of Study and Selection of Programs
Each year more than 22,000 candidates are awarded doctorates in
engineering, the humanities, and the sciences from approximately 250
U.S. universities. They have spent, on the average, five-and-a-half
years in intensive education in preparation for research careers
either in universities or in settings outside the academic sector, and
many will make significant contributions to research. Yet we are
poorly informed concerning the quality of the programs producing these
graduates. This study is intended to provide information pertinent to
this complex and controversial subject.
The charge to the study committee directed it to build upon the
planning that preceded it. The planning stages included a detailed
review of the methodologies and the results of past studies that had
focused on the assessment of doctoral-level programs. The committee
has taken into consideration the reactions of various groups and
individuals to those studies. The present assessment draws upon
previous experience with program evaluation, with the aim of improving
what was useful and avoiding some of the difficulties encountered in
past studies. The present study, nevertheless, is not purely
reactive: it has its own distinctive features. First, it focuses
only on programs awarding research doctorates and their effectiveness
in preparing students for careers in research. Although other
purposes of graduate education are acknowledged to be important, they
are outside the scope of this assessment. Second, the study examines
a variety of different indices that may be relevant to the program
quality. This multidimensional approach represents an explicit
recognition of the limitations of studies that rely entirely on peer
ratings of perceived quality--the so-called reputational ratings.
Finally, in the compilation of reputational ratings in this study,
evaluators were provided the names of faculty members involved with
each program to be rated and the number of research doctorates awarded
in the last five years. In previous reputational studies evaluators
were not supplied such information.
During the past two decades increasing attention has been given to
describing and measuring the quality of programs in graduate
education. It is evident that the assessment of graduate programs is
highly important for university administrators and faculty, for
employers in industrial and government laboratories, for graduate
students and prospective graduate students, for policymakers in state
and national organizations, and for private and public funding
agencies. Past experience, however, has demonstrated the difficulties
with such assessments and their potentially controversial nature. As
one critic has asserted:
. . . the overall effect of these reports seems quite
clear. They tend, first, to make the rich richer and
the poor poorer; second, the example of the highly
ranked clearly imposes constraints on those institu-
tions lower down the scale (the "Hertz-Avis" effect).
And the effect of such constraints is to reduce
diversity, to reward conformity or respectability, to
penalize genuine experiment or risk. There is, also,
I believe, an obvious tendency to promote the
prevalence of disciplinary dogma and orthodoxy. All
of this might be tolerable if the reports were
tolerably accurate and judicious, if they were less
prescriptive and more descriptive; if they did not
pretend to "objectivity" and if the very fact of
ranking were not pernicious and invidious; if they
genuinely promoted a meaningful "meritocracy" (instead
of simply perpetuating the status quo ante and an
establishment mentality). But this is precisely what
they cannot claim to be or do.1
The widespread criticisms of ratings in graduate education were
carefully considered in the planning of this study. At the outset
consideration was given to whether a national assessment of graduate
programs should be undertaken at this time and, if so, what methods
should be employed. The next two sections in this chapter examine the
background and rationale for the decision by the Conference Board of
Associated Research Councils2 to embark on such a study. The
remainder of the chapter describes the selection of disciplines and
programs to be covered in the assessment.
The overall study encompasses a total of 2,699 graduate programs
in 32 disciplines. In this report--the first of five reports issuing
from the study--we examine 596 programs in six disciplines in the
mathematical and physical sciences: chemistry, computer sciences,
geosciences, mathematics, physics, and statistics/biostatistics.
These programs account for more than 90 percent of the research
1 William A. Arrowsmith, "Preface" in The Ranking Game: The Power of
the Academic Elite, by W. Patrick Dolan, University of Nebraska
Printing and Duplicating Service, Lincoln, Nebraska, 1976, p. ix.
2 The Conference Board includes representatives of the American
Council of Learned Societies, American Council on Education, National
Research Council, and Social Science Research Council.
doctorates awarded in these six disciplines. It should be emphasized
that the selection of disciplines to be covered was determined on the
basis of total doctoral awards during the FY1976-78 period (as
described later in this chapter), and the exclusion of a particular
discipline was in no way based on a judgment of the importance of
graduate education or research in that discipline. Also, although the
assessment is limited to programs leading to the research-doctorate
(Ph.D. or equivalent) degree, the Conference Board and study committee
recognize that graduate schools provide many other forms of valuable
and needed education.
PRIOR ATTEMPTS TO ASSESS QUALITY IN GRADUATE EDUCATION
Universities and affiliated organizations have taken the lead in
the review of programs in graduate education. At most institutions
program reviews are carried out on a regular basis and include a
comprehensive examination of the curriculum and educational resources
as well as the qualifications of faculty and students. One special
form of evaluation is that associated with institutional accreditation:
The process begins with the institutional or
programmatic self-study, a comprehensive effort to
measure progress according to previously accepted
objectives. The self-study considers the interest of
a broad cross-section of constituencies--students,
faculty, administrators, alumni, trustees, and in some
circumstances the local community. The resulting
report is reviewed by the appropriate accrediting
commission and serves as the basis for evaluation by a
site-visit team from the accrediting group. . . .
Public as well as educational needs must be served
simultaneously in determining and fostering standards
of quality and integrity in the institutions and such
specialized programs as they offer. Accreditation,
conducted through non-governmental institutional and
specialized agencies, provides a major means for
meeting those needs.3
Although formal accreditation procedures play an important role in
higher education, many university administrators do not view such
procedures as an adequate means of assessing program quality. Other
efforts are being made by universities to evaluate their programs in
graduate education. The Educational Testing Service, with the
sponsorship of the Council of Graduate Schools in the United States
and the Graduate Record Examinations Board, has recently developed a
3 Council on Postsecondary Accreditation, The Balance Wheel for
Accreditation, Washington, D.C., July 1981, pp. 2-3.
set of procedures to assist institutions in evaluating their own
graduate programs.4
While reviews at the institutional (or state) level have proven
useful in assessing the relative strengths and weaknesses of
individual programs, they have not provided the information required
for making national comparisons of graduate programs. Several
attempts have been made at such comparisons. The most widely used of
these have been the studies by Keniston (1959), Cartter (1966), and
Roose and Andersen (1970). All three studies covered a broad range of
disciplines in engineering, the humanities, and the sciences and were
based on the opinions of knowledgeable individuals in the program
areas covered. Keniston5 surveyed the department chairmen at 25
leading institutions. The Cartter6 and Roose-Andersen7 studies
compiled ratings from much larger groups of faculty peers. The stated
motivation for these studies was to increase knowledge concerning the
quality of graduate education:
A number of reasons can be advanced for undertaking
such a study. The diversity of the American system of
higher education has properly been regarded by both
the professional educator and the layman as a great
source of strength, since it permits flexibility and
adaptability and encourages experimentation and
competing solutions to common problems. Yet diversity
also poses problems. . . . Diversity can be a costly
luxury if it is accompanied by ignorance. . . . Just
as consumer knowledge and honest advertising are
requisite if a competitive economy is to work satis-
factorily, so an improved knowledge of opportunities
and of quality is desirable if a diverse educational
system is to work effectively.8
Although the program ratings from the Cartter and Roose-Andersen
studies are highly correlated, some substantial differences in
successive ratings can be detected for a small number of programs--
suggesting changes in the programs or in the perception of the
programs. For the past decade the Roose-Andersen ratings have
4 For a description of these procedures see M. J. Clark, Graduate
Program Self-Assessment Service: Handbook for Users, Educational
Testing Service, Princeton, New Jersey, 1980.
5 H. Keniston, Graduate Study in Research in the Arts and Sciences at
the University of Pennsylvania, University of Pennsylvania Press,
Philadelphia, 1959.
6 A. M. Cartter, An Assessment of Quality in Graduate Education,
American Council on Education, Washington, D.C., 1966.
7 K. D. Roose and C. J. Andersen, A Rating of Graduate Programs,
American Council on Education, Washington, D.C., 1970.
8 Cartter, p. 3.
generally been regarded as the best available source of information on
the quality of doctoral programs. Although the ratings are now more
than 10 years out of date and have been criticized on a variety of
grounds, they are still used extensively by individuals within the
academic community and by those in federal and state agencies.
A frequently cited criticism of the Cartter and Roose-Andersen
studies is their exclusive reliance upon reputational measurement.
The ACE rankings are but a small part of all the
evaluative processes, but they are also the most
public, and they are clearly based on the narrow
assumptions and elitist structures that so dominate
the present direction of higher education in the
United States. As long as our most prestigious source
of information about post-secondary education is a
vague popularity contest, the resultant ignorance will
continue to provide a cover for the repetitious aping
of a single model. . . . All the attempts to change
higher education will ultimately be strangled by the
"legitimate" evaluative processes that have already
programmed a single set of responses from the start.9
A number of other criticisms have been leveled at reputational
rankings of graduate programs.10 First, such studies inherently
reflect perceptions that may be several years out of date and do not
take into account recent changes in a program. Second, the ratings
of individual programs are likely to be influenced by the overall
reputation of the university--i.e., an institutional "halo effect."
Also, a disproportionately large fraction of the evaluators are
graduates of and/or faculty members in the largest programs, which
may bias the survey results. Finally, on the basis of such studies
it may not be possible to differentiate among many of the lesser
known programs in which relatively few faculty members have
established national reputations in research.
Despite such criticisms several studies based on methodologies
similar to that employed by Cartter and Roose and Andersen have been
carried out during the past 10 years. Some of these studies
evaluated post-baccalaureate programs in areas not covered in the two
earlier reports--including business, religion, educational
administration, and medicine. Others have focused exclusively on
programs in particular disciplines within the sciences and
humanities. A few attempts have been made to assess graduate
programs in a broad range of disciplines, many of which were covered
in the Roose-Andersen and Cartter ratings, but in the opinion of many
each has serious deficiencies in the methods and procedures
9 Dolan, p. 81.
10 For a discussion of these criticisms, see David S. Webster,
"Methods of Assessing Quality," Change, October 1981, pp. 20-24.
employed. In addition to such studies, a myriad of articles
have been written on the assessment of graduate programs since
the release of the Roose-Andersen report. With the heightening
interest in these evaluations, many in the academic community
have recognized the need to assess graduate programs, using
other criteria in addition to peer judgment.
Though carefully done and useful in a number of ways,
these ratings [Cartter and Roose-Andersen] have been
criticized for their failure to reflect the complexity
of graduate programs, their tendency to emphasize the
traditional values that are highly related to program
size and wealth, and their lack of timeliness or
currency. Rather than repeat such ratings, many
members of the graduate community have voiced a
preference for developing ways to assess the quality
of graduate programs that would be more comprehensive,
sensitive to the different program purposes, and
appropriate for use at any time by individual
departments or universities.11
Several attempts have been made to go beyond the reputational assessment.
Clark, Harnett, and Baird, in a pilot study12 of graduate
programs in chemistry, history, and psychology, identified as many as
30 possible measures significant for assessing the quality of graduate
education. Glower13 has ranked engineering schools according to the
total amount of research spending and the number of graduates listed
in Who's Who in Engineering. House and Yeager14 rated economics
departments on the basis of the total number of pages published by
full professors in 45 leading journals in this discipline. Other
ratings based on faculty publication records have been compiled for
graduate programs in a variety of disciplines, including political
science, psychology, and sociology. These and other studies
demonstrate the feasibility of a national assessment of graduate
programs that is founded on more than reputational standing among
faculty peers.
11 Clark, p. 1.
12 M. J. Clark, R. T. Harnett, and L. L. Baird, Assessing Dimensions
of Quality in Doctoral Education: A Technical Report of a National
Study in Three Fields, Educational Testing Service, Princeton, New
Jersey, 1976.
13 Donald D. Glower, "A Rational Method for Ranking Engineering
Programs," Engineering Education, May 1980.
14 Donald R. House and James H. Yeager, Jr., "The Distribution of
Publication Success Within and Among Top Economics Departments: A
Disaggregate View of Recent Evidence," Economic Inquiry, Vol. 16,
No. 4, October 1978, pp. 593-598.
DEVELOPMENT OF STUDY PLANS
In September 1976 the Conference Board, with support from the
Carnegie Corporation of New York and the Andrew W. Mellon Foundation,
convened a three-day meeting to consider whether a study of programs
in graduate education should be undertaken. The 40 invited
participants at this meeting included academic administrators, faculty
members, and agency and foundation officials,15 who represented a
variety of institutions, disciplines, and convictions. In these
discussions there was considerable debate concerning whether the
potential benefits of such a study outweighed the possible mis-
representations of the results. On the one hand, "a substantial
majority of the Conference [participants believed] that the earlier
assessments of graduate education have received wide and important
use: by students and their advisors, by the institutions of higher
education as aids to planning and the allocation of educational
functions, as a check on unwarranted claims of excellence, and in
social science research."16 On the other hand, the conference
participants recognized that a new study assessing the quality of
graduate education "would be conducted and received in a very
different atmosphere than were the earlier Cartter and Roose-Andersen
reports. . . . Where ratings were previously used in deciding where to
increase funds and how to balance expanding programs, they might now
be used in deciding where to cut off funds and programs."
After an extended debate of these issues, it was the recommenda-
tion of this conference that a study with particular emphasis on the
effectiveness of doctoral programs in educating research personnel be
undertaken. The recommendation was based principally on four
considerations:
(1) the importance of the study results to national
and state bodies,
(2) the desire to stimulate continuing emphasis on
quality in graduate education,
(3) the need for current evaluations that take into
account the many changes that have occurred in
programs since the Roose-Andersen study, and
(4) the value of extending the range of measures used
in evaluative studies of graduate programs.
Although many participants expressed interest in an assessment of
master's degree and professional degree programs, insurmountable
problems prohibited the inclusion of these types of programs in this
study.
Following this meeting a 13-member committee,17 co-chaired by
15 See Appendix G for a list of the participants in this conference.
16 From a summary of the Woods Hole Conference (see Appendix G).
17 See Appendix H for a list of members of the planning committee.
Gardner Lindzey and Harriet A. Zuckerman, was formed to develop a
detailed plan for a study limited to research-doctorate programs and
designed to improve upon the methodologies utilized in earlier
studies. In its deliberations the planning committee carefully
considered the criticisms of the Roose-Andersen study and other
national assessments. Particular attention was paid to the feasibility
of compiling a variety of specific measures (e.g., faculty publication
records, quality of students, program resources) that were judged to
be related to the quality of research-doctorate programs. Attention
was also given to making improvements in the survey instrument and
procedures used in the Cartter and Roose-Andersen studies. In
September 1978 the planning group submitted a comprehensive report
describing alternative strategies for an evaluation of the quality and
effectiveness of research-doctorate programs.
The proposed study has its own distinctive features.
It is characterized by a sharp focus and a multi-
dimensional approach. (1) It will focus only on
programs awarding research doctorates; other purposes
of doctoral training are acknowledged to be important,
but they are outside the scope of the work contem-
plated. (2) The multidimensional approach represents
an explicit recognition of the limitations of studies
that make assessments solely in terms of ratings of
perceived quality provided by peers--the so-called
reputational ratings. Consequently, a variety of
quality-related measures will be employed in the
proposed study and will be incorporated in the
presentation of the results of the study.
This report formed the basis for the decision by the Conference Board
to embark on a national assessment of doctorate-level programs in the
sciences, engineering, and the humanities.
In June 1980 an 18-member committee was appointed to oversee the
study. The committee,19 made up of individuals from a diverse set
of disciplines within the sciences, engineering, and the humanities,
includes seven members who had been involved in the planning phase and
several members who presently serve or have served as graduate deans
at either public or private universities. During the first eight
months the committee met three times to review plans for the study
activities, make decisions on the selection of disciplines and
programs to be covered, and design the survey instruments to be used.
Early in the study an effort was made to solicit the views of
presidents and graduate deans at more than 250 universities. Their
suggestions were most helpful to the committee in drawing up final
18 National Research Council, A Plan to Study the Quality and
Effectiveness of Research-Doctorate Programs, 1978 (unpublished report).
19 See p. iii of this volume for a list of members of the study
committee.
plans for the assessment. With the assistance of the Council of
Graduate Schools in the United States, the committee and its staff
have tried to keep the graduate deans informed about the progress
being made in this study. The final section of this chapter describes
the procedures followed in determining which research-doctorate
programs were to be included in the assessment.
SELECTION OF DISCIPLINES AND PROGRAMS TO BE EVALUATED
One of the most difficult decisions made by the study committee
was the selection of disciplines to be covered in the assessment.
Early in the planning stage it was recognized that some important
areas of graduate education would have to be left out of the study.
Limited financial resources required that efforts be concentrated on a
total of no more than about 30 disciplines in the biological sciences,
engineering, humanities, mathematical and physical sciences, and
social sciences. At its initial meeting the committee decided that
the selection of disciplines within each of these five areas should be
made primarily on the basis of the total number of doctorates awarded
nationally in recent years.
At the time the study was undertaken, aggregate counts of doctoral
degrees earned during the FY1976-78 period were available from two
independent sources--the Educational Testing Service (ETS) and the
National Research Council (NRC). Table 1.1 presents doctoral awards
data for 10 disciplines within the mathematical and physical sciences.
As alluded to in footnote 1 of the table, discrepancies between the
ETS and NRC counts may be explained, in part, by differences in the
data collection procedures. The ETS counts, derived from information
provided by universities, have been categorized according to the
discipline of the department/academic unit in which the degree was
earned. The NRC counts were tabulated from the survey responses of
FY1976-78 Ph.D. recipients, who had been asked to identify their
fields of specialty. Since separate totals for research doctorates in
astronomy, atmospheric sciences, environmental sciences, and marine
sciences were not available from the ETS manual, the committee made
its selection of six disciplines primarily on the basis of the NRC
data. In the case of computer sciences, some consideration was given
to the fact that the ETS estimate was significantly greater than the
NRC estimate.20
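The size of the discrepancies discussed above can be checked directly from the counts in Table 1.1. The following is a minimal sketch, not part of the committee's procedure: the name TABLE_1_1 and the helper relative_gap are illustrative assumptions, and expressing each gap as a fraction of the NRC count is only one possible convention.

```python
# Counts transcribed from Table 1.1 (FY1976-78 research doctorates),
# stored as (ETS count, NRC count) for the six disciplines in the assessment.
TABLE_1_1 = {
    "Chemistry": (4624, 4739),
    "Physics": (3139, 3033),
    "Mathematics": (1985, 1848),
    "Geosciences": (1395, 1139),
    "Computer Sciences": (728, 456),
    "Statistics/Biostatistics": (457, 634),
}


def relative_gap(ets: int, nrc: int) -> float:
    """ETS count minus NRC count, as a fraction of the NRC count (assumed convention)."""
    return (ets - nrc) / nrc


gaps = {name: relative_gap(ets, nrc) for name, (ets, nrc) in TABLE_1_1.items()}

# List disciplines from the largest to the smallest absolute discrepancy.
for name, gap in sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{name:<26}{gap:+.1%}")
```

On these figures computer sciences shows the largest relative gap between the two sources, which is consistent with the special consideration given to that discipline.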
The selection of the research-doctorate programs to be evaluated
in each discipline was made in two stages. Programs meeting any of
the following three criteria were initially nominated for inclusion in
the study:
(1) more than a specified number (see below) of
research doctorates awarded during the
FY1976-78 period,
20 See footnote 4 in Table 1.1.
(2) more than one-third of that specified number
of doctorates awarded in FY1979, or
(3) an average rating of 2.0 or higher in the
Roose-Andersen rating of the scholarly quality
of departmental faculty.
In each discipline the specified number of doctorates required for
inclusion in the study was determined in such a way that the programs
meeting this criterion accounted for at least 90 percent of the
TABLE 1.1 Number of Research Doctorates Awarded in the Mathematical
and Physical Science Disciplines, FY1976-78

                                              Source of Data1
                                              ETS        NRC
Disciplines Included in the Assessment
  Chemistry                                 4,624      4,739
  Physics2                                  3,139      3,033
  Mathematics                               1,985      1,848
  Geosciences3                              1,395      1,139
  Computer Sciences4                          728        456
  Statistics/Biostatistics5                   457        634
  Total                                    12,328     11,849
Disciplines Not Included in the Assessment
  Astronomy                                  N/A6        408
  Marine Sciences                             N/A        406
  Atmospheric Sciences                        N/A        246
  Environmental Sciences                      N/A        160
  Other Physical Sciences                     N/A        132
  Total                                                1,352

1 Data on FY1976-78 doctoral awards were derived from two independent
sources: Educational Testing Service (ETS), Graduate Programs and
Admissions Manual, 1979-81, and NRC's Survey of Earned Doctorates,
1976-78. Differences in field definitions account for discrepancies
between the ETS and NRC data.
2 Data from ETS include doctorates in astronomy and astrophysics.
3 Data from ETS include doctorates in atmospheric sciences and
oceanography.
4 The ETS data may include some individuals from computer science
departments who earned doctorates in the field of electrical
engineering and consequently are not included in the NRC data.
5 Data from ETS exclude doctorates in biostatistics.
6 Not available.
doctorates awarded in that discipline during the FY1976-78 period. In
the mathematical and physical science disciplines, the following
numbers of FY1976-78 doctoral awards were required to satisfy the
first criterion (above):
Chemistry--13 or more doctorates
Computer Sciences--5 or more doctorates
Geosciences--7 or more doctorates
Mathematics--7 or more doctorates
Physics--10 or more doctorates
Statistics/Biostatistics--5 or more doctorates
A list of the nominated programs at each institution was then sent to
a designated individual (usually the graduate dean) who had been
appointed by the university president to serve as study coordinator
for the institution. The coordinator was asked to review the list and
eliminate any programs no longer offering research doctorates or not
belonging in the designated discipline. The coordinator also was
given an opportunity to nominate additional programs that he or she
believed should be included in the study. Coordinators were asked
to restrict their nominations to programs that they considered to be
"of uncommon distinction" and that had awarded no fewer than two
research doctorates during the past two years. In order to be
eligible for inclusion, of course, programs had to belong in one of
the disciplines covered in the study. If the university offered more
than one research-doctorate program in a discipline, the coordinator
was instructed to provide information on each of them so that these
programs could be evaluated separately.
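The first-stage nomination rule and the 90 percent coverage requirement described earlier in this section can be sketched as follows. This is an illustration under stated assumptions, not the committee's actual procedure: the Program record, the function names, and any sample figures are hypothetical; the first criterion is read as "N or more" to match the list of thresholds above; and choosing the largest cutoff that still reaches 90 percent coverage is an assumed reading of how the specified number was determined.

```python
from dataclasses import dataclass
from typing import List, Optional

# Minimum FY1976-78 doctoral awards for the first criterion, as listed in the text.
THRESHOLDS = {
    "Chemistry": 13,
    "Computer Sciences": 5,
    "Geosciences": 7,
    "Mathematics": 7,
    "Physics": 10,
    "Statistics/Biostatistics": 5,
}


@dataclass
class Program:
    """Hypothetical record for one research-doctorate program."""
    discipline: str
    awards_fy1976_78: int
    awards_fy1979: int
    roose_andersen_rating: Optional[float] = None  # None if the program was not rated


def is_nominated(program: Program) -> bool:
    """Return True if the program meets any one of the three stated criteria."""
    n = THRESHOLDS[program.discipline]
    # (1) enough doctorates over FY1976-78 (assumed reading: "N or more")
    meets_volume = program.awards_fy1976_78 >= n
    # (2) more than one-third of N in FY1979; integer form avoids rounding issues
    meets_recent = 3 * program.awards_fy1979 > n
    # (3) Roose-Andersen faculty-quality rating of 2.0 or higher
    rating = program.roose_andersen_rating
    meets_rating = rating is not None and rating >= 2.0
    return meets_volume or meets_recent or meets_rating


def coverage_threshold(awards: List[int], target_percent: int = 90) -> int:
    """Largest cutoff such that programs with at least that many awards
    account for at least target_percent of all awards (assumed interpretation)."""
    total = sum(awards)
    best = 0
    for cutoff in sorted(set(awards)):
        covered = sum(a for a in awards if a >= cutoff)
        if 100 * covered >= target_percent * total:
            best = cutoff
    return best
```

For example, under these assumptions a chemistry program with 12 doctorates in FY1976-78 and 5 in FY1979 would be nominated through the second criterion, since 5 exceeds one-third of 13.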
The committee received excellent cooperation from the study co-
ordinators at the universities. Of the 243 institutions that were
identified as having one or more research-doctorate programs
satisfying the criteria (listed earlier) for inclusion in the study,
only 7 declined to participate in the study and another 8 failed to
provide the program information requested within the three-month
period allotted (despite several reminders). None of these 15
institutions had doctoral programs that had received strong or
distinguished reputational ratings in prior national studies. Since
the information requested had not been provided, the committee decided
not to include programs from these institutions in any aspect of the
assessment. In each of the six chapters that follow, a list is given
of the universities that met the criteria for inclusion in a
particular discipline but that are not represented in the study.
As a result of nominations by institutional coordinators, some
programs were added to the original list and others dropped. Table
1.2 reports the final coverage in each of the six mathematical and
physical science disciplines. The number of programs evaluated varies
21 See Appendix A for the specific instructions given to the
coordinators.
TABLE 1.2 Number of Programs Evaluated in Each Discipline and the
Total FY1976-80 Doctoral Awards from These Programs

Discipline                  Programs   FY1976-80 Doctorates*
Chemistry                        145                   7,304
Computer Sciences                 58                   1,154
Geosciences                       91                   1,747
Mathematics                      115                   2,698
Physics                          123                   4,271
Statistics/Biostatistics          64                     906
TOTAL                            596                  18,080

*The data on doctoral awards were provided by the study coordinator at
each of the universities covered in the assessment.
considerably by discipline. A total of 145 chemistry programs have
been included in the study; in computer sciences and statistics/
biostatistics fewer than half this number have been included. Although
the final determination of whether a program should be included in the
assessment was left in the hands of the institutional coordinator, it
is entirely possible that a few programs meeting the criteria for
inclusion in the assessment were overlooked by the coordinators.
During the course of the study only two such programs in the
mathematical and physical sciences--one in mathematics and one in
biostatistics--have been called to the attention of the committee.
In the chapter that follows, a detailed description is given of
each of the measures used in the evaluation of research-doctorate
programs in the mathematical and physical sciences. The description
includes a discussion of the rationale for using the measure, the
source from which data for that measure were derived, and any known
limitations that would affect the interpretation of the data
reported. The committee wishes to emphasize that there are
limitations associated with each of the measures and that none of the
measures should be regarded as a precise indicator of the quality of a
program in educating scientists for careers in research. The reader
is strongly urged to consider the descriptive material presented in
Chapter II before attempting to interpret the program evaluations
reported in subsequent chapters. In presenting a frank discussion of
any shortcomings of each measure, the committee's intent is to reduce
the possibility of misuse of the results from this assessment of
research-doctorate programs.