The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Origins of Study and Selection of Programs

Each year more than 22,000 candidates are awarded doctorates in engineering, the humanities, and the sciences from approximately 250 U.S. universities. They have spent, on the average, five and a half years in intensive education and research in preparation for careers either in universities or in settings outside the academic sector, and many will make significant contributions to research. Yet we are poorly informed concerning the quality of the programs producing these graduates. This study is intended to provide information pertinent to this complex and controversial subject.

The charge to the study committee directed it to build upon the planning that preceded it. The planning stages included a detailed review of the methodologies and the results of past studies that had focused on the assessment of doctoral-level programs. The committee has taken into consideration the reactions of various groups and individuals to those studies. The present assessment draws upon previous experience with program evaluation, with the aim of improving what was useful and avoiding some of the difficulties encountered in past studies.

The present study, nevertheless, is not purely reactive: it has its own distinctive features. First, it focuses only on programs awarding research doctorates and their effectiveness in preparing students for careers in research. Although other purposes of graduate education are acknowledged to be important, they are outside the scope of this assessment. Second, the study examines a variety of different indices that may be relevant to program quality. This multidimensional approach represents an explicit recognition of the limitations of studies that rely entirely on peer ratings of perceived quality--the so-called reputational ratings.
Finally, in the compilation of reputational ratings in this study, evaluators were provided the names of faculty members involved with each program to be rated and the number of research doctorates awarded in the last five years. In previous reputational studies evaluators were not supplied such information.

During the past two decades increasing attention has been given to describing and measuring the quality of programs in graduate education. It is evident that the assessment of graduate programs is highly important for university administrators and faculty, for graduate students and prospective graduate students, for policymakers in state and national organizations, and for private and public funding agencies. Past experience, however, has demonstrated the difficulties with such assessments and their potentially controversial nature. As one critic has asserted:

    . . . the overall effect of these reports seems quite clear. They tend, first, to make the rich richer and the poor poorer; second, the example of the highly ranked clearly imposes constraints on those institutions lower down the scale (the "Hertz-Avis" effect). And the effect of such constraints is to reduce diversity, to reward conformity or respectability, to penalize genuine experiment or risk. There is, also, I believe, an obvious tendency to promote the prevalence of disciplinary dogma and orthodoxy. All of this might be tolerable if the reports were tolerably accurate and judicious, if they were less prescriptive and more descriptive; if they did not pretend to "objectivity" and if the very fact of ranking were not pernicious and invidious; if they genuinely promoted a meaningful "meritocracy" (instead of simply perpetuating the status quo ante and an establishment mentality). But this is precisely what they cannot claim to be or do.1

The widespread criticisms of ratings in graduate education were carefully considered in the planning of this study. At the outset consideration was given to whether a national assessment of graduate programs should be undertaken at this time and, if so, what methods should be employed. The next two sections in this chapter examine the background and rationale for the decision by the Conference Board of Associated Research Councils2 to embark on such a study. The remainder of the chapter describes the selection of disciplines and programs to be covered in the assessment. The overall study encompasses a total of 2,699 graduate programs in 32 disciplines.
In this report--the second of five reports issuing from the study--we examine 522 programs in nine disciplines in the humanities: art history, classics, English language and literature, French language and literature, German language and literature, linguistics, music, philosophy, and Spanish language and literature. These programs account for more than 90 percent of the research doctorates awarded in these nine disciplines. It should be emphasized that the selection of disciplines to be covered was determined primarily on the basis of total doctoral awards during the FY1976-78 period

1 William A. Arrowsmith, "Preface," in The Ranking Game: The Power of the Academic Elite, by W. Patrick Dolan, University of Nebraska Printing and Duplicating Service, Lincoln, Nebraska, 1976, p. ix.
2 The Conference Board includes representatives of the American Council of Learned Societies, American Council on Education, National Research Council, and Social Science Research Council.

(as described later in this chapter), and the exclusion of a particular discipline was in no way based on a judgment of the importance of graduate education or research in that discipline. Also, although the assessment is limited to programs leading to the research-doctorate (Ph.D. or equivalent) degree, the Conference Board and study committee recognize that graduate schools provide many other forms of valuable and needed education.

PRIOR ATTEMPTS TO ASSESS QUALITY IN GRADUATE EDUCATION

Universities and affiliated organizations have taken the lead in the review of programs in graduate education. At most institutions program reviews are carried out on a regular basis and include a comprehensive examination of the curriculum and educational resources as well as the qualifications of faculty and students. One special form of evaluation is that associated with institutional accreditation:

    The process begins with the institutional or programmatic self-study, a comprehensive effort to measure progress according to previously accepted objectives. The self-study considers the interest of a broad cross-section of constituencies--students, faculty, administrators, alumni, trustees, and in some circumstances the local community. The resulting report is reviewed by the appropriate accrediting commission and serves as the basis for evaluation by a site-visit team from the accrediting group. . . . Public as well as educational needs must be served simultaneously in determining and fostering standards of quality and integrity in the institutions and such specialized programs as they offer. Accreditation, conducted through nongovernmental institutional and specialized agencies, provides a major means for meeting those needs.3

Although formal accreditation plays an important role in higher education, many university administrators do not view such procedures as an adequate means of assessing program quality.
Other efforts are being made by universities to evaluate their programs in graduate education. The Educational Testing Service, with the sponsorship of the Council of Graduate Schools in the United States and the Graduate Record Examinations Board, has recently developed a set of procedures to assist institutions in evaluating their own graduate programs.4

3 Council on Postsecondary Accreditation, The Balance Wheel for Accreditation, Washington, D.C., July 1981, pp. 2-3.
4 For a description of these procedures, see M. J. Clark, Graduate Program Self-Assessment Service: Handbook for Users, Educational Testing Service, Princeton, New Jersey, 1980.

While reviews at the institutional (or state) level have proven useful in assessing the relative strengths and weaknesses of individual programs, they have not provided the information required for making national comparisons of graduate programs. Several attempts have been made at such comparisons. The most widely used of these have been the studies by Keniston (1959), Cartter (1966), and Roose and Andersen (1970). All three studies covered a broad range of disciplines in engineering, the humanities, and the sciences and were based on the opinions of knowledgeable individuals in the program areas covered. Keniston5 surveyed the department chairmen at 25 leading institutions. The Cartter6 and Roose-Andersen7 studies compiled ratings from much larger groups of faculty peers. The stated motivation for these studies was to increase knowledge concerning the quality of graduate education:

    A number of reasons can be advanced for undertaking such a study. The diversity of the American system of higher education has properly been regarded by both the professional educator and the layman as a great source of strength, since it permits flexibility and adaptability and encourages experimentation and competing solutions to common problems. Yet diversity also poses problems. . . . Diversity can be a costly luxury if it is accompanied by ignorance. . . . Just as consumer knowledge and honest advertising are requisite if a competitive economy is to work satisfactorily, so an improved knowledge of opportunities and of quality is desirable if a diverse educational system is to work effectively.8

Although the program ratings from the Cartter and Roose-Andersen studies are highly correlated, some substantial differences in successive ratings can be detected for a small number of programs--suggesting changes in the programs or in the perception of the programs.
For the past decade the Roose-Andersen ratings have generally been regarded as the best available source of information on the quality of doctoral programs. Although the ratings are now more than 10 years out of date and have been criticized on a variety of grounds, they are still used extensively by individuals within the academic community and by those in federal and state agencies.

5 H. Keniston, Graduate Study in Research in the Arts and Sciences at the University of Pennsylvania, University of Pennsylvania Press, Philadelphia, 1959.
6 A. M. Cartter, An Assessment of Quality in Graduate Education, American Council on Education, Washington, D.C., 1966.
7 K. D. Roose and C. J. Andersen, A Rating of Graduate Programs, American Council on Education, Washington, D.C., 1970.
8 Cartter, p. 3.

A frequently cited criticism of the Cartter and Roose-Andersen studies is their exclusive reliance upon reputational measurement.

    The ACE rankings are but a small part of all the evaluative processes, but they are also the most public, and they are clearly based on the narrow assumptions and elitist structures that so dominate the present direction of higher education in the United States. As long as our most prestigious source of information about postsecondary education is a vague popularity contest, the resultant ignorance will continue to provide a cover for the repetitious. . . . All the attempts to change higher education will ultimately be strangled by the "legitimate" evaluative processes that have already programmed a single set of responses from the start.9

A number of other criticisms have been leveled at reputational rankings of graduate programs.10 First, such studies inherently reflect perceptions that may be several years out of date and do not take into account recent changes in a program. Second, the ratings of individual programs are likely to be influenced by the overall reputation of the university--i.e., an institutional "halo effect." Also, a disproportionately large fraction of the evaluators are graduates of and/or faculty members in the largest programs, which may bias the survey results. Finally, on the basis of such studies it may not be possible to differentiate among many of the lesser known programs in which relatively few faculty members have established national reputations in research.

Despite such criticisms several studies based on methodologies similar to those employed by Cartter and Roose and Andersen have been carried out during the past 10 years. Some of these studies evaluated post-baccalaureate programs in areas not covered in the two earlier reports--including business, religion, educational administration, and medicine.
Others have focused exclusively on programs in particular disciplines within the sciences and humanities. A few attempts have been made to assess graduate programs in a broad range of disciplines, many of which were covered in the Roose-Andersen and Cartter ratings, but in the opinion of many each has serious deficiencies in the methods and procedures employed. In addition to such studies, a myriad of articles have been written on the assessment of graduate programs since the release of the Roose-Andersen report. With the heightening interest in these evaluations, many in the academic community have recognized the need to assess graduate programs, using other criteria in addition to peer judgment.

9 Dolan, p. 81.
10 For a discussion of these criticisms, see David S. Webster, "Methods of Assessing Quality," Change, October 1981, pp. 20-24.

    Though carefully done and useful in a number of ways, these ratings (Cartter and Roose-Andersen) have been criticized for their failure to reflect the complexity of graduate programs, their tendency to emphasize the traditional values that are highly related to program size and wealth, and their lack of timeliness or currency. Rather than repeat such ratings, many members of the graduate community have voiced a preference for developing ways to assess the quality of graduate programs that would be more comprehensive, sensitive to the different program purposes, and appropriate for use at any time by individual departments or universities.11

Several attempts have been made to go beyond the reputational assessment. Clark, Hartnett, and Baird, in a pilot study12 of graduate programs in chemistry, history, and psychology, identified as many as 30 possible measures significant for assessing the quality of graduate education. Glower13 has ranked engineering schools according to the total amount of research spending and the number of graduates listed in Who's Who in Engineering. House and Yeager14 rated economics departments on the basis of the total number of pages published by full professors in 45 leading journals in this discipline. Other ratings based on faculty publication records have been compiled for graduate programs in a variety of disciplines, including political science, psychology, and sociology. These and other studies demonstrate the feasibility of a national assessment of graduate programs that is founded on more than reputational standing among faculty peers.

DEVELOPMENT OF STUDY PLANS

In September 1976 the Conference Board, with support from the Carnegie Corporation of New York and the Andrew W. Mellon Foundation, convened a three-day meeting to consider whether a study of programs in graduate education should be undertaken. The 40 invited participants in this meeting included academic administrators, faculty members, and agency and foundation officials,15 and represented a variety of institutions, disciplines, and convictions. In these discussions there was considerable debate concerning whether the potential benefits of such a study outweighed the possible misrepresentations of the results. On the one hand, "a substantial majority of the Conference [participants believed] that the earlier assessments of graduate education have received wide and important use: by students and their advisors, by the institutions of higher education as aids to planning and the allocation of educational functions, as a check on unwarranted claims of excellence, and in social science research."16 On the other hand, the Conference participants recognized that a new study assessing the quality of graduate education "would be conducted and received in a very different atmosphere than were the earlier Cartter and Roose-Andersen reports. . . . Where ratings were previously used in deciding where to increase funds and how to balance expanding programs, they might now be used in deciding where to cut off funds and programs."

After an extended debate of these issues, it was the recommendation of this conference that a study with particular emphasis on the effectiveness of doctoral programs in educating research personnel be undertaken. The recommendation was based principally on four considerations: (1) the importance of the study results to national and state bodies, (2) the desire to stimulate continuing emphasis on quality in graduate education, (3) the need for current evaluations that take into account the many changes that have occurred in programs since the Roose-Andersen study, and (4) the value of extending the range of measures used in evaluative studies of graduate programs.

11 Clark, p. 1.
12 M. J. Clark, R. T. Hartnett, and L. L. Baird, Assessing Dimensions of Quality in Doctoral Education: A Technical Report of a National Study in Three Fields, Educational Testing Service, Princeton, New Jersey, 1976.
13 Donald D. Glower, "A Rational Method for Ranking Engineering Programs," Engineering Education, May 1980.
14 Donald R. House and James H. Yeager, Jr., "The Distribution of Publication Success Within and Among Top Economics Departments: A Disaggregate View of Recent Evidence," Economic Inquiry, Vol. 16, No. 4, October 1978, pp. 593-598.
Although many participants expressed interest in an assessment of master's degree and professional degree programs, insurmountable problems prohibited the inclusion of these types of programs in this study.

Following this meeting a 13-member committee,17 co-chaired by Gardner Lindzey and Harriet A. Zuckerman, was formed to develop a detailed plan for a study limited to research-doctorate programs and designed to improve upon the methodologies utilized in earlier studies. In its deliberations the planning committee carefully considered the criticisms of the Roose-Andersen study and other national assessments. Particular attention was paid to the feasibility of compiling a variety of specific measures (e.g., faculty publication records, quality of students, program resources) that were judged to be related to the quality of research-doctorate programs. Attention was also given to making improvements in the survey instrument and procedures used in the Cartter and Roose-Andersen studies. In September 1978 the planning group submitted a comprehensive report18 describing alternative strategies for an evaluation of the quality and effectiveness of research-doctorate programs.

    The proposed study has its own distinctive features. It is characterized by a sharp focus and a multidimensional approach. (1) It will focus only on programs awarding research doctorates; other purposes of doctoral training are acknowledged to be important, but they are outside the scope of the work contemplated. (2) The multidimensional approach represents an explicit recognition of the limitations of studies that make assessments solely in terms of ratings of perceived quality provided by peers--the so-called reputational ratings. Consequently, a variety of quality-related measures will be employed in the proposed study and will be incorporated in the presentation of the results of the study.

This report formed the basis for the decision by the Conference Board to embark on a national assessment of doctorate-level programs in the sciences, engineering, and the humanities.

In June 1980 an 18-member committee was appointed to oversee the study. The committee,19 made up of individuals from a diverse set of disciplines within the sciences, engineering, and the humanities, includes seven members who had been involved in the planning phase and several members who presently serve or have served as graduate deans in either public or private universities. During the first eight months the committee met three times to review plans for the study activities, make decisions on the selection of disciplines and programs to be covered, and design the survey instruments to be used.

15 See Appendix E for a list of the participants in this conference.
16 From a summary of the Woods Hole Conference (see Appendix G).
17 See Appendix H for a list of members of the planning committee.
Early in the study an effort was made to solicit the views of presidents and graduate deans at more than 250 universities. Their suggestions were most helpful to the committee in drawing up final plans for the assessment. With the assistance of the Council of Graduate Schools in the United States, the committee and its staff have tried to keep the graduate deans informed about the progress being made in this study. The final section of this chapter describes the procedures followed in determining which research-doctorate programs were to be included in the assessment.

18 National Research Council, A Plan to Study the Quality and Effectiveness of Research-Doctorate Programs, 1978 (unpublished report).
19 See p. iii of this volume for a list of members of the study committee.

SELECTION OF DISCIPLINES AND PROGRAMS TO BE EVALUATED

One of the most difficult decisions made by the study committee was the selection of disciplines to be covered in the assessment. Early in the planning stage it was recognized that some important areas of graduate education would have to be left out of the study. Limited financial resources required that efforts be concentrated on a total of no more than about 30 disciplines in the biological sciences, engineering, humanities, mathematical and physical sciences, and social sciences. At its initial meeting the committee decided that the selection of disciplines within each of these five areas should be made primarily on the basis of the total number of doctorates awarded nationally in recent years.

At the time the study was undertaken, aggregate counts of doctoral degrees earned during the FY1976-78 period were available from two independent sources--the Educational Testing Service (ETS) and the National Research Council (NRC). Table 1.1 presents doctoral awards data for 14 disciplines within the humanities. As alluded to in footnote 1 of the table, discrepancies between the ETS and NRC counts may be explained, in part, by differences in the data collection procedures. The ETS counts, derived from information provided by universities, have been categorized according to the discipline of the department/academic unit in which the degree was earned. The NRC counts were tabulated from the survey responses of FY1976-78 Ph.D. recipients, who had been asked to identify their fields of specialty. Initially the committee planned to include no more than five or six humanities disciplines in the assessment.
However, because of the large number of disciplines within the humanities and because of the particular interests in this area on the part of a principal sponsor of the study, the committee decided to assess programs in as many as nine disciplines: art history, classics, English language and literature, French language and literature, German language and literature, linguistics, music, philosophy, and Spanish language and literature. In making this selection the committee took into account budgetary limitations that prohibited the inclusion of more than nine humanities disciplines and the importance of maintaining continuity with the earlier Roose-Andersen study. Since all nine of the humanities disciplines that were selected had been included in the earlier study as well,20 it is possible to compare results from the two studies for a broad set of humanities programs. Although on the basis of numbers of recent doctoral awards four additional disciplines--comparative literature, dramatic and creative arts, religious studies, and speech/rhetoric/debate--might also have been selected, none of these four had been included in the earlier study.

20 The only humanities discipline included in the Roose-Andersen study but excluded in the committee's assessment is Russian language and literature, in which fewer than 200 doctoral degrees were awarded in the FY1976-78 period.

TABLE 1.1 Number of Research-Doctorates Awarded in Humanities Disciplines, FY1976-78 (1)

                                              Source of Data
Discipline                                    ETS        NRC

Disciplines Included in the Assessment
  English Language & Literature               3,192      3,301
  Music                                       1,185      1,122
  Philosophy                                    911      1,006
  French Language & Literature                  504        636
  Spanish Language & Literature (2)             500        606
  Linguistics                                   433        516
  Art History                                   477        447
  German Language & Literature                  419        421
  Classics                                      223        206

Disciplines Not Included in the Assessment
  Religious Studies (3)                         704        540
  Speech/Rhetoric/Debate (4)                    650        228
  Comparative Literature                        517        422
  Dramatic & Creative Arts                      356        N/A
  Russian Language & Literature                 184        166
  Other Humanities                              N/A        608

1 Data on FY1976-78 doctoral awards were derived from two independent sources: Educational Testing Service (ETS), Graduate Programs and Admissions Manual, 1979-81, and NRC's Survey of Earned Doctorates, 1976-78. Differences in field definitions account for discrepancies between the ETS and NRC data.
2 Data from ETS include doctorates in Italian languages and literatures.
3 Data from ETS include doctorates in theology as well as those in religion.
4 Data from ETS may include doctorates awarded in hearing sciences; degrees in this field are not included in the NRC data.

The selection of research-doctorate programs to be evaluated in each discipline was made in two stages. Programs meeting any of the following criteria were initially nominated for inclusion in the study:

    (1) more than a specified number (see below) of research doctorates awarded during the FY1976-78 period,
    (2) more than one-third of that specified number of doctorates awarded in FY1979, or
    (3) an average rating of 2.0 or higher in the Roose-Andersen rating of the scholarly quality of departmental faculty.

In each discipline the specified number of doctorates required for inclusion in the study was determined in such a way that the programs meeting this criterion accounted for at least 90 percent of the doctorates awarded in that discipline during the FY1976-78 period. In the humanities the following numbers of FY1976-78 doctoral awards were required to satisfy the first criterion (above):

    Art History--5 or more doctorates
    Classics--3 or more doctorates
    English Language & Literature--13 or more doctorates
    French Language & Literature--5 or more doctorates
    German Language & Literature--4 or more doctorates
    Linguistics--5 or more doctorates
    Music--9 or more doctorates
    Philosophy--6 or more doctorates
    Spanish Language & Literature--5 or more doctorates

A list of the nominated programs at each institution was then sent to a designated individual (usually the graduate dean) who had been appointed by the university president to serve as study coordinator for the institution. The coordinator was asked to review the list and eliminate any programs no longer offering research doctorates or not belonging in the designated discipline. The coordinator also was given an opportunity to nominate additional programs that he or she believed should be included in the study. Coordinators were asked to restrict their nominations to programs that they considered to be "of uncommon distinction" and that had awarded no fewer than two research doctorates during the past two years.
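The first-stage nomination rule and the 90-percent coverage requirement described above are mechanical enough to sketch in code. The following Python fragment is an illustration only: the record fields (`phd_76_78`, `phd_79`, `ra_rating`) and the sample doctorate counts are hypothetical, not drawn from the study's data.

```python
# Sketch of the program-nomination rules described above.
# Field names and all sample figures are hypothetical.

def pick_threshold(counts, coverage=0.90):
    """Largest doctorate cutoff t such that programs awarding at least
    t doctorates in FY1976-78 still account for at least `coverage` of
    all doctorates awarded in the discipline during that period."""
    total = sum(counts)
    best = 1
    for t in sorted(set(counts)):
        if sum(c for c in counts if c >= t) >= coverage * total:
            best = t  # coverage shrinks as t grows, so keep the largest t that works
    return best

def nominated(program, t):
    """First-stage nomination: meeting any one of the three criteria suffices."""
    return (program["phd_76_78"] >= t        # criterion (1): FY1976-78 awards
            or program["phd_79"] > t / 3     # criterion (2): FY1979 awards
            or program["ra_rating"] >= 2.0)  # criterion (3): Roose-Andersen rating

# Hypothetical FY1976-78 per-program doctorate counts for one discipline:
counts = [13, 9, 6, 5, 5, 4, 3, 2, 1, 1]
t = pick_threshold(counts)  # -> 3: programs with 3+ doctorates cover 45 of 49 awards
```

With these sample counts, a program that awarded only one doctorate in FY1976-78 would still be nominated if it carried a Roose-Andersen faculty rating of 2.0 or higher, mirroring criterion (3).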
In order to be eligible for inclusion, of course, programs had to belong in one of the disciplines covered in the study. If the university offered more than one research-doctorate program in a discipline, the coordinator was instructed to provide information on each of them so that these programs could be evaluated separately.21 As discussed in Chapter IX, particular problems were encountered in identifying research-doctorate programs in music, and the committee has serious reservations concerning the comparability of the 53 programs that were evaluated in this discipline.

The committee received excellent cooperation from the study coordinators at universities. Of the 243 institutions that were identified as having one or more research-doctorate programs satisfying the criteria (listed earlier) for inclusion in the study, only 7 declined to participate in the study and another 8 failed to provide the program information requested within the three-month period allotted (despite several reminders). None of these 15 institutions had doctoral programs that had received strong or distinguished reputational ratings in prior national studies. Since the information requested had not been provided, the committee decided not to include programs from these institutions in any aspect of the assessment. In each of the nine chapters that follows, a list is given of the universities that met the criteria for inclusion in a particular discipline but that are not represented in the study.

As a result of nominations by institutional coordinators, some programs were added to the original list and others dropped. Table 1.2 reports the final coverage in each of the nine humanities disciplines. The number of programs evaluated varies considerably by discipline.

21 See Appendix A for the specific instructions given to the coordinators.
A total of 106 English programs have been included in the study; in linguistics and classics fewer than one-third this number have been included. Although the final determination of whether a program should be included in the assessment was left in the hands of the institutional coordinator, it is entirely possible that a few programs meeting the criteria for inclusion in the assessment were overlooked by the coordinators.

TABLE 1.2 Number of Programs Evaluated in Each Discipline and the Total FY1976-80 Doctoral Awards from These Programs

Discipline                          Programs    FY1976-80 Doctorates*

Art History                               41         752
Classics                                  35         334
English Language & Literature            106       4,687
French Language & Literature              58         811
German Language & Literature              48         616
Linguistics                               35         652
Music                                     53       1,385
Philosophy                                77       1,395
Spanish Language & Literature             69         812

Total                                    522      11,444

*The data on doctoral awards were provided by the study coordinator at each of the universities covered in the assessment.

In the chapter that follows, a detailed description is given of each of the measures used in the evaluation of research-doctorate programs in the humanities. The description includes a discussion of the rationale for using the measure, the source from which data for that measure were derived, and any known limitations that would affect the interpretation of the data reported. The committee wishes to emphasize that there are limitations associated with each of the measures and that none of the measures should be regarded as a precise indicator of the quality of a program in educating humanists for careers in research. The reader is strongly urged to consider the descriptive material presented in Chapter II before attempting to interpret the program evaluations reported in subsequent chapters. In presenting a frank discussion of any shortcomings of each measure, the committee's intent is to reduce the possibility of misuse of the results from this assessment of research-doctorate programs.
