

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Origins of Study and Selection of Programs

Each year more than 22,000 candidates are awarded doctorates in engineering, the humanities, and the sciences from approximately 250 U.S. universities. They have spent, on the average, five-and-a-half years in intensive education and research in preparation for careers either in universities or in settings outside the academic sector, and many will make significant contributions to research. Yet we are poorly informed concerning the quality of the programs producing these graduates. This study is intended to provide information pertinent to this complex and controversial subject. The charge to the study committee directed it to build upon the planning that preceded it. The planning stages included a detailed review of the methodologies and the results of past studies that had focused on the assessment of doctoral-level programs. The committee has taken into consideration the reactions of various groups and individuals to those studies. The present assessment draws upon previous experience with program evaluation, with the aim of improving what was useful and avoiding some of the difficulties encountered in past studies. The present study, nevertheless, is not purely reactive: it has its own distinctive features. First, it focuses only on programs awarding research doctorates and their effectiveness in preparing students for careers in research. Although other purposes of graduate education are acknowledged to be important, they are outside the scope of this assessment. Second, the study examines a variety of different indices that may be relevant to program quality. This multidimensional approach represents an explicit recognition of the limitations of studies that rely entirely on peer ratings of perceived quality--the so-called reputational ratings.
Finally, in the compilation of reputational ratings in this study, evaluators were provided the names of faculty members involved with each program to be rated and the number of research doctorates awarded in the last five years. In previous reputational studies evaluators were not supplied such information. During the past two decades increasing attention has been given to describing and measuring the quality of programs in graduate education. It is evident that the assessment of graduate programs is highly important for university administrators and faculty, for employers in industrial and government laboratories, for graduate students and prospective graduate students, for policymakers in state and national

organizations, and for private and public funding agencies. Past experience, however, has demonstrated the difficulties with such assessments and their potentially controversial nature. As one critic has asserted:

. . . the overall effect of these reports seems quite clear. They tend, first, to make the rich richer and the poor poorer; second, the example of the highly ranked clearly imposes constraints on those institutions lower down the scale (the "Hertz-Avis" effect). And the effect of such constraints is to reduce diversity, to reward conformity or respectability, to penalize genuine experiment or risk. There is, also, I believe, an obvious tendency to promote the prevalence of disciplinary dogma and orthodoxy. All of this might be tolerable if the reports were tolerably accurate and judicious, if they were less prescriptive and more descriptive; if they did not pretend to "objectivity" and if the very fact of ranking were not pernicious and invidious; if they genuinely promoted a meaningful "meritocracy" (instead of simply perpetuating the status quo ante and an establishment mentality). But this is precisely what they cannot claim to be or do.1

The widespread criticisms of ratings in graduate education were carefully considered in the planning of this study. At the outset consideration was given to whether a national assessment of graduate programs should be undertaken at this time and, if so, what methods should be employed. The next two sections in this chapter examine the background and rationale for the decision by the Conference Board of Associated Research Councils2 to embark on such a study. The remainder of the chapter describes the selection of disciplines and programs to be covered in the assessment. The overall study encompasses a total of 2,699 graduate programs in 32 disciplines.
In this report--the fifth and final report issuing from the study--we examine 639 programs in seven disciplines in the social and behavioral sciences: anthropology, economics, geography, history, political science, psychology, and sociology. These programs account for more than 90 percent of the research doctorates awarded in these seven disciplines. It should be emphasized that the selection of disciplines to be covered was determined on the basis of total doctoral awards during the FY1976-78 period (as described later in this

1William A. Arrowsmith, "Preface" in The Ranking Game: The Power of the Academic Elite, by W. Patrick Dolan, University of Nebraska Printing and Duplicating Service, Lincoln, Nebraska, 1976, p. ix.

2The Conference Board includes representatives of the American Council of Learned Societies, American Council on Education, National Research Council, and Social Science Research Council.

chapter), and the exclusion of a particular discipline was in no way based on a judgment of the importance of graduate education or research in that discipline. Also, although the assessment is limited to programs leading to the research-doctorate (Ph.D. or equivalent) degree, the Conference Board and study committee recognize that graduate schools provide many other forms of valuable and needed education.

PRIOR ATTEMPTS TO ASSESS QUALITY IN GRADUATE EDUCATION

Universities and affiliated organizations have taken the lead in the review of programs in graduate education. At most institutions program reviews are carried out on a regular basis and include a comprehensive examination of the curriculum and educational resources as well as the qualifications of faculty and students. One special form of evaluation is that associated with institutional accreditation:

The process begins with the institutional or programmatic self-study, a comprehensive effort to measure progress according to previously accepted objectives. The self-study considers the interest of a broad cross-section of constituencies--students, faculty, administrators, alumni, trustees, and in some circumstances the local community. The resulting report is reviewed by the appropriate accrediting commission and serves as the basis for evaluation by a site-visit team from the accrediting group. . . . Public as well as educational needs must be served simultaneously in determining and fostering standards of quality and integrity in the institutions and such specialized programs as they offer. Accreditation, conducted through nongovernmental institutional and specialized agencies, provides a major means for meeting those needs.3

Although formal accreditation procedures play an important role in higher education, many university administrators do not view such procedures as an adequate means of assessing program quality.
Other efforts are being made by universities to evaluate their programs in graduate education. The Educational Testing Service, with the sponsorship of the Council of Graduate Schools in the United States and the Graduate Record Examinations Board, has recently developed a set of procedures to assist institutions in evaluating their own graduate programs.4

3Council on Postsecondary Accreditation, The Balance Wheel for Accreditation, Washington, D.C., July 1981, pp. 2-3.

4For a description of these procedures, see M. J. Clark, Graduate Program Self-Assessment Service: Handbook for Users, Educational Testing Service, Princeton, New Jersey, 1980.

While reviews at the institutional (or state) level have proven useful in assessing the relative strengths and weaknesses of individual programs, they have not provided the information required for making national comparisons of graduate programs. Several attempts have been made at such comparisons. The most widely used of these have been the studies by Keniston (1959), Cartter (1966), and Roose and Andersen (1970). All three studies covered a broad range of disciplines in engineering, the humanities, and the sciences and were based on the opinions of knowledgeable individuals in the program areas covered. Keniston5 surveyed the department chairmen at 25 leading institutions. The Cartter6 and Roose-Andersen7 studies compiled ratings from much larger groups of faculty peers. The stated motivation for these studies was to increase knowledge concerning the quality of graduate education:

A number of reasons can be advanced for undertaking such a study. The diversity of the American system of higher education has properly been regarded by both the professional educator and the layman as a great source of strength, since it permits flexibility and adaptability and encourages experimentation and competing solutions to common problems. Yet diversity also poses problems. . . . Diversity can be a costly luxury if it is accompanied by ignorance. . . . Just as consumer knowledge and honest advertising are requisite if a competitive economy is to work satisfactorily, so an improved knowledge of opportunities and of quality is desirable if a diverse educational system is to work effectively.8

Although the program ratings from the Cartter and Roose-Andersen studies are highly correlated, some substantial differences in successive ratings can be detected for a small number of programs--suggesting changes in the programs or in the perception of the programs.
For the past decade the Roose-Andersen ratings have generally been regarded as the best available source of information on the quality of doctoral programs. Although the ratings are now more than 10 years out of date and have been criticized on a variety of grounds, they are still used extensively by individuals within the academic community and by those in federal and state agencies.

5H. Keniston, Graduate Study in Research in the Arts and Sciences at the University of Pennsylvania, University of Pennsylvania Press, Philadelphia, 1959.

6A. M. Cartter, An Assessment of Quality in Graduate Education, American Council on Education, Washington, D.C., 1966.

7K. D. Roose and C. J. Andersen, A Rating of Graduate Programs, American Council on Education, Washington, D.C., 1970.

8Cartter, p. 3.

A frequently cited criticism of the Cartter and Roose-Andersen studies is their exclusive reliance upon reputational measurement.

The ACE rankings are but a small part of all the evaluative processes, but they are also the most public, and they are clearly based on the narrow assumptions and elitist structures that so dominate the present direction of higher education in the United States. As long as our most prestigious source of information about postsecondary education is a vague popularity contest, the resultant ignorance will continue to provide a cover for the repetitious aping of a single model. . . . All the attempts to change higher education will ultimately be strangled by the "legitimate" evaluative processes that have already programmed a single set of responses from the start.9

A number of other criticisms have been leveled at reputational rankings of graduate programs.10 First, such studies inherently reflect perceptions that may be several years out of date and do not take into account recent changes in a program. Second, the ratings of individual programs are likely to be influenced by the overall reputation of the university--i.e., an institutional "halo effect." Also, a disproportionately large fraction of the evaluators are graduates of and/or faculty members in the largest programs, which may bias the survey results. Finally, on the basis of such studies it may not be possible to differentiate among many of the lesser known programs in which relatively few faculty members have established national reputations in research. Despite such criticisms several studies based on methodologies similar to that employed by Cartter and Roose and Andersen have been carried out during the past 10 years. Some of these studies evaluated post-baccalaureate programs in areas not covered in the two earlier reports--including business, religion, educational administration, and medicine.
Others have focused exclusively on programs in particular disciplines within the sciences and humanities. A few attempts have been made to assess graduate programs in a broad range of disciplines, many of which were covered in the Roose-Andersen and Cartter ratings, but in the opinion of many each has serious deficiencies in the methods and procedures employed. In addition to such studies, a myriad of articles have been written on the assessment of graduate programs since the release of the Roose-Andersen report. With the heightening interest in these evaluations, many in the academic community have recognized the need to assess graduate programs, using other criteria in addition to peer judgment.

9Dolan, p. 81.

10For a discussion of these criticisms, see David S. Webster, "Methods of Assessing Quality," Change, October 1981, pp. 20-24.

Though carefully done and useful in a number of ways, these ratings (Cartter and Roose-Andersen) have been criticized for their failure to reflect the complexity of graduate programs, their tendency to emphasize the traditional values that are highly related to program size and wealth, and their lack of timeliness or currency. Rather than repeat such ratings, many members of the graduate community have voiced a preference for developing ways to assess the quality of graduate programs that would be more comprehensive, sensitive to the different program purposes, and appropriate for use at any time by individual departments or universities.11

Several attempts have been made to go beyond the reputational assessment. Clark, Hartnett, and Baird, in a pilot study12 of graduate programs in chemistry, history, and psychology, identified as many as 30 possible measures significant for assessing the quality of graduate education. Glower13 has ranked engineering schools according to the total amount of research spending and the number of graduates listed in Who's Who in Engineering. House and Yeager14 rated economics departments on the basis of the total number of pages published by full professors in 45 leading journals in this discipline. Other ratings based on faculty publication records have been compiled for graduate programs in a variety of disciplines, including political science, psychology, and sociology. These and other studies demonstrate the feasibility of a national assessment of graduate programs that is founded on more than reputational standing among faculty peers.

DEVELOPMENT OF STUDY PLANS

In September 1976 the Conference Board, with support from the Carnegie Corporation of New York and the Andrew W. Mellon Foundation, convened a three-day meeting to consider whether a study of programs in graduate education should be undertaken.
The 40 invited participants in this meeting included academic administrators, faculty members, and

11Clark, p. 1.

12M. J. Clark, R. T. Hartnett, and L. L. Baird, Assessing Dimensions of Quality in Doctoral Education: A Technical Report of a National Study in Three Fields, Educational Testing Service, Princeton, New Jersey, 1976.

13Donald D. Glower, "A Rational Method for Ranking Engineering Programs," Engineering Education, May 1980.

14Donald R. House and James H. Yeager, Jr., "The Distribution of Publication Success Within and Among Top Economics Departments: A Disaggregate View of Recent Evidence," Economic Inquiry, Vol. 16, No. 4, October 1978, pp. 593-598.

agency and foundation officials and represented a variety of institutions, disciplines, and convictions.15 In these discussions there was considerable debate concerning whether the potential benefits of such a study outweighed the possible misrepresentations of the results. On the one hand, "a substantial majority of the Conference [participants believed] that the earlier assessments of graduate education have received wide and important use: by students and their advisors, by the institutions of higher education as aids to planning and the allocation of educational functions, as a check on unwarranted claims of excellence, and in social science research."16 On the other hand, the Conference participants recognized that a new study assessing the quality of graduate education "would be conducted and received in a very different atmosphere than were the earlier Cartter and Roose-Andersen reports. . . . Where ratings were previously used in deciding where to increase funds and how to balance expanding programs, they might now be used in deciding where to cut off funds and programs." After an extended debate of these issues, it was the recommendation of this conference that a study with particular emphasis on the effectiveness of doctoral programs in educating research personnel be undertaken. The recommendation was based principally on four considerations: (1) the importance of the study results to national and state bodies, (2) the desire to stimulate continuing emphasis on quality in graduate education, (3) the need for current evaluations that take into account the many changes that have occurred in programs since the Roose-Andersen study, and (4) the value of extending the range of measures used in evaluative studies of graduate programs. Although many participants expressed interest in an assessment of master's degree and professional degree programs, insurmountable problems prohibited the inclusion of these types of programs in this study.
Following this meeting a 13-member committee,17 co-chaired by Gardner Lindzey and Harriet A. Zuckerman, was formed to develop a detailed plan for a study limited to research-doctorate programs and designed to improve upon the methodologies utilized in earlier studies. In its deliberations the planning committee carefully considered the criticisms of the Roose-Andersen study and other national assessments. Particular attention was paid to the feasibility of compiling a variety of specific measures (e.g., faculty publication records, quality of students, program resources) that were judged to be related to the quality of research-doctorate programs. Attention was also given to making improvements in the survey instrument and procedures used in the

15See Appendix G for a list of the participants in this conference.

16From a summary of the Woods Hole Conference (see Appendix G).

17See Appendix H for a list of members of the planning committee.

Cartter and Roose-Andersen studies. In September 1978 the planning group submitted a comprehensive report18 describing alternative strategies for an evaluation of the quality and effectiveness of research-doctorate programs.

The proposed study has its own distinctive features. It is characterized by a sharp focus and a multidimensional approach. (1) It will focus only on programs awarding research doctorates; other purposes of doctoral training are acknowledged to be important, but they are outside the scope of the work contemplated. (2) The multidimensional approach represents an explicit recognition of the limitations of studies that make assessments solely in terms of ratings of perceived quality provided by peers--the so-called reputational ratings. Consequently, a variety of quality-related measures will be employed in the proposed study and will be incorporated in the presentation of the results of the study.

This report formed the basis for the decision by the Conference Board to embark on a national assessment of doctorate-level programs in the sciences, engineering, and the humanities. In June 1980 an 18-member committee was appointed to oversee the study. The committee,19 made up of individuals from a diverse set of disciplines within the sciences, engineering, and the humanities, includes seven members who had been involved in the planning phase and several members who presently serve or have served as graduate deans in either public or private universities. During the first eight months the committee met three times to review plans for the study activities, make decisions on the selection of disciplines and programs to be covered, and design the survey instruments to be used. Early in the study an effort was made to solicit the views of presidents and graduate deans at more than 250 universities. Their suggestions were most helpful to the committee in drawing up final plans for the assessment.
With the assistance of the Council of Graduate Schools in the United States, the committee and its staff have tried to keep the graduate deans informed about the progress being made in this study. The final section of this chapter describes the procedures followed in determining which research-doctorate programs were to be included in the assessment.

SELECTION OF DISCIPLINES AND PROGRAMS TO BE EVALUATED

One of the most difficult decisions made by the study committee was the selection of disciplines to be covered in the assessment. Early in

18National Research Council, A Plan to Study the Quality and Effectiveness of Research-Doctorate Programs, 1978 (unpublished report).

19See p. vii for a list of members of the study committee.

the planning stage it was recognized that some important areas of graduate education would have to be left out of the study. Limited financial resources required that efforts be concentrated on a total of no more than about 30 disciplines in the biological sciences, engineering, humanities, mathematical and physical sciences, and social and behavioral sciences. At its initial meeting the committee decided that the selection of disciplines within each of these five areas should be made primarily on the basis of the total number of doctorates awarded nationally in recent years. At the time the study was undertaken, aggregate counts of doctoral degrees earned during the FY1976-78 period were available from two independent sources--the Educational Testing Service (ETS) and the National Research Council (NRC). Table 1.1 presents doctoral awards data for 10 disciplines within the social and behavioral sciences. As alluded to in the footnote of the table, discrepancies between the ETS and NRC counts may be explained, in part, by differences in the data collection procedures. The ETS counts, derived from information provided by universities, have been categorized according to the discipline of the department/academic unit in which the degree was earned. The NRC counts were tabulated from the survey responses of FY1976-78 Ph.D. recipients, who had been asked to identify their fields of specialty. Originally the committee had decided to include only the first six social and behavioral science disciplines listed in Table 1.1. However, at the urging of many individuals in the academic community and at the request of the National Science Foundation, which provided supplemental funding, geography20 was added to the list of social and behavioral science disciplines to be covered in the assessment.
Since the decision to include geography was not made until spring 1981, the survey of evaluators in this discipline was not undertaken until five months after the survey in other disciplines. The selection of the research-doctorate programs to be evaluated in each discipline was made in two stages. Programs meeting either of the following criteria were initially nominated for inclusion in the study: (1) more than a specified number (see below) of research doctorates awarded during the FY1976-78 period or (2) more than one-third of that specified number of doctorates awarded in FY1979.21

20Geography was among the disciplines covered in the Roose-Andersen study.

21In the first three volumes of the committee's study, which pertain to the mathematical and physical sciences, humanities, and engineering, it is mistakenly reported that a third criterion based on results from the Roose-Andersen study was used in the nomination of programs to be included in the assessment. This third criterion, while at one time considered by the committee, was not adopted.
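For concreteness, the two-part nomination rule above can be sketched as a small predicate. This is an illustrative sketch only, not part of the study's procedures: the function name and example counts are invented, and the criteria are implemented literally as "more than," although the per-discipline cutoffs later in this chapter are phrased as "N or more."

```python
def nominated(fy1976_78_doctorates, fy1979_doctorates, specified_number):
    """A program is nominated if it satisfies either criterion:
    (1) more than the specified number of research doctorates awarded
        during the FY1976-78 period, or
    (2) more than one-third of that specified number awarded in FY1979.
    """
    return (fy1976_78_doctorates > specified_number
            or fy1979_doctorates > specified_number / 3)

# Hypothetical economics program (specified number 12): only 10 doctorates
# in FY1976-78 fails criterion (1), but 5 in FY1979 exceeds 12/3 = 4,
# so the program is nominated under criterion (2).
print(nominated(10, 5, 12))  # -> True
```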

TABLE 1.1 Number of Research-Doctorates Awarded in Social and Behavioral Science Disciplines, FY1976-78

                                            Source of Data*
                                            ETS        NRC
Disciplines Included in the Assessment
  Psychology                              6,977      8,868
  History                                 2,511      2,819
  Economics                               2,323      2,524
  Political Science                       2,021      2,195
  Sociology                               1,981      2,069
  Anthropology                            1,252      1,290
  Geography                                 523        469
  Total                                  17,588     20,234

Disciplines Not Included in the Assessment
  Area Studies                              479        333
  Public Administration                     413        423
  Urban Studies                              62        247
  Other Social and Behavioral Sciences      N/A        694
  Total                                     N/A      1,697

*Data on FY1976-78 doctoral awards were derived from two independent sources: Educational Testing Service (ETS), Graduate Programs and Admissions Manual, 1979-81, and the NRC's Survey of Earned Doctorates, 1976-78. Differences in field definitions account for discrepancies between the ETS and NRC data.

In each discipline the specified number of doctorates required for inclusion in the study was determined in such a way that the programs meeting this criterion accounted for at least 90 percent of the doctorates awarded in that discipline during the FY1976-78 period. In the social and behavioral science disciplines the following numbers of FY1976-78 doctoral awards were required to satisfy the first criterion (above):

Anthropology--9 or more doctorates
Economics--12 or more doctorates
Geography--1 or more doctorates
History--11 or more doctorates
Political Science--10 or more doctorates
Psychology--22 or more doctorates
Sociology--9 or more doctorates.
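The 90 percent rule used to fix each discipline's specified number can be expressed as a short computation: choose the largest per-program award count such that programs at or above it still account for at least 90 percent of the discipline's doctorates. The following is a minimal sketch under stated assumptions: the function name and the program-level counts are invented, since the chapter reports only discipline totals.

```python
def coverage_threshold(program_counts, coverage=0.90):
    """Return the largest per-program FY1976-78 doctorate count such that
    programs at or above it still account for at least `coverage` of all
    doctorates awarded in the discipline."""
    total = sum(program_counts)
    # Try candidate thresholds from high to low; stop at the first one
    # whose qualifying programs cover the required share of awards.
    for t in sorted(set(program_counts), reverse=True):
        if sum(c for c in program_counts if c >= t) >= coverage * total:
            return t
    return min(program_counts)

# Invented example: eight programs' FY1976-78 award counts (110 total).
counts = [30, 25, 20, 15, 10, 5, 3, 2]
print(coverage_threshold(counts))  # -> 10; the top five programs hold 100 of 110 awards
```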

A list of the nominated programs at each institution was then sent to a designated individual (usually the graduate dean) who had been appointed by the university president to serve as study coordinator for the institution. The coordinator was asked to review the list and eliminate any programs no longer offering research doctorates or not belonging in the designated discipline. The coordinator also was given an opportunity to nominate additional programs that he or she believed should be included in the study.22 Coordinators were asked to restrict their nominations to programs that they considered to be "of uncommon distinction" and that had awarded no fewer than two research doctorates during the past two years. In order to be eligible for inclusion, of course, programs had to belong in one of the disciplines covered in the study. If the university offered more than one research-doctorate program in a discipline, the coordinator was instructed to provide information on each of them so that these programs could be evaluated separately. The committee received excellent cooperation from the study coordinators at universities. Of the 243 institutions that were identified as having one or more research-doctorate programs satisfying the criteria (listed earlier) for inclusion in the study, only 7 declined to participate in the study and another 8 failed to provide the program information requested within the three-month period allotted (despite several reminders). None of these 15 institutions had doctoral programs that had received strong or distinguished reputational ratings in prior national studies. Since the information requested had not been provided, the committee decided not to include programs from these institutions in any aspect of the assessment. In each of the seven chapters that follow, a list is given of the universities that met the criteria for inclusion in a particular discipline but that are not represented in the study.
As a result of nominations by institutional coordinators, some programs were added to the original list and others dropped. Table 1.2 reports the final coverage in each of the seven social and behavioral science disciplines. The number of programs evaluated varies considerably by discipline. A total of 150 psychology programs have been included in the study; in geography and anthropology fewer than half this number have been included. Although the final determination of whether a program should be included in the assessment was left in the hands of the institutional coordinator, it is entirely possible that a few programs meeting the criteria for inclusion in the assessment were overlooked by the coordinators. In the chapter that follows, a detailed description is given of each of the measures used in the evaluation of research-doctorate programs in the social and behavioral sciences. The description includes a discussion of the rationale for using the measure, the source from which data for that measure were derived, and any known limitations that would affect the interpretation of the data reported. The committee wishes to emphasize that there are limitations associated with each of the measures and that none of the measures should be regarded as a precise indicator of the quality of a program in educating scientists for careers in research. The reader is strongly urged to consider the descriptive material presented in Chapter II before attempting to interpret the program evaluations reported in subsequent chapters. In presenting a frank discussion of any shortcomings of each measure, the committee's intent is to reduce the possibility of misuse of the results from this assessment of research-doctorate programs.

22See Appendix A for the specific instructions given to the coordinators.

TABLE 1.2 Number of Programs Evaluated in Each Discipline and the Total FY1976-80 Doctoral Awards from These Programs

Discipline            Programs    FY1976-80 Doctorates*
Anthropology              70         1,960
Economics                 93         3,770
Geography                 49           762
History                  102         3,877
Political Science         83         2,909
Psychology               150        10,582
Sociology                 92         3,061
TOTAL                    639        26,921

*The data on doctoral awards were provided by the study coordinator at each of the universities covered in the assessment.