The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
METHODOLOGY

Quality . . . you know what it is, yet you don't know what it is. But that's self-contradictory. But some things are better than others, that is, they have more quality. But when you try to say what the quality is, apart from the things that have it, it all goes poof! There's nothing to talk about. But if you can't say what Quality is, how do you know what it is, or how do you know that it even exists? If no one knows what it is, then for all practical purposes it doesn't exist at all. But for all practical purposes it really does exist. What else are the grades based on? Why else would people pay fortunes for some things and throw others in the trash pile? Obviously some things are better than others . . . but what's the "betterness"? . . . So round and round you go, spinning mental wheels and nowhere finding anyplace to get traction. What the hell is Quality? What is it?

--Robert M. Pirsig
Zen and the Art of Motorcycle Maintenance

Both the planning committee and our own study committee have given careful consideration to the types of measures to be employed in the assessment of research-doctorate programs.1 The committees recognized that any of the measures that might be used is open to criticism and that no single measure could be expected to provide an entirely satisfactory index of the quality of graduate education. With respect to the use of multiple criteria in educational assessment, one critic has commented:

1A description of the measures considered may be found in the third chapter of the planning committee's report, along with a discussion of the relative merits of each measure.

At best each is a partial measure encompassing a fraction of the large concept. On occasion its link to the real [world] is problematic and tenuous. Moreover, each measure [may contain] a load of irrelevant superfluities, "extra baggage" unrelated to the outcomes under study. By the use of a number of such measures, each contributing a different facet of information, we can limit the effect of irrelevancies and develop a more rounded and truer picture of program outcomes.2

Although the use of multiple measures alleviates the criticisms directed at a single dimension or measure, it certainly will not satisfy those who believe that the quality of graduate programs cannot be represented by quantitative estimates, no matter how many dimensions they may be intended to represent. Furthermore, the usefulness of the assessment is dependent on the validity and reliability of the criteria on which programs are evaluated. The decision concerning which measures to adopt in the study was made primarily on the basis of two factors: (1) the extent to which a measure was judged to be related to the quality of research-doctorate programs and (2) the feasibility of compiling reliable data for making national comparisons of programs in particular disciplines. Only measures that were applicable to a majority of the disciplines to be covered were considered. In reaching a final decision the study committee found the ETS study,3 in which 27 separate variables were examined, especially helpful, even though it was recognized that many of the measures feasible in institutional self-studies would not be available in a national study. The committee was aided by the many suggestions received from university administrators and others within the academic community.

Although the initial design called for an assessment based on approximately six measures, the committee concluded that it would be highly desirable to expand this effort.
As many as 16 measures (listed in Table 2.1) have been utilized in the assessment of research-doctorate programs in biochemistry, botany, cellular/molecular biology, microbiology, physiology, and zoology. For nine of the measures data are available describing most, if not all, of the biological science programs included in the assessment. For seven measures the coverage is less complete but encompasses a large fraction of the programs in

2C. H. Weiss, Evaluation Research: Methods of Assessing Program Effectiveness, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1972, p. 56.
3See M. J. Clark et al. (1976) for a description of these variables.

TABLE 2.1 Measures Compiled on Individual Research-Doctorate Programs in the Biological Sciences

Program Size1
01 Reported number of faculty members in the program, December 1980.
02 Reported number of program graduates in last five years (July 1975 through June 1980).
03 Reported total number of full-time and part-time graduate students enrolled in the program who intend to earn doctorates, December 1980.

Characteristics of Graduates2
04 Fraction of FY1975-79 program graduates who had received some national fellowship or training grant support during their graduate education.
05 Median number of years from first enrollment in graduate school to receipt of the doctorate--FY1975-79 program graduates.3
06 Fraction of FY1975-79 program graduates who at the time they completed requirements for the doctorate reported that they had made definite commitments for postgraduation employment.
07 Fraction of FY1975-79 program graduates who at the time they completed requirements for the doctorate reported that they had made definite commitments for postgraduation employment in Ph.D.-granting universities.

Reputational Survey Results4
08 Mean rating of the scholarly quality of program faculty.
09 Mean rating of the effectiveness of the program in educating research scholars/scientists.
10 Mean rating of the improvement in program quality in the last five years.
11 Mean rating of the evaluators' familiarity with the work of the program's faculty.

University Library Size5
12 Composite index describing the library size in the university in which the program is located, 1979-80.

Research Support
13 Fraction of program faculty members holding research grants from the National Institutes of Health, the National Science Foundation, or the Alcohol, Drug Abuse, and Mental Health Administration at any time during the FY1978-80 period.6
14 Total expenditures (in thousands of dollars) reported by the university for research and development activities in a specified field, FY1979.7

Publication Records8
15 Number of published articles attributed to the program, 1978-79.
16 Estimated "overall influence" of published articles attributed to the program, 1978-79.

1Based on information provided to the committee by the participating universities.
2Based on data compiled in the NRC's Survey of Earned Doctorates.
3In reporting standardized scores and correlations with other variables, a shorter time-to-Ph.D. is assigned a higher score.
4Based on responses to the committee's survey conducted in April 1981.
5Based on data compiled by the Association of Research Libraries.
6Based on matching faculty names provided by institutional coordinators with the names of research grant awardees from the three federal agencies.
7Based on data provided to the National Science Foundation by universities.
8Based on data compiled by the Institute for Scientific Information and developed by Computer Horizons, Inc.

every discipline. The actual number of programs evaluated on every measure is reported in the second table in each of the next six chapters.

The 16 measures describe a variety of aspects important to the operation and function of research-doctorate programs--and thus are relevant to the quality and effectiveness of programs in educating scientists for careers in research. However, not all of the measures may be viewed as "global indices of quality." Some, such as those relating to program size, are best characterized as "program descriptors" that, although not dimensions of quality per se, are thought to have a significant influence on the effectiveness of programs. Other measures, such as those relating to university library size and support for research and training, describe some of the resources generally recognized as being important in maintaining a vibrant program in graduate education. Measures derived from surveys of faculty peers or from the publication records of faculty members, on the other hand, have traditionally been regarded as indices of the overall quality of graduate programs. Yet these too are not true measures of quality.

We often settle for an easy-to-gather statistic, perfectly legitimate for its own limited purposes, and then forget that we haven't measured what we want to talk about. Consider, for instance, the reputation approach of ranking graduate departments: We ask a sample of physics professors (say) which the best physics departments are and then tabulate and report the results. The "best" departments are those that our respondents say are the best. Clearly it is useful to know which are the highly regarded departments in a given field, but prestige (which is what we are measuring here) isn't exactly the same as quality.4

To be sure, each of the 16 measures reported in this assessment has its own set of limitations.
In the sections that follow an explanation is provided of how each measure has been derived and its particular limitations as a descriptor of research-doctorate programs.

PROGRAM SIZE

Information was collected from the study coordinators at each university on the names and ranks of program faculty, doctoral student enrollment, and number of Ph.D. graduates in each of the past five years (FY1976-80). Each coordinator was instructed to include on the faculty list those individuals who, as of December 1, 1980, held academic appointments (typically at the rank of assistant, associate,

4John Shelton Reed, "How Not To Measure What a University Does," The Chronicle of Higher Education, Vol. 22, No. 12, May 11, 1981, p. 56.

and full professor) and who participated significantly in doctoral education. Emeritus and adjunct members generally were not to be included. Measure 01 represents the number of faculty identified in a program. Measure 02 is the reported number of graduates who earned Ph.D. or equivalent research doctorates in a program during the period from July 1, 1975, through June 30, 1980. Measure 03 represents the total number of full-time and part-time students reported to be enrolled in a program in the fall of 1980 who intended to earn research doctorates. All three of these measures describe different aspects of program size. In previous studies program size has been shown to be highly correlated with the reputational ratings of a program, and this relationship is examined in detail in this report. It should be noted that since the information was provided by the institutions participating in the study, the data may be influenced by the subjective decisions made by the individuals completing the forms. For example, some institutional coordinators may be far less restrictive than others in deciding who should be included on the list of program faculty. To minimize variation in interpretation, detailed instructions were provided to those filling out the forms.5 Measure 03 is of particular concern in this regard since the coordinators at some institutions may not have known how many of the students currently enrolled in graduate study intended to earn doctoral degrees.

CHARACTERISTICS OF GRADUATES

One of the most meaningful measures of the success of a research-doctorate program is the performance of its graduates. How many go on to lead productive careers in research and/or teaching? Unfortunately, reliable information on the subsequent employment and career achievements of the graduates of individual programs is not available.
In the absence of this directly relevant information, the committee has relied on four indirect measures derived from data compiled in the NRC's Survey of Earned Doctorates.6 Although each measure has serious limitations (described below), the committee believes it more desirable to include this information than not to include data about program graduates.

In identifying program graduates who had received their doctorates in the previous five years (FY1975-79),7 the faculty lists furnished by the study coordinators at universities were compared with the names of dissertation advisers (available from the NRC survey). The latter source contains records for virtually all individuals who have earned

5A copy of the survey form with the instructions sent to study coordinators is included in Appendix A.
6A copy of the questionnaire used in this survey is found in Appendix B.
7Survey data for the FY1980 Ph.D. recipients had not yet been compiled at the time this assessment was undertaken.

research doctorates from U.S. universities since 1920. The institution, year, and specialty field of Ph.D. recipients were also used in determining the identity of program graduates. It is estimated that this matching process provided information on the graduate training and employment plans of more than 80 percent of the FY1975-79 graduates from the biological science programs. In the calculation of each of the four measures derived from the NRC survey, program data are reported only if the survey information is available on at least 10 graduates. Consequently, in a discipline with smaller programs--physiology--slightly less than half the programs are included in these measures, whereas more than 80 percent of the zoology programs are included.

Measure 04 constitutes the fraction of FY1975-79 graduates of a program who had received at least some national fellowship support, including National Institutes of Health fellowships or traineeships, National Science Foundation fellowships, other federal fellowships, Woodrow Wilson fellowships, or fellowships/traineeships from other U.S. national organizations. One might expect the more selective programs to have a greater proportion of students with national fellowship support--especially "portable fellowships." Although the committee considered alternative measures of student ability (e.g., Graduate Record Examination scores, undergraduate grade point averages), reliable information of this sort was unavailable for a national assessment. It should be noted that the relevance of the fellowship measure varies considerably among disciplines. In the biomedical sciences a substantial fraction of the graduate students are supported by training grants and fellowships; in most other sciences and engineering disciplines a majority are supported by research assistantships and teaching assistantships.
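The computation of measure 04 and the 10-graduate reporting threshold described above can be sketched in a few lines. The function name and data layout below are illustrative assumptions, not part of the study's actual procedures.

```python
# Sketch of measure 04: fraction of a program's FY1975-79 graduates
# with national fellowship or training grant support. The study's
# rule: report the value only if Survey of Earned Doctorates records
# exist for at least 10 graduates. Names and layout are hypothetical.

def measure_04(graduates, min_records=10):
    """graduates: list of dicts, one per matched graduate, each with
    a boolean 'fellowship' flag from the survey data."""
    if len(graduates) < min_records:
        return None  # too few records; measure not reported
    supported = sum(1 for g in graduates if g["fellowship"])
    return supported / len(graduates)

# Example: 12 matched graduates, 5 with national fellowship support.
records = [{"fellowship": i < 5} for i in range(12)]
print(measure_04(records))       # 5/12, about 0.417
print(measure_04(records[:8]))   # None: fewer than 10 records
```

This suppression rule explains why fewer physiology programs (which tend to be small) appear on the graduate-characteristics measures than zoology programs.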
Even in the biological disciplines, however, differences in the patterns of graduate student support at different universities may sharply affect measure 04. Some departments with sizable undergraduate enrollments, for example, may have large numbers of teaching assistantships paid out of state or institutional funds--thereby reducing the need for federal training grant or fellowship support. Similarly, some departments may have an established policy of supporting graduate students as research assistants rather than seeking federal training grant support for their students.

Measure 05 is the median number of years elapsed from the time program graduates first enrolled in graduate school to the time they received their doctoral degrees. For purposes of analysis the committee has adopted the conventional wisdom that the most talented students are likely to earn their doctoral degrees in the shortest periods of time--hence, the shorter the median time-to-Ph.D., the higher the standardized score that is assigned. Although this measure has frequently been employed in social science research as a proxy for student ability, one must regard its use here with some skepticism. It is quite possible that the length of time it takes a student to complete requirements for a doctorate may be significantly affected by the explicit or implicit policies of a university or department. For example, in certain cases a short time-to-Ph.D. may be indicative of less

stringent requirements for the degree. Furthermore, previous studies have demonstrated that women and members of minority groups, for reasons having nothing to do with their abilities, are more likely than male Caucasians to interrupt their graduate education or to be enrolled on a part-time basis.8 As a consequence, the median time-to-Ph.D. may be longer for programs with larger fractions of women and minority students.

Measure 06 represents the fraction of FY1975-79 program graduates who reported at the time they had completed requirements for the doctorate that they had signed contracts or made firm commitments for postgraduation employment (including postdoctoral appointments as well as other positions in the academic or nonacademic sectors) and who provided the names of their prospective employers. Although this measure is likely to vary discipline by discipline according to the availability of employment opportunities, a program's standing relative to other programs in the same discipline should not be affected by this variation. In theory, the graduates with the greatest promise should have the easiest time in finding jobs. However, the measure is also influenced by a variety of other factors, such as personal job preferences and restrictions in geographic mobility, that are unrelated to the ability of the individual. It also should be noted parenthetically that unemployment rates for doctoral recipients are quite low and that nearly all of the graduates seeking jobs find positions soon after completing their doctoral programs.9 Furthermore, first employment after graduation is by no means a measure of career achievement, which is what one would like to have if reliable data were available.

Measure 07, a variant of measure 06, constitutes the fraction of FY1975-79 program graduates who indicated that they had made firm commitments for employment in Ph.D.-granting universities and who provided the names of their prospective employers.
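The sign convention adopted for measure 05 above (a shorter median time-to-Ph.D. yields a higher standardized score) amounts to negating the usual standardized score. A minimal sketch, with invented median values for three hypothetical programs:

```python
# Sketch of the inverted standardization for measure 05. A program's
# raw value is its median time-to-Ph.D. in years; the reported
# standardized score is the negated z-score, so shorter times score
# higher. The data values are hypothetical.
from statistics import mean, stdev

def inverted_z_scores(medians):
    m, s = mean(medians), stdev(medians)
    return [-(x - m) / s for x in medians]

times = [5.8, 6.5, 7.2]        # median years for three programs
scores = inverted_z_scores(times)
print(scores[0] > scores[2])   # True: shortest time, highest score
```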
This measure may be presumed to be an indication of the fraction of graduates likely to pursue careers in academic research, although there is no evidence concerning how many of them remain in academic research in the long term. In many science disciplines the path from Ph.D. to postdoctoral apprenticeship to junior faculty has traditionally been regarded as the road of success for the growth and development of research scientists. The committee is well aware, of course, that other paths, such as employment in the major laboratories of industry and government, provide equally attractive opportunities for growth. Indeed, in recent years increasing numbers of graduates are entering the nonacademic sectors. Unfortunately the data compiled from the NRC's Survey of Earned Doctorates do

8For a detailed analysis of this subject, see Dorothy M. Gilford and Joan Snyder, Women and Minority Ph.D.'s in the 1970's: A Data Book, National Academy of Sciences, Washington, D.C., 1977.
9For new Ph.D. recipients in science and engineering the unemployment rate has been less than 2 percent (see National Research Council, Postdoctoral Appointments and Disappointments, National Academy Press, Washington, D.C., 1981, p. 313).

TABLE 2.2 Percentage of FY1975-79 Doctoral Recipients with Definite Commitments for Employment Outside the Academic Sector*

Biochemistry                  22
Botany                        28
Cellular/Molecular Biology    22
Microbiology                  31
Physiology                    16
Zoology                       24

*Percentages are based on responses to the NRC's Survey of Earned Doctorates by those who indicated that they had made firm commitments for postgraduation employment and who provided the names of their prospective employers. These percentages may be considered to be lower-bound estimates of the actual percentages of doctoral recipients employed outside the academic sector.

not enable one to distinguish between employment in the top-flight laboratories of industry and government and employment in other areas of the nonacademic sectors. Accordingly, the committee has relied on a measure that reflects only the academic side and views this measure as a useful and interesting program characteristic rather than a dimension of quality. In the biological science disciplines, in which only about one-fourth of the graduates take jobs outside the academic environs (see Table 2.2), this limitation is not as serious a concern as it is in the engineering and physical science disciplines.

The inclusion of measures 06 and 07 in this assessment has been an issue much debated by members of the committee; the strenuous objections by three committee members regarding the use of these measures are expressed in the Minority Statement, which follows Chapter IX.

REPUTATIONAL SURVEY RESULTS

In April 1981 survey forms were mailed to a total of 1,848 faculty members in biochemistry, botany, cellular/molecular biology, microbiology, physiology, and zoology. The evaluators were selected from the faculty lists furnished by the study coordinators at the 228 universities covered in the assessment.
These evaluators constituted approximately 15 percent of the total faculty population--12,167 faculty members--in the biological science programs being evaluated (see Table 2.3). The survey sample was chosen on the basis of the number of faculty in a particular program and the number of doctorates awarded in the previous five years (FY1976-80)--with the stipulation that at least one evaluator was selected from every program covered in the assessment. In selecting the sample each faculty rank was represented in proportion to the total number of individuals holding that rank, and preference was given to those faculty members whom the study coordinators had nominated to serve as evaluators. As shown in Table 2.3,

1,485 individuals, 80 percent of the survey sample in the biological sciences, had been recommended by study coordinators.10

Each evaluator was asked to consider a stratified random sample of 50 research-doctorate programs in his or her discipline--with programs stratified by the number of faculty members associated with each program. Every program was included on 150 survey forms. The 50 programs to be evaluated appeared on each survey form in random sequence, preceded by an alphabetized list of all programs in that discipline that were being included in the study. No evaluator was asked to consider a program at his or her own institution. Ninety percent of the survey sample group were provided the names of faculty members in each of the 50 programs to be evaluated,11 along with data on the total number of doctorates awarded in the last five years. The inclusion of this information represents a significant departure from the procedures used in earlier reputational assessments. For purposes of comparison with previous studies, 10 percent (randomly selected in each discipline) were not furnished any information other than the names of the programs.

The survey items were adapted from the form used in the Roose-Andersen study. Prior to mailing, the instrument was pretested using a small sample of faculty members in chemistry and psychology. As a result, two significant improvements were made in the original survey design. A question was added on the extent to which the evaluator was familiar with the work of the faculty in each program. Responses to this question, reported as measure 11, provide some insight into the relationship between faculty recognition and the reputational standing of a program.12 Also added was a question on the evaluator's subfield of specialization--thereby making it possible to compare program evaluations in different specialty areas within a particular discipline.
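The form-construction procedure just described can be illustrated with a toy sketch. The stratum cutoffs and the 10/20/20 allocation below are invented for illustration; the actual study also balanced assignments so that every program appeared on exactly 150 forms, a constraint this simple per-evaluator draw does not enforce.

```python
# Toy sketch of the survey-form design: each evaluator's form lists
# 50 programs drawn from strata defined by program faculty size,
# excluding the evaluator's own institution, in random sequence.
# Stratum cutoffs and allocations are illustrative only.
import random

def build_form(programs, own_institution, seed=None):
    """programs: list of (name, institution, n_faculty) tuples."""
    rng = random.Random(seed)
    eligible = [p for p in programs if p[1] != own_institution]
    small = [p for p in eligible if p[2] < 15]
    medium = [p for p in eligible if 15 <= p[2] < 40]
    large = [p for p in eligible if p[2] >= 40]
    form = (rng.sample(small, min(10, len(small))) +
            rng.sample(medium, min(20, len(medium))) +
            rng.sample(large, min(20, len(large))))
    rng.shuffle(form)  # programs appear in random sequence
    return form

programs = [("P%d" % i, "U%d" % (i % 30), 5 + i % 60) for i in range(120)]
form = build_form(programs, own_institution="U0", seed=1)
print(len(form))  # 50: 10 small + 20 medium + 20 large
```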
A total of 1,026 faculty members in the biological sciences--56 percent of those asked to participate--completed and returned survey forms (see Table 2.3). Two factors probably have contributed to this response rate being approximately 15 percentage points below the rates reported in the Cartter and Roose-Andersen studies. First, because of the considerable expense of printing individualized survey forms (each 25-30 pages), second copies were not sent to sample members not responding to the first mailing13--as was done in the Cartter and Roose-Andersen efforts. Second, it is quite apparent that within the

10A detailed analysis of the survey participants in each discipline is given in subsequent chapters.
11This information was furnished to the committee by the study coordinators at the universities participating in the study.
12Evidence of the strength of the relationship is provided by correlations presented in Chapters III-VIII, and an analysis of the relationship is provided in Chapter IX.
13A follow-up letter was sent to those not responding to the first mailing, and a second copy was distributed to those few evaluators who specifically requested another form.

TABLE 2.3 Survey Response by Discipline and Characteristics of Evaluator

                                Total Program   Survey       Respondents
                                Faculty (N)     Sample (N)   N       %

Discipline of Evaluator
  Biochemistry                  2,658           417          234     56
  Botany                        1,589           249          153     61
  Cellular/Molecular Biology    2,271           267          139     52
  Microbiology                  2,195           402          231     58
  Physiology                    1,964           303          146     48
  Zoology                       1,490           210          123     59

Faculty Rank
  Professor                     6,188           956          510     53
  Associate Professor           3,334           597          334     56
  Assistant Professor           2,389           292          181     62
  Other                         256             3            1       33

Evaluator Selection
  Nominated by Institution      3,467           1,485        856     58
  Other                         8,700           363          170     47

Survey Form
  With Faculty Names            N/A*            1,662        935     56
  Without Names                 N/A*            186          91      49

Total All Fields                12,167          1,848        1,026   56

*Not applicable.

academic community there has been a growing dissatisfaction in recent years with educational assessments based on reputational measures. Indeed, this dissatisfaction was an important factor in the Conference Board's decision to undertake a multidimensional assessment, and some faculty members included in the sample made known to the committee their strong objections to the reputational survey. As can be seen in Table 2.3, there is some variation in the response rates in the six biological science disciplines. Of particular interest

is the relatively high rate of response from botanists and the low rate from physiologists--a result consistent with the findings in the Cartter and Roose-Andersen surveys.14 It is not surprising to find that the evaluators nominated by study coordinators responded more often than did those who had been selected at random. No appreciable differences were found among the response rates of assistant, associate, and full professors or between the rates of those evaluators who were furnished the abbreviated survey form (without lists of program faculty) and those who were given the longer version. Each program was considered by an average of approximately 85 survey respondents from other programs in the same discipline.

The evaluators were asked to judge programs in terms of scholarly quality of program faculty, effectiveness of the program in educating research scholars/scientists, and change in program quality in the last five years.15 The mean ratings of a program on these three survey items constitute measures 08, 09, and 10. Evaluators were also asked to indicate the extent to which they were familiar with the work of the program faculty. The average of responses to this item constitutes measure 11.

In making judgments about the quality of faculty, evaluators were instructed to consider the scholarly competence and achievements of the individuals. The ratings were furnished on the following scale:

5 Distinguished
4 Strong
3 Good
2 Adequate
1 Marginal
0 Not sufficient for doctoral education
X Don't know well enough to evaluate

In assessing the effectiveness of a program, evaluators were asked to consider the accessibility of faculty, the curricula, the instructional and research facilities, the quality of the graduate students, the performance of graduates, and other factors that contribute to a program's effectiveness.
This measure was rated accordingly:

3 Extremely effective
2 Reasonably effective
1 Minimally effective
0 Not effective
X Don't know well enough to evaluate

14To compare the response rates obtained in the earlier surveys, see Roose and Andersen, Table 28, p. 29.
15A copy of the survey instrument with its accompanying instructions is included in Appendix C.

Evaluators were instructed to assess change in program quality on the basis of whether there has been improvement in the last five years in both the scholarly quality of faculty and the effectiveness in educating research scholars/scientists. The following alternatives were provided:

2 Better than five years ago
1 Little or no change in last five years
0 Poorer than five years ago
X Don't know well enough to evaluate

Evaluators were asked to indicate their familiarity with the work of the program faculty according to the following scale:

2 Considerable familiarity
1 Some familiarity
0 Little or no familiarity

In the computation of mean ratings on measures 08, 09, and 10, the "don't know" responses were ignored. An average program rating based on fewer than 15 responses (excluding the "don't know" responses) is not reported.

Measures 08, 09, and 10 are subject to many of the same criticisms that have been directed at previous reputational surveys. Although care has been taken to improve the sampling design and to provide evaluators with some essential information about each program, the survey results merely reflect a consensus of faculty opinions. As discussed in Chapter I, these opinions may well be based on out-of-date information or be influenced by a variety of factors unrelated to the quality of the program. In Chapter IX a number of factors that may possibly affect the survey results are examined. In addition to these limitations, it should be pointed out that evaluators, on the average, were unfamiliar with almost half of the programs they were asked to consider.16 As might be expected, the smaller and less prestigious programs were not as well known, and for this reason one might have less confidence in the average ratings of these programs.
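The averaging rule described above--"don't know" responses excluded, and a mean reported only when at least 15 scored responses remain--can be sketched as follows. The encoding of responses is an assumption for illustration.

```python
# Sketch of the mean-rating rule for measures 08-10: "X" (don't-know)
# responses are dropped, and a mean is reported only when at least
# 15 scored responses remain. The response encoding is hypothetical.

def mean_rating(responses, min_responses=15):
    """responses: list of numeric scores or 'X' for don't-know."""
    scored = [r for r in responses if r != "X"]
    if len(scored) < min_responses:
        return None  # not reported
    return sum(scored) / len(scored)

# Example: 17 responses, 2 of them "X", leaving exactly 15 scores.
ratings = [5, 4, 4, "X", 3, 5, 4, 4, 3, 5, "X", 4, 3, 4, 5, 4, 2]
print(mean_rating(ratings))  # 59/15, about 3.93
```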
For all four survey measures, standard errors of the mean ratings are reported; they tend to be larger for the lesser-known programs. The frequency of response to each of the survey items is discussed in Chapter IX.

Two additional comments should be made regarding the survey activity. First, it should be emphasized that the ratings derived from the survey reflect a program's standing relative to other programs in the same discipline and provide no basis for making cross-disciplinary comparisons. For example, the fact that a larger number of microbiology programs received "distinguished" ratings on measure 08 than did zoology programs indicates nothing about the relative quality of faculty in these two disciplines. It may depend, in part, on the total numbers of programs evaluated in these disciplines; in the survey instructions it was suggested to evaluators that no more than 10 percent of the programs listed be designated as "distinguished." Nor is it advisable to compare the rating of a program in one discipline with that of a program in another discipline because the ratings are based on the opinions of different groups of evaluators who were asked to judge entirely different sets of programs. Second, early in the committee's deliberations a decision was made to supplement the ratings obtained from faculty members with ratings from evaluators who hold research-oriented positions in institutions outside the academic sector. These institutions include industrial research laboratories, government research laboratories, and a variety of other research establishments. Over the past 10 years increasing numbers of doctoral recipients have taken positions outside the academic setting. The extensive involvement of these graduates in nonacademic employment is reflected in the percentages reported in Table 2.2: An average of 24 percent of the recent graduates in the biological science disciplines who had definite employment plans indicated that they planned to take positions in nonacademic settings. Data from another NRC survey suggest that the actual fraction employed outside academia may be significantly higher. The committee recognized that the inclusion of nonacademic evaluators would furnish information valuable for assessing nontraditional dimensions of doctoral education and would provide an important new measure not assessed in earlier studies. Results from a survey of this group would provide an interesting comparison with the results obtained from the survey of faculty members. A concentrated effort was made to obtain supplemental funding for adding nonacademic evaluators in selected disciplines to the survey sample, but this effort was unsuccessful.

16See Table 9.6 in Chapter IX.
The committee nevertheless remains convinced of the importance of including evaluators from nonacademic research institutions. These institutions are likely to employ increasing numbers of graduates in many disciplines, and it is urged that this group not be overlooked in future assessments of graduate programs.

UNIVERSITY LIBRARY SIZE

University library holdings are generally regarded as an important resource for students in graduate (and undergraduate) education. The Association of Research Libraries (ARL) has compiled data from its academic member institutions and developed a composite measure of a university library's size relative to those of other ARL members. The ARL Library Index, as it is called, is based on 10 characteristics: volumes held, volumes added (gross), microform units held, current serials received, expenditures for library materials, expenditures for binding, total salary and wage expenditures, other operating expenditures, number of professional staff, and number of nonprofessional staff.7 The 1979-80 index, which constitutes measure 12, is available for 89 of the 228 universities included in the assessment. (These

7See Appendix D for a description of the calculation of this index.

89 tend to be among the largest institutions.) The limited coverage of this measure is a major shortcoming. It should be noted that the ARL index is a composite description of library size and not a qualitative evaluation of the collections, services, or operations of the library. Also, it is a measure of aggregate size and does not take into account the library holdings in a particular department or discipline. Finally, although universities with more than one campus were instructed to include figures for the main campus only, some in fact may have reported library size for the entire university system. Whether this misreporting occurred is not known.

RESEARCH SUPPORT

Using computerized data files18 provided by the National Science Foundation (NSF) and the National Institutes of Health (NIH), it was possible to identify which faculty members in each program had been awarded research grants during the FY1978-80 period by either of these agencies or by the Alcohol, Drug Abuse, and Mental Health Administration (ADAMHA). The fraction of faculty members in a program who had received any research grants from these agencies during this three-year period constitutes measure 13. Since these awards have been made on the basis of peer judgment, this measure is considered to reflect the perceived research competence of program faculty. However, it should be noted that significant amounts of support for research in the biological sciences come from other federal agencies as well, but it was not feasible to compile data from these other sources. It is estimated19 that 57 percent of the university faculty members in these disciplines who received federal R&D funding obtained their support from NIH and another 24 percent from NSF. The remaining 19 percent received support from the U.S. Department of Agriculture and other federal agencies.
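Measure 13, as described, reduces to a simple fraction. A minimal sketch follows, with invented faculty and award lists standing in for the agencies' actual data files:

```python
def grant_fraction(faculty, funded_investigators):
    """Measure 13 sketch: the fraction of a program's faculty who appear in
    the NSF/NIH/ADAMHA award records for FY1978-80 as principal
    investigators or coinvestigators."""
    funded = set(funded_investigators)
    return sum(1 for member in faculty if member in funded) / len(faculty)

# Hypothetical program of 8 faculty, 5 of whom held a qualifying grant.
faculty = ["Adams", "Baker", "Chen", "Diaz", "Evans", "Fox", "Gupta", "Hall"]
funded = ["Adams", "Chen", "Diaz", "Gupta", "Hall", "Ng"]  # Ng is at another program
print(grant_fraction(faculty, funded))  # -> 0.625
```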
It should be pointed out that only those faculty members who served as principal investigators or coinvestigators are counted in the computation of this measure.

Measure 14 describes the total FY1979 expenditures by a university for R&D in the biological disciplines. These data have been furnished to the NSF by universities20 and include expenditures of funds from both federal and nonfederal sources. If an institution has more than one program being evaluated in the same discipline, the aggregate university expenditures for research in that discipline are reported for each of the programs. In each discipline data are recorded for the 100 universities with the largest R&D expenditures. Unfortunately, these data are available only for aggregate expenditures in the biological

18A description of these files is provided in Appendix E.
19Based on special tabulations of data from the NRC's Survey of Doctorate Recipients, 1979.
20A copy of the survey instrument used to collect these data appears in Appendix E.

sciences and are not for expenditures in the individual biological disciplines; thus the value reported for an individual program represents the total university expenditures in the biological sciences.

This measure has several other limitations related to the procedures by which the data have been collected. The committee notes that there is evidence within the source document21 that universities use different practices for categorizing and reporting expenditures. Apparently, institutional support of research, industrial support of research, and expenditure of indirect costs are reported by different institutions in different categories (or not reported at all). Since measure 14 is based on total expenditures from all sources, the data used here are perturbed only when these types of expenditures are not subsumed under any reporting category. In contrast with measure 13, measure 14 is not reported on a scale relative to the number of faculty members and thus reflects the overall level of research activity at an institution in a particular discipline. Although research grants in the sciences and engineering provide some support for graduate students as well, these measures should not be confused with measure 04, which pertains to fellowships and training grants.

PUBLICATION RECORDS

Data from the 1978 and the 1979 Science Citation Index have been compiled22 on published articles associated with research-doctorate programs in the biological sciences. Publication counts were associated with programs on the basis of the discipline of the journal in which an article appeared and the institution with which the author was affiliated. Coauthored articles were proportionately attributed to the institutions of the individual authors. Articles appearing in multidisciplinary journals (e.g., Science, Nature) were apportioned according to the characteristic mix of subject matter in those journals.
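The proportional attribution of coauthored articles can be sketched as below. Equal author shares are an assumption here, since the text says only that articles were "proportionately attributed" to the authors' institutions:

```python
from collections import defaultdict

def attribute_articles(papers):
    """Split each article's single count equally among its authors'
    institutions, accumulating fractional counts per institution.

    `papers` is a list of author-institution lists, one list per article.
    """
    counts = defaultdict(float)
    for institutions in papers:
        share = 1.0 / len(institutions)  # one article split across its authors
        for inst in institutions:
            counts[inst] += share
    return dict(counts)

# Hypothetical articles; the second is coauthored across two universities.
papers = [["Univ A"], ["Univ A", "Univ B"], ["Univ B", "Univ B"]]
print(attribute_articles(papers))  # -> {'Univ A': 1.5, 'Univ B': 1.5}
```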
For the purposes of assigning publication counts, this mix can be estimated with reasonable accuracy.23 Two measures have been derived from the publication records: measure 15--the total number of articles published in the 1978-79 period that have been associated with a research-doctorate program; and measure 16--an estimation of the "influence" of these articles. The latter is a product of the number of articles attributed to a program

21National Science Foundation, Academic Science: R and D Funds, Fiscal Year 1979, Government Printing Office, Washington, D.C., NSF 81-301, 1981.
22The publication data have been generated for the committee's use by Computer Horizons, Inc., using source files provided by the Institute for Scientific Information.
23Francis Narin, Evaluative Bibliometrics: The Use of Publications and Citations Analysis in the Evaluation of Scientific Activity, Report to the National Science Foundation, March 1976, p. 203.

and the estimated influence of the journals in which these articles appeared. The influence of a journal is determined from the weighted number of times, on the average, an article in that journal is cited--with references from frequently cited journals counting more heavily. A more detailed explanation of the derivation of these measures is given in Appendix F. Neither measure 15 nor measure 16 is based on actual counts of articles written only by program faculty. However, extensive analysis of the "influence" index in the fields of physics, chemistry, and biochemistry has demonstrated the stability of this index and the reliability associated with its use.24 Of course, this does not imply that the measure captures subtle aspects of publication "influence." It is of interest to note that indices similar to measures 15 and 16 have been shown to be highly correlated with the peer ratings of graduate departments compiled in the Roose-Andersen study.25

It must be emphasized that these measures encompass articles (published in selected journals) by all authors affiliated with a given university. Included therefore are articles by program faculty members, students and research personnel, and even members of other departments in that university who publish in those journals. Moreover, these measures do not take into account the differing sizes of programs, and the measures clearly do depend on faculty size. Although consideration was given to reporting the number of published articles per faculty member, the committee concluded that since the measure included articles by other individuals besides program faculty members, the aggregate number of articles would be a more reliable measure of overall program quality. It should be noted that if a university had more than one program being evaluated in the same discipline, it was not possible to distinguish the relative contribution of each program.
In such cases the aggregate university data in that discipline were assigned to each program.

Since the data are confined to 1978-79, they do not take into account institutional mobility of authors after that period. Thus, articles by authors who have moved from one institution to another since 1979 are credited to the former institution. Also, the publication counts fail to include the contributions of faculty members' publications in journals outside their primary discipline. This point may be especially important for those programs with faculty members whose research is at the intersection of several different disciplines.

The reader should be aware of two additional caveats with regard to the interpretation of measures 15 and 16. First, both measures are based on counts of published articles and do not include books. Since

24Narin, pp. 283-307.
25Richard C. Anderson, Francis Narin, and Paul McAllister, "Publication Ratings Versus Peer Ratings of Universities," Journal of the American Society for Information Science, March 1978, pp. 91-103; and Lyle V. Jones, "The Assessment of Scholarship," New Directions for Program Evaluation, No. 6, 1980, pp. 1-20.

in the biological sciences most scholarly contributions are published as journal articles, this may not be a serious limitation. Second, the "influence" measure should not be interpreted as an indicator of the impact of articles by individual authors. Rather it is a measure of the impact of the journals in which articles associated with a particular program have been published. Citation counts, with all their difficulties, would have been preferable since they are attributable to individual authors and they register the impact of books as well as journal articles. However, the difficulty and cost of assembling reliable counts of articles by individual authors made their use infeasible.

ANALYSIS AND PRESENTATION OF THE DATA

The next six chapters present all of the information that has been compiled on individual research-doctorate programs in biochemistry, botany, cellular/molecular biology, microbiology, physiology, and zoology. Each chapter follows a similar format, designed to assist the reader in the interpretation of program data. The first table in a chapter provides a list of the programs evaluated in a discipline--including the names of the universities and departments or academic units in which programs reside--along with the full set of data compiled for individual programs. Programs are listed alphabetically according to name of institution, and both raw and standardized values are given for all but one measure.26 For the reader's convenience an insert of information from Table 2.1 is provided that identifies each of the 16 measures reported in the table and indicates the raw scale used in reporting values for a particular measure. Standardized values, converted from raw values to have a mean of 50 and a standard deviation of 10,27 are computed for every measure so that comparisons can easily be made of a program's relative standing on different measures.
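The standardization just described can be sketched as follows. Whether the committee used the population or the sample standard deviation is not stated, so the population form below is an assumption:

```python
import statistics

def standardize(raw_values):
    """Map raw program values onto the report's standardized scale:
    mean 50, standard deviation 10, computed from the precise raw values."""
    mu = statistics.mean(raw_values)
    sigma = statistics.pstdev(raw_values)  # population SD; an assumption here
    return [50 + 10 * (x - mu) / sigma for x in raw_values]

# The middle value sits exactly at the mean, so it maps to 50; the other two
# values fall symmetrically below and above it.
print(standardize([0, 5, 10]))
```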
Thus, a standardized value of 30 corresponds with a raw value that is two standard deviations below the mean for that measure, and a standardized value of 70 represents a raw value two standard deviations above the mean. While the reporting of values in standardized form is convenient for comparing a particular program's standing on different measures, it may be misleading in interpreting actual differences in the values reported for two or more programs--especially when the distribution of the measure being examined is highly skewed. For example, the numbers of published articles (measure 15) associated with four biochemistry programs are reported in Table 3.1 as follows:

26Since the scale used to compute measure 16--the estimated "influence" of published articles--is entirely arbitrary, only standardized values are reported for this measure.
27The conversion was made from the precise raw value rather than from the rounded value reported for each program. Thus, two programs may have the same reported raw value for a particular measure but different standardized values.

Program    Raw Value    Standardized Value
A          1            40
B          6            40
C          21           42
D          38           44

Although programs C and D have many times the number of articles as have programs A and B, the differences reported on a standardized scale appear to be small. Thus, the reader is urged to take note of the raw values before attempting to interpret differences in the standardized values given for two or more programs.

The initial table in each chapter also presents estimated standard errors of mean ratings derived from the four survey items (measures 08-11). A standard error is an estimated standard deviation of the sample mean rating and may be used to assess the stability of a mean rating reported for a particular program.28 For example, one may assert (with .95 confidence) that the population mean rating would lie within two standard errors of the sample mean rating reported in this assessment.

No attempt has been made to establish a composite ranking of programs in a discipline. Indeed, the committee is convinced that no single measure adequately reflects the quality of a research-doctorate program and wishes to emphasize the importance of viewing individual programs from the perspective of multiple indices or dimensions.

The second table in each chapter presents summary statistics (i.e., number of programs evaluated, mean, standard deviation, and decile values) for each of the program measures.29 The reader should find these statistics helpful in interpreting the data reported on individual programs. Next is a table of the intercorrelations among the various measures for that discipline. This table should be of particular interest to those desiring information about the interrelations of the various measures.

The remainder of each chapter is devoted to an examination of results from the reputational survey.
Included are an analysis of the characteristics of survey participants and graphical portrayals of the relationship of the mean rating of scholarly quality of faculty (measure 08) with the number of faculty (measure 01) and the relationship

28The standard error estimate has been computed by dividing the standard deviation of a program's ratings by the square root of the number of ratings. For a more extensive discussion of this topic, see Fred N. Kerlinger, Foundations of Behavioral Research, Holt, Rinehart and Winston, Inc., New York, 1973, Chapter 12. Readers should note that the estimate is a measure of the variation in response and by no means includes all possible sources of error.
29Standardized scores have been computed from precise values of the mean and standard deviation of each measure and not the rounded values reported in the second table of a chapter.

of the mean rating of program effectiveness (measure 09) with the number of graduates (measure 02). A frequently mentioned criticism of the Roose-Andersen and Cartter studies is that small but distinguished programs have been penalized in the reputational ratings because they are not as highly visible as larger programs of comparable quality. The comparisons of survey ratings with measures of program size are presented as the first two figures in each chapter and provide evidence about the number of small programs in each discipline that have received high reputational ratings. Since in each case the reputational rating is more highly correlated with the square root of program size than with the size measure itself, measures 01 and 02 are plotted on a square root scale.30 To assist the reader in interpreting results of the survey evaluations, each chapter concludes with a graphical presentation of the mean rating for every program of the scholarly quality of faculty (measure 08) and an associated "confidence interval" of 1.5 standard errors. In comparing the mean ratings of two programs, if their reported confidence intervals of 1.5 standard errors do not overlap, one may safely conclude that the program ratings are significantly different (at the .05 level of significance)--i.e., the observed difference in mean ratings is too large to be plausibly attributable to sampling error.31

The final chapter of this report gives an overview of the evaluation process in the six biological science disciplines and includes a summary of general findings. Particular attention is given to some of the extraneous factors that may influence program ratings of individual evaluators and thereby distort the survey results. The chapter concludes with a number of specific suggestions for improving future assessments of research-doctorate programs.

30For a general discussion of transforming variables to achieve linear fits, see John W.
Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, Massachusetts, 1977.

31This rule for comparing nonoverlapping intervals is valid as long as the ratio of the two estimated standard errors does not exceed 2.41. (The exact statistical significance of this criterion then lies between .050 and .034.) Inspection of the standard errors reported in each discipline shows that for programs with mean ratings differing by less than 1.0 (on measure 08), the standard error of one mean very rarely exceeds twice the standard error of another.
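The standard-error estimate of footnote 28 and the interval-comparison rule above can be sketched together. The function names and the sample ratings are illustrative:

```python
import math
import statistics

def standard_error(ratings):
    """Footnote 28's estimate: the standard deviation of one program's
    ratings divided by the square root of the number of ratings."""
    return statistics.stdev(ratings) / math.sqrt(len(ratings))

def significantly_different(mean_a, se_a, mean_b, se_b, k=1.5):
    """The report's rule: two mean ratings differ significantly (roughly
    the .05 level) when their +/- 1.5-standard-error intervals do not
    overlap, provided the ratio of the two standard errors stays below
    about 2.41."""
    return (mean_a + k * se_a < mean_b - k * se_b) or \
           (mean_b + k * se_b < mean_a - k * se_a)

# Hypothetical programs: intervals [3.0, 3.6] and [3.8, 4.4] do not overlap.
print(significantly_different(3.3, 0.2, 4.1, 0.2))  # -> True
```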
