


Suggested Citation:"2. How the Study Was Conducted." National Research Council. 2003. Assessing Research-Doctorate Programs: A Methodology Study. Washington, DC: The National Academies Press. doi: 10.17226/10859.



2 How the Study Was Conducted

LAYING THE GROUNDWORK

In many ways, the completion of the 1995 Study led immediately into the study of the methodology for the next one. In the period between October of 1995, when the 1995 assessment was released, and 1999, when a planning meeting for the current study was held, Change magazine published an issue containing two articles on the NRC rankings, one by Webster and Skinner (1996) and another by Ehrenberg and Hurst (1996). In 1997, Hugh Graham and Nancy Diamond argued in their book, The Rise of American Research Universities, that standard methods of assessing institutional performance, including the NRC assessments, obscured the dynamics of institutional improvement because of the importance of size in determining reputation. In the June 1999 Chronicle of Higher Education, the criticism was expanded to include questioning the ability of raters to perform their task in a scholarly world that is increasingly specialized and often interdisciplinary. They recommended that in its next study the NRC should list ratings of programs alphabetically and give key quantitative indicators equal prominence alongside the reputational indicators.1

1 Graham and Diamond (1999:B6).

The taxonomy of the study was also immediately controversial. The study itself mentioned the difficulty of defining fields for the biological sciences and the problems that some institutions had with the final taxonomy. The 1995 taxonomy left out research programs in schools of agriculture altogether. The coverage of programs in the basic biomedical sciences that were housed in medical schools was also spotty. A planning meeting to consider a separate study for the agricultural sciences was held in 1996, but when funding could not be found, it was decided to wait until the next large assessment to include these fields.

Analytical studies were also conducted by a number of scholars to examine the relationship between quantitative and qualitative reputational measures.2 These studies found a strong statistical correlation between the reputational measures of scholarly quality of faculty and many of the quantitative measures for all the selected programs.

2 Two examples of these studies were: Ehrenberg and Hurst (1998) and Junn and Brooks (2000).

The Planning Meeting for the next study was held in June of 1999. Its agenda and participants are shown in Appendix C. As part of the background for that meeting, all the institutions that participated in the 1995 Study were invited to comment and suggest ways to improve the NRC assessment. There was general agreement among meeting participants and institutional commentators that a statement of purpose was needed for the next study that would identify both the intended users and the uses of the study. Other suggested changes were to:

· Attack the question of identifying interdisciplinary and emerging fields and revisit the taxonomy for the biological sciences,
· Make an effort to measure educational process and outcomes directly,
· Recognize that the mission of many programs went beyond training Ph.D.s to take up academic positions,
· Provide quantitative measures that recognize differences by field in measures of merit,
· Analyze how program size influences reputation,
· Emphasize a rating scheme rather than numerical rankings, and
· Validate the collected data.

In the summer following the Planning Meeting, the presidents of the Conference Board of Associated Research Councils and the presidents of three organizations, representing graduate schools and research universities,3 met and discussed whether another assessment of research-doctorate programs should be conducted. Objections to doing a study arose from the view that graduate education was a highly complex enterprise and that rankings could only oversimplify that complexity; however, there was general agreement that, if the study were to be conducted again, a careful examination of the methodology should be undertaken first. The following statement of purpose for an assessment study was drafted:

The purpose of an assessment is to provide common data, collected under common definitions, which permit comparisons among doctoral programs. Such comparisons assist funders and university administrators in program evaluation and are useful to students in graduate program selection. They also provide evidence to external constituencies that graduate programs value excellence and assist in efforts to assess it. More fundamentally, the study provides an opportunity to document how doctoral education has changed but how important it remains to our society and economy.

The next 2 years were spent discussing the value of the methodology study with potential funders and refining its aims through interactions with foundations, university administrators and faculty, and government agencies. A list of those consulted is provided in Appendix B. A teleconference about statistical issues was held in September 2000,4 and it concluded with a recommendation that the next assessment study include careful work on the analytic issues that had not been addressed in the 1995 Study. These issues included:

· Investigating ways of data presentation that would not overemphasize small differences in average ratings.
· Gaining better understanding of the correlates of reputation.
· Exploring the effect of providing additional information to raters.
· Increasing the amount of quantitative data included in the study so as to make it more useful to researchers.

3 These were: John D'Arms, president, American Council of Learned Societies; Stanley Ikenberry, president, American Council on Education; Craig Calhoun, president, Social Science Research Council; and William Wulf, vice-president, National Research Council. They were joined by: Jules LaPidus, president, Council of Graduate Schools; Nils Hasselmo, president, Association of American Universities; and Peter McGrath, president, National Association of State Universities and Land Grant Colleges.

4 Participants were: Jonathan Cole, Columbia University; Steven Fienberg, Carnegie-Mellon University; Jane Junn, Rutgers University; Donald Rubin, Harvard University; Robert Solow, Massachusetts Institute of Technology; Rachelle Brooks and John Vaughn, Association of American Universities; Harriet Zuckerman, Mellon Foundation; and NRC staff.

A useful study had been prepared for the 2000 teleconference by Jane Junn and Rachelle Brooks, who were assisting the Association of American Universities' (AAU) project on Assessing Quality of University Education and Research. The study analyzed a number of quantitative measures related to reputational measures. Junn and Brooks made recommendations for methodological explorations in the next NRC study with suggestions for secondary analysis of data from the 1995 Study, including the following:

· Faculty should be asked about a smaller number of programs (fewer than 50).
· Respondents should rate departments 1) in the area or subfield they consider to be their own specialization and then 2) separately for that department as a whole.
· The study should consider using an electronic method of administration rather than a paper-and-pencil survey.5

5 Op. cit., p. 5.

Another useful critique was provided in a position paper for the National Association of State Universities and Land Grant Colleges by Joan Lorden and Lawrence Martin6 that resulted from the summer 1999 meeting of the Council on Research Policy and Graduate Education. This paper recommended that:

· Rating be emphasized, not reputational ranking,
· Broad categories be used in ratings,
· Per capita measures of faculty productivity be given more prominence and that the number of measures be expanded,
· Educational effectiveness be measured directly by data on the placement of program graduates and a "graduate's own assessment of their educational experiences five years out."

6 Lorden and Martin (n.d.).

THE STUDY ITSELF

The Committee to Examine the Methodology for the Assessment of Research-Doctorate Programs of the NRC held its first meeting in April 2002. Chaired by Professor Jeremiah Ostriker, the Committee decided to conduct its work by forming four panels whose membership would consist of both committee members and nonmembers who could supplement the committee's expertise.7 The tasks of the panels were the following:

7 Committee and Panel membership is shown in Appendix A.

Panel on Taxonomy and Interdisciplinarity

This panel was given the task of examining the taxonomies that have been used in past studies, identifying fields that should be incorporated into the study, and determining ways to describe programs across the spectrum of academic institutions. It attempted to incorporate interdisciplinary programs and emerging fields into the study. Its specific tasks were to:

· Develop criteria to include/exclude fields.
· Determine ways to recognize subfields within major fields.
· Identify faculty associated with a program.
· Determine issues that are specific to broad fields: agricultural sciences; biological sciences; arts and humanities; social and behavioral sciences; physical sciences, mathematics, and engineering.
· Identify interdisciplinary fields.
· Identify emerging fields and determine how much information should be included.
· Decide on how fields with a small number of degrees and programs could be aggregated.

Panel on the Review of Quantitative Measures

The task of this panel was to identify measures of scholarly productivity, educational environment, and characteristics of students and faculty. In addition, it explored effective methods for data collection. The following issues were also addressed:

· Identification of scholarly productivity measures using publication and citation data, and the fields for which the measures are appropriate.
· Identification of measures that relate scholarly productivity to research funding data, and the investigation of sources for these data.
· Appropriate use of data on fellowships, awards, and honors.
· Appropriate measures of research infrastructure, such as space, library facilities, and computing facilities.
· Collection and uses of demographic data on faculty and students.
· Characteristics of the graduate educational environment, such as graduate student support, completion rates, time to degree, and attrition.
· Measures of scholarly productivity in the arts and humanities.
· Other quantitative measures and new data sources.

Panel on Student Processes and Outcomes

This panel investigated possible measures of student outcomes and the environment of graduate education. Questions addressed were:

· What quantitative data can be collected or are already available on student outcomes?
· What cohorts should be surveyed for information on student outcomes?
· What kinds of qualitative data can be collected from students currently in doctoral programs?
· Can currently used surveys on educational process and environment be adapted to this study?
· What privacy issues might affect data gathering? Could institutions legally provide information on recent graduates?
· How should a sample population for a survey be identified?
· What measures might be developed to characterize participation in postdoctoral research programs?

Panel on Reputational Measures and Data Presentation

This panel focused on:

· A critique of the method for measuring reputation used in the past study.
· An examination of alternative ways for measuring scholarly reputation.
· The type of preliminary data that should be collected from institutions and programs that would be the most helpful for linking with other data sources (e.g., citation data) in the compilation of the quantitative measures.
· The possible incorporation of industrial, governmental, and international respondents into a reputational assessment measure.

In the process of its investigation the panel was to address issues such as:

· The halo effect.
· The advantage of large programs and the more prominent use of per capita measures.
· The extent of rater knowledge about programs.
· Alternative ways to obtain reputational measures.
· Accounting for institutional mission.

All panels met twice. At their first meetings, they addressed their charge and developed tentative recommendations for consideration by the full committee. Following committee discussion, the recommendations were revised. The Panel on Quantitative Measures and the Panel on Student Processes and Outcomes developed questionnaires that were fielded in pilot trials. The Panel on Reputational Measures and Data Presentation developed new statistical techniques for presenting data and suggested conducting matrix sampling on reputational measures, in which different raters would receive different amounts of information about the programs they were rating. The Panel on Taxonomy developed a list of fields and subfields and reviewed input from scholarly societies and from those who responded to several versions of a draft taxonomy that were posted on the Web.

TABLE 2-1 Characteristics for Selected Universities

University                         Location           Founded  Graduate    Schools  Doctoral  Total Ph.D.s  S&E Ph.D.s  Graduate  Type of
                                                               enrollment           programs  (2000)        (2000)      faculty*  institution
Univ. of Southern California       Los Angeles, CA    1880     9,088       18       71        411           265         2,398     Private
Florida State Univ.                Tallahassee, FL    1851     6,383       17       72        261           112         1,015     Land Grant
Yale Univ.                         New Haven, CT      1701     n/a         10       73        325           216         3,125     Private (Ivy League)
Univ. of Maryland                  College Park, MD   1856     9,061       13       68        460           319         3,069     Land Grant
Michigan State Univ.               East Lansing, MI   1855     7,752       15       79        429           278         1,988     Land Grant
Univ. of Wisconsin-Milwaukee       Milwaukee, WI      1885     4,099       11       17        77            43          773       (local)
Rensselaer Polytechnic Institute   Troy, NY           1824     2,003       5        25        92            83          357       Small Private
Univ. of California-San Francisco  San Francisco, CA  1873     2,578       6        16        81            64          n/a       State

Graduate enrollment reference years, in the order listed: 1998-99, Fall 2001, Fall 2001, Fall 2001, 2000.
*Source: Peterson's Graduate & Professional Programs: An Overview, 1999, 33rd edition, Princeton, NJ.
NOTE: In the actual study, these data would be provided and verified by the institutions themselves.

Pilot Testing

Eight institutions volunteered to serve as pilot sites for experimental data collection. Since the purpose of the pilot trials was to test the feasibility of obtaining answers to draft questionnaires, the pilot sites were chosen to be as different as possible with respect to size, control, regional location, and whether they were specialized in particular areas of study (engineering in the case of RPI, biosciences in the case of UCSF). The sites and their major characteristics are shown in Table 2-1. Coordinators at the pilot sites then worked with their offices of institutional research and their department chairs to review the questionnaires and provide feedback to the NRC staff, who, in turn, revised the questionnaires. The pilot sites then administered them. Two of the pilot sites, Yale University and University of California-San Francisco, provided feedback on the questionnaires but did not participate in their actual administration.

Questionnaires for faculty and students were placed on the Web. Respondents were contacted by e-mail and provided individual passwords in order to access their questionnaires. Institutional and program questionnaires were also available on the Web. Answers to the questionnaires were immediately downloaded into a database. Although there were glitches in the process (e.g., we learned that whenever the e-mail subject line was blank, our messages were discarded as spam), generally speaking, it worked well. Web-administered questionnaires could work, but special follow-up attention is critical to ensure adequate response rates (over 70 percent).9

Data and observations from the pilot sites were shared with the committee and used to inform its recommendations, which are reported in the following four chapters. Relevant findings from the pilot trials are reported in the appropriate chapters.

9 In the proposed study, the names of non-respondents will be sent to the graduate dean, who will assist the NRC in encouraging responses. Time needs to be allowed for such efforts.


