8 Looking Ahead

The charge to the committee for this study was the following:

    An assessment of the quality and characteristics of research-doctorate programs in the United States will be conducted. The study will consist of (1) the collection of quantitative data through questionnaires administered to institutions, programs, faculty, and students admitted to candidacy (in selected fields); (2) the collection of program data on publications, citations, and dissertation keywords; and (3) the design and construction of program ratings using the collected data, including quantitatively based estimates of program quality. These data will be released through a Web-based, periodically updatable database, accompanied by an analytic summary report. Following this release, further analyses will be conducted by the committee and other researchers and discussed at a workshop focusing on doctoral education in the United States. The methodology for the study will be a refinement of that described by the Committee to Examine the Methodology for the Assessment of Research-Doctorate Programs, which recommended that a new assessment be conducted.

This study has completed the tasks specified in the charge, but the tasks proved far more difficult and, as a result, took much more time than the committee initially anticipated. In this concluding chapter the committee looks at a few lessons learned from the conduct of the study and at areas it has not fully explored, and it encourages researchers to use the study data to go further.

LESSONS LEARNED

While conducting this study and creating an unparalleled database on doctoral programs in 2005-2006, the committee learned many lessons about the data-based approach to describing doctoral education in the United States. These lessons fall in the areas of taxonomy and multidisciplinarity, measurement, and the data-based construction of measures of perceived quality.
In addition, there are areas that would be of great interest, such as the dimensional measures and the relation between postdoctoral scholars and doctoral study, that the committee did not have the time to investigate and on which it recommends further work.

Taxonomy and Multidisciplinarity

Although most doctoral work is still organized in disciplines, scholarly work in doctoral programs increasingly crosses disciplinary boundaries in both content and methods. The committee tried to identify measures of multi- and interdisciplinarity, but it believes it did not address the issue in the depth it deserved, nor did it discover what relation, if any, holds between multidisciplinarity and the perceived quality of doctoral programs. It therefore recommends that greater attention be paid to the relationship between multidisciplinarity and program quality the next time this study is undertaken.

Measurement

The validation of program data was a time-consuming process. The committee hopes that, based on the collection of data for this study, programs will better understand the data requested and have an easier time providing them for a future study. In particular, there should be greater clarity about what is meant by the core faculty of a doctoral program and its associated faculty. This distinction was made to prevent overcounting the productivity of faculty who are involved with multiple programs. In any case, techniques to check data are now in place, and it is essential that they be further developed before the next survey is initiated and that instructions to the data providers be clear. Such steps could shorten the data validation process substantially.

Data-Based Construction of Measures of Perceived Quality

Ranking Programs

Initially the committee was deeply divided on whether an effort to rank programs should be undertaken at all. However, there was universal agreement within the committee that efforts relying entirely on reputation or on single measures of scholarly productivity could mislead potential applicants and others. The quality of reputational measures depends critically on who is asked and how knowledgeable they are about scholarship in a discipline. Thus the committee focused on doctoral program faculty, who are presumably engaged both in scholarship and in hiring decisions that involve judgments of the scholarly quality of programs other than their own. The committee surveyed these faculty members about the factors they thought were important, ideally, to the quality of a doctoral program. A sample of them was then asked to rate actual programs, as described in Chapter 4. This "anchoring" rating study was a compromise.
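The faculty survey about important factors yields, in effect, a set of stated importance weights. As a minimal sketch of how survey-based scores of this kind can be built, assuming hypothetical program characteristics and weights (the study's actual computation involved many more variables, field-specific weights, and resampling):

```python
# Illustrative sketch only: all characteristic names, values, and weights
# below are hypothetical, not data from the study.
import statistics

# Hypothetical per-program characteristics (e.g., publications per faculty
# member, citations per faculty member, share of students fully funded).
programs = {
    "Program A": {"pubs": 5.2, "cites": 40.0, "funded": 0.90},
    "Program B": {"pubs": 3.1, "cites": 22.0, "funded": 0.95},
    "Program C": {"pubs": 1.8, "cites": 10.0, "funded": 0.60},
}

# Hypothetical faculty-stated importance weights, normalized to sum to 1.
weights = {"pubs": 0.5, "cites": 0.3, "funded": 0.2}

def standardize(values):
    """Convert raw values to z-scores so characteristics are comparable."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

names = list(programs)
chars = list(weights)
z = {c: standardize([programs[n][c] for n in names]) for c in chars}

# Survey-style score: weighted sum of standardized characteristics.
scores = {
    name: sum(weights[c] * z[c][i] for c in chars)
    for i, name in enumerate(names)
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # programs ordered from highest to lowest score
```

The standardization step matters: without it, a characteristic measured on a large scale (citations) would dominate one measured on a small scale (funding share) regardless of its stated weight.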
The committee sampled programs to ensure that a broad range of programs was included in the rating sample, and raters were more informed about program characteristics than in the 1995 study. The committee did not compare rating results from the two studies because the methodological differences were too great, and it could not justify using the 1995 study as a benchmark.

The committee also wanted to convey the degree of uncertainty in rankings. Very early in the study process it agreed that presenting ranges of rankings would best convey the uncertainty inherent in any ranking study. It also felt that a technique combining the regression results with the survey results would give a more accurate estimate of program quality. The anchoring study, however, was based on relatively small samples of programs, and the committee found that the estimated ranges of rankings based on regression (R rankings) and on the general survey (S rankings) were not well correlated for some programs in some fields. This finding applied especially to fields with relatively few programs and to programs within a taxonomy that encompassed a diversity of scholarly practices. In any attempt to determine the values of faculty members as they relate to program ratings (R rankings), it is extremely important that the questions be tested for clarity and that the sample sizes be large enough to minimize statistical error. Thus, although the methodology for combining the coefficients, which lessens the weight on coefficients with a larger standard error, could lead to better estimates of program quality in most cases, the committee agreed it would be better to present the regression-based and general survey-based results separately, since additional information is conveyed by showing both. Further work that focuses on differences between the R and S rankings and on the circumstances under which coefficients could validly be combined would be helpful.

TWO AREAS FOR FURTHER STUDY

Dimensional Measures

As described in Chapters 3 and 4, the committee's reliance on faculty views of program quality and its determinants resulted in some variables that the committee strongly believed were important to doctoral program quality showing up with very low weights in the overall rankings. Perhaps scholarly activity is simply of paramount importance to most faculty members and thus to the quality of doctoral programs that produce future faculty members. Yet additional aspects of the doctoral experience and environment may prove important to students, many of whom will not take academic positions, and to the faculty who prepare them. The committee took this factor into account in its data-based ranking methodology by constructing dimensional measures that maintained the relative weighting of the included characteristics but included only the characteristics relevant to a particular dimension of doctoral education. A look at these measures reveals that many programs that rank high on the research dimension do not rank as well on the student support and outcomes dimension or on the diversity dimension. Such an outcome might be expected, because the committee was trying to capture separate aspects of doctoral education, but the committee saw no reason why a program could not rank highly on all three dimensional measures. In general, however, this proved not to be the case.
In the future, a larger student survey and an effort to incorporate student values could enhance the study findings.

The Connection Between Postdoctoral Study and Doctoral Education

The connection between postdoctoral study and doctoral education was not explored in any depth in this study, although the committee did collect data on the number of postdoctoral scholars associated with each program. Especially in the biosciences, postdoctoral appointments are part of a continuum of research training. Whether the characteristics of doctoral programs with many postdocs differ greatly from those with few should be studied. The difference may lie in the nature of the research being undertaken, or the nature of the doctoral education experience itself may differ depending on the number of postdocs associated with a program.

CONCLUSION

This study developed a methodology that relates data about doctoral programs both to reputational ratings of particular programs and to idealized preferences about program characteristics. For many fields it found that the two approaches resulted in different characteristics appearing as important determinants of rankings, depending on the measure: program size was very important in the R, or regression-based, approach, and various measures of research activity were very important in the S, or survey-based, approach. If there is an overall lesson to be learned, it is that people who use rankings should be cautious before relying on them. Producing rankings from quantitative measures of program characteristics turned out to be more complicated, and to carry greater uncertainty, than originally thought.

Any set of evaluations rests on the core values assigned to program characteristics. In many other efforts of this type, the investigators have not been explicit about the basis for the values adopted. Users of this and other studies need to understand what goes into them: assumptions, weights, surveys, and uncertainty. In this study, users who relied on ranges of rankings alone would find a few programs at the top and the bottom with narrow ranges of rankings, while most programs have wide ranges and fall somewhere in the middle. This finding struck the committee as corresponding well to the way the world really is. Users need to go beyond rankings and examine the characteristics that are important for their own purposes and concerns.
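The pattern described above, narrow ranges of rankings at the extremes and wide ranges in the middle, can be reproduced with a small simulation. The sketch below is illustrative only and is not the study's actual resampling procedure; all quality values and noise levels are hypothetical.

```python
# Illustrative simulation: programs whose underlying quality is far from
# the pack get stable ranks, while closely bunched mid-field programs
# trade places from draw to draw, producing wide ranges of rankings.
# All numbers are hypothetical.
import random

random.seed(0)

# Hypothetical "true" qualities: a clear leader (P1), a clear trailer
# (P5), and a tightly bunched middle (P2-P4).
true_quality = {"P1": 3.0, "P2": 1.05, "P3": 1.00, "P4": 0.95, "P5": -1.0}

def one_ranking(noise=0.3):
    """Rank all programs once from noisy observed ratings (1 = best)."""
    observed = {p: q + random.gauss(0, noise) for p, q in true_quality.items()}
    ordered = sorted(observed, key=observed.get, reverse=True)
    return {p: i + 1 for i, p in enumerate(ordered)}

# Repeat the ranking many times and record each program's range of ranks.
draws = [one_ranking() for _ in range(1000)]
ranges = {
    p: (min(d[p] for d in draws), max(d[p] for d in draws))
    for p in true_quality
}
for p, (lo, hi) in ranges.items():
    print(p, lo, hi)
```

Under these assumptions the leader and trailer keep essentially the same rank in every draw, while the three middle programs shuffle among ranks 2 through 4, which is the qualitative pattern the committee observed in its ranges of rankings.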