Copyright © 2009. National Academy of Sciences. All rights reserved.
2
How the Study Was Conducted
LAYING THE GROUNDWORK
In many ways, the completion of the 1995 Study led immediately into the study of the methodology for the next one. In the period between October 1995, when the 1995 assessment was released, and 1999, when a planning meeting for the current study was held, Change magazine published an issue containing two articles on the NRC rankings, one by Webster and Skinner (1996) and another by Ehrenberg and Hurst (1996). In 1997, Hugh Graham and Nancy Diamond argued in their book, The Rise of American Research Universities, that standard methods of assessing institutional performance, including the NRC assessments, obscured the dynamics of institutional improvement because of the importance of size in determining reputation. In the June 1999 Chronicle of Higher Education, Graham and Diamond expanded this criticism to question the ability of raters to perform their task in a scholarly world that is increasingly specialized and often interdisciplinary.1 They recommended that in its next study the NRC list ratings of programs alphabetically and give key quantitative indicators prominence equal to that of the reputational indicators.
The taxonomy of the 1995 Study was also immediately controversial. The study itself mentioned the difficulty of defining fields for the biological sciences and the problems that some institutions had with the final taxonomy. The 1995 taxonomy left out research programs in schools of agriculture altogether, and its coverage of programs in the basic biomedical sciences housed in medical schools was spotty. A planning meeting to consider a separate study for the agricultural sciences was held in 1996, but when funding could not be found, it was decided to wait until the next large assessment to include these fields.
1Graham and Diamond (1999:B6).
Analytical studies were also conducted by a number of scholars to examine the relationship between quantitative measures and qualitative reputational measures.2 These studies found a strong statistical correlation between the reputational measures of the scholarly quality of faculty and many of the quantitative measures for all the selected programs.
The Planning Meeting for the next study was held in June 1999; its agenda and participants are shown in Appendix C. As part of the background for that meeting, all the institutions that participated in the 1995 Study were invited to comment on the NRC assessment and suggest ways to improve it. There was general agreement among meeting participants and institutional commentators that the next study needed a statement of purpose identifying both its intended users and its uses. Other suggested changes were to:
· Attack the question of identifying interdisciplinary and emerging fields, and revisit the taxonomy for the biological sciences,
· Make an effort to measure educational process and outcomes directly,
· Recognize that the mission of many programs went beyond training Ph.D.s to take up academic positions,
· Provide quantitative measures that recognize differences by field in measures of merit,
· Analyze how program size influences reputation,
· Emphasize a rating scheme rather than numerical rankings, and
· Validate the collected data.
2Two examples of these studies were Ehrenberg and Hurst (1998) and Junn and Brooks (2000).
In the summer following the Planning Meeting, the presidents of the Conference Board of Associated Research Councils and the presidents of three organizations representing graduate schools and research universities3 met and discussed whether another assessment of research-doctorate programs should be conducted. Objections to doing a study arose from the view that graduate education was a highly complex enterprise and that rankings could only oversimplify that complexity; however, there was general agreement that, if the study were conducted again, a careful examination of the methodology should be undertaken first.
The following statement of purpose for an assessment study was drafted:
The purpose of an assessment is to provide common data, collected under common definitions, which permit comparisons among doctoral programs. Such comparisons assist funders and university administrators in program evaluation and are useful to students in graduate program selection. They also provide evidence to external constituencies that graduate programs value excellence and assist in efforts to assess it. More fundamentally, the study provides an opportunity to document how doctoral education has changed but how important it remains to our society and economy.
The next 2 years were spent discussing the value of the methodology study with potential funders and refining its aims through interactions with foundations, university administrators and faculty, and government agencies. A list of those consulted is provided in Appendix B. A teleconference about statistical issues was held in September 2000,4 and it concluded with a recommendation that the next assessment study include careful work on the analytic issues that had not been addressed in the 1995 Study. These issues included:
· Investigating ways of presenting data that would not overemphasize small differences in average ratings.
· Gaining a better understanding of the correlates of reputation.
· Exploring the effect of providing additional information to raters.
· Increasing the amount of quantitative data included in the study so as to make it more useful to researchers.
3These were: John D'Arms, president, American Council of Learned Societies; Stanley Ikenberry, president, American Council on Education; Craig Calhoun, president, Social Science Research Council; and William Wulf, vice-president, National Research Council. They were joined by: Jules LaPidus, president, Council of Graduate Schools; Nils Hasselmo, president, Association of American Universities; and Peter McGrath, president, National Association of State Universities and Land Grant Colleges.
4Participants were: Jonathan Cole, Columbia University; Steven Fienberg, Carnegie-Mellon University; Jane Junn, Rutgers University; Donald Rubin, Harvard University; Robert Solow, Massachusetts Institute of Technology; Rachelle Brooks and John Vaughn, Association of American Universities; Harriet Zuckerman, Mellon Foundation; and NRC staff.
ASSESSING RESEARCH-DOCTORATE PROGRAMS
A useful study had been prepared for the 2000 teleconference by Jane Junn and Rachelle Brooks, who were assisting the Association of American Universities' (AAU) project on Assessing Quality of University Education and Research. The study analyzed a number of quantitative measures related to reputational measures. Junn and Brooks made recommendations for methodological explorations in the next NRC study, with suggestions for secondary analysis of data from the 1995 Study, including the following:
· Faculty should be asked about a smaller number of programs (fewer than 50).
· Respondents should rate departments 1) in the area or subfield they consider to be their own specialization and then 2) separately for that department as a whole.
· The study should consider using an electronic method of administration rather than a paper-and-pencil survey.5
Another useful critique was provided in a position paper for the National Association of State Universities and Land Grant Colleges by Joan Lorden and Lawrence Martin6 that resulted from the summer 1999 meeting of the Council on Research Policy and Graduate Education. This paper recommended that:
· Ratings be emphasized, not reputational rankings,
· Broad categories be used in ratings,
· Per capita measures of faculty productivity be given more prominence and the number of such measures be expanded, and
· Educational effectiveness be measured directly by data on the placement of program graduates and a "graduate's own assessment of their educational experiences five years out."
THE STUDY ITSELF
The Committee to Examine the Methodology for the Assessment of Research-Doctorate Programs of the NRC held its first meeting in April 2002. Chaired by Professor Jeremiah Ostriker, the Committee decided to conduct its work through four panels, each composed of committee members and outside experts who could supplement the committee's expertise.7 The panels and their tasks were the following:
5Op. cit., p. 5.
6Lorden and Martin (n.d.).
7Committee and panel membership is shown in Appendix A.
Panel on Taxonomy and Interdisciplinarity
This panel was given the task of examining the taxonomies that have been used in past studies, identifying fields that should be incorporated into the study, and determining ways to describe programs across the spectrum of academic institutions. It attempted to incorporate interdisciplinary programs and emerging fields into the study. Its specific tasks were to:
· Develop criteria to include/exclude fields.
· Determine ways to recognize subfields within major fields.
· Identify faculty associated with a program.
· Determine issues that are specific to broad fields: agricultural sciences; biological sciences; arts and humanities; social and behavioral sciences; physical sciences, mathematics, and engineering.
· Identify interdisciplinary fields.
· Identify emerging fields and determine how much information should be included.
· Decide how fields with a small number of degrees and programs could be aggregated.
Panel on the Review of Quantitative Measures
The task of this panel was to identify measures of scholarly productivity, educational environment, and characteristics of students and faculty. In addition, it explored effective methods for data collection. The following issues were also addressed:
· Identification of scholarly productivity measures using publication and citation data, and the fields for which the measures are appropriate.
· Identification of measures that relate scholarly productivity to research funding data, and the investigation of sources for these data.
· Appropriate use of data on fellowships, awards, and honors.
· Appropriate measures of research infrastructure, such as space, library facilities, and computing facilities.
· Collection and uses of demographic data on faculty and students.
· Characteristics of the graduate educational environment, such as graduate student support, completion rates, time to degree, and attrition.
· Measures of scholarly productivity in the arts and humanities.
· Other quantitative measures and new data sources.
Panel on Student Processes and Outcomes
This panel investigated possible measures of student outcomes and the environment of graduate education. Questions addressed were:
· What quantitative data can be collected or are already available on student outcomes?
· What cohorts should be surveyed for information on student outcomes?
· What kinds of qualitative data can be collected from students currently in doctoral programs?
· Can currently used surveys on educational process and environment be adapted to this study?
· What privacy issues might affect data gathering? Could institutions legally provide information on recent graduates?
· How should a sample population for a survey be identified?
· What measures might be developed to characterize participation in postdoctoral research programs?
Panel on Reputational Measures and Data Presentation
This panel focused on:
· A critique of the method for measuring reputation used in the past study.
· An examination of alternative ways for measuring scholarly reputation.
· The type of preliminary data that should be collected from institutions and programs that would be the most helpful for linking with other data sources (e.g., citation data) in the compilation of the quantitative measures.
· The possible incorporation of industrial, governmental, and international respondents into a reputational assessment measure.
In the process of its investigation, the panel was to address issues such as:
· The halo effect.
· The advantage of large programs and the more prominent use of per capita measures.
· The extent of rater knowledge about programs.
· Alternative ways to obtain reputational measures.
· Accounting for institutional mission.
All panels met twice. At their first meetings, they addressed their charge and developed tentative recommendations for consideration by the full committee. Following committee discussion, the recommendations were revised. The Panel on Quantitative Measures and the Panel on Student Processes and Outcomes developed questionnaires that were fielded in pilot trials. The Panel on Reputational Measures and Data Presentation developed new statistical techniques for presenting data and suggested conducting matrix sampling on reputational measures, in which different raters would receive different amounts of information about the programs they were rating. The Panel on Taxonomy developed a list of fields and subfields and reviewed input from scholarly societies and from those who responded to several versions of a draft taxonomy that were posted on the Web.
TABLE 2-1 Characteristics for Selected Universities

| | Univ. of Southern California | Florida State Univ. | Yale Univ. | Univ. of Maryland | Michigan State Univ. | Univ. of Wisconsin-Milwaukee | Rensselaer Polytechnic Institute | Univ. of California-San Francisco |
|---|---|---|---|---|---|---|---|---|
| Location | Los Angeles, CA | Tallahassee, FL | New Haven, CT | College Park, MD | East Lansing, MI | Milwaukee, WI | Troy, NY | San Francisco, CA |
| Year of foundation | 1880 | 1851 | 1701 | 1856 | 1855 | 1885 | 1824 | 1873 |
| Graduate enrollment | 9,088 | 6,383 | n/a | 9,061 | 7,752 | 4,099 | 2,003 | 2,578 |
| Number of schools | 18 | 17 | 10 | 13 | 15 | 11 | 5 | 6 |
| Doctoral degree programs | 71 | 72 | 73 | 68 | 79 | 17 | 25 | 16 |
| Total Ph.D.s (2000) | 411 | 261 | 325 | 460 | 429 | 77 | 92 | 81 |
| Total S&E Ph.D.s (2000) | 265 | 112 | 216 | 319 | 278 | 43 | 83 | 64 |
| Number of graduate faculty* | 2,398 | 1,015 | 3,125 | 3,069 | 1,988 | 773 | 357 | n/a |

Graduate enrollment reference years reported in the table: 1998-99; Fall 2001 (three institutions); 2000.
Types of institution reported in the table: Private; Land Grant; Private (Ivy League); Land Grant; Land Grant; Small Private; State (local).
*Source: Peterson's Graduate & Professional Programs: An Overview, 1999, 33rd edition, Princeton, NJ.
NOTE: In the actual study, these data would be provided and verified by the institutions themselves.
Pilot Testing
Eight institutions volunteered to serve as pilot sites for experimental data collection. Since the purpose of the pilot trials was to test the feasibility of obtaining answers to draft questionnaires, the pilot sites were chosen to be as different as possible with respect to size, control, regional location, and whether they were specialized in particular areas of study (engineering in the case of RPI, biosciences in the case of UCSF). The sites and their major characteristics are shown in Table 2-1.
Coordinators at the pilot sites then worked with their offices of institutional research and their department chairs to review the questionnaires and provide feedback to the NRC staff, who, in turn, revised the questionnaires. The pilot sites then administered them.8
8Two of the pilot sites, Yale University and University of California-San Francisco, provided feedback on the questionnaires but did not participate in their actual administration.
Questionnaires for faculty and students were placed on the Web. Respondents were contacted by e-mail and provided individual passwords to access their questionnaires. Institutional and program questionnaires were also available on the Web. Answers to the questionnaires were immediately downloaded into a database. Although there were glitches (e.g., we learned that whenever the e-mail subject line was blank, our messages were discarded as spam), the process generally worked well. Web-administered questionnaires can work, but special follow-up attention is critical to ensure adequate response rates (over 70 percent).9
Data and observations from the pilot sites were shared with the committee and used to inform its recommendations, which are reported in the following four chapters together with the relevant findings from the pilot trials.
9In the proposed study, the names of nonrespondents will be sent to the graduate dean, who will assist the NRC in encouraging responses. Time needs to be allowed for such efforts.