Analyzing Key Elements
On the surface, setting up an effective peer review system seems straightforward: contact the top experts in the field, have them review a set of proposals and provide their input, and fund research projects according to the advice provided. Experience has shown, however, that enacting this simple concept is complicated. How should expertise be defined, especially in the many areas of education research that are multidisciplinary? What kind of criteria should be used to judge the proposals, and how should they be quantified or summarized? How should the process be structured so it is seen as legitimate by a range of stakeholders? What is the best way to organize and support the group? What is the nature of the relationship between the peers and the agency staff and leadership, who typically make, and are ultimately accountable for, final funding decisions?
These are but a few of the multidimensional questions involved in designing, revamping, or evaluating peer review systems. This chapter provides an overview of some of the major components of peer review systems designed to assess education research proposals in federal agencies. We describe and analyze components of peer review processes with respect to how they promote particular objectives. We chose to consider aspects of peer review from this perspective not only because it serves as an effective organizational framework, but also because in our view research policy makers ought to approach their own systems in a similar manner. We conclude
with an examination of management issues that influence the extent to which such systems can produce desired results.
From our analysis, we draw six major conclusions:
Peer review serves a number of worthwhile purposes. For peer review systems for federally funded education research, two objectives are central to their design: the identification and support of high-quality research and the further development of a culture of rigorous inquiry in the field.
Federal agencies that fund education research use a range of models for peer review that serve different purposes and objectives.
Developing peer review systems involves balancing multiple, and sometimes conflicting, values and thus often requires making trade-offs.
Peer review in the federal government is a tool by which agency goals are accomplished and therefore can only be developed, evaluated, and understood as framed by these objectives.
Although peer review is not perfect, it is the best available mechanism for identifying and supporting high-quality research.
Peer review of education research proposals in federal agencies could be improved in a number of ways.
MULTIPLE PURPOSES AND VALUES
In Chapter 1, we described the nature of peer review in the federal government as serving both scientific and political ends.
In their paper (see http://www7.nationalacademies.org/core/HacketChubin_peer_review_paper.pdf) and presentation to the committee, Hackett and Chubin elaborate the many functions that peer review is called on to serve. At the most basic level, peer review is a mechanism for evaluating the merits of proposals for research funding, thereby influencing the distribution of federal research funds. But it also serves several additional and related functions.
For example, a major reason scientists participate in peer review—a time-consuming task in addition to existing professional obligations—is to have an impact on the field beyond their own investigations. Thus, peer review shapes the accumulation of knowledge over time by recommending a subset of proposed research for implementation. This idea was prominent
in workshop discussions. Both Hilda Borko, education professor at the University of Colorado and president of the American Educational Research Association, and Penelope Peterson, dean of the School of Education and Social Policy and Eleanor R. Baldwin Professor of Education at Northwestern University, speaking on behalf of a group of education school deans,1 characterized peer review as a force that “shape[s] and envisions” the future of a field. Edward Hackett, sociology professor at Arizona State University, highlighted the “communication function” of peer review and its role in “prepar[ing] the ground for the acceptance of new ideas.” Finbarr Sloane, of the Education and Human Resources (EHR) Directorate of the National Science Foundation (NSF), echoed these ideas, stating that “there is a huge return on investment for serving on a panel…. [Reviewers] get a sense … for what national questions other people are posing, and responses to those questions.” And Edward Redish—a physicist and physics education researcher at the University of Maryland—also pointed to the benefits for researchers who serve on peer review panels, citing the value he has experienced in “see[ing] what people were thinking about in the field.”
Delivering feedback to proposers can also signal the field’s (often implicit) standards of quality, reinforcing them in a formal context. Redish made this point about the purpose of peer review most directly, arguing that “peer review is not just about finding scientific merit in particular areas. It is about defining it and creating it.” This purpose is particularly salient in education, since current standards of evidence often vary by discipline and subfield. Redish’s point also underscores the fact that judging the scientific merit of a proposal for research is different from judging the merits of a research product. Research is by its nature an exercise in being alert to, and systematically dealing with, unexpected issues and questions that arise in the course of an investigation. Therefore, the nature and level of specificity of quality criteria are different when considering a description of how an investigator plans to approach the work than when considering the product of a completed investigation.
Peer review can also be used as a tool for building interdisciplinary
trust among groups of investigators from different research traditions—again, an important endeavor in an area like education, in which multiple fields and disciplines focus on various aspects of teaching, learning, and schooling. Kenneth Dodge, director of the Center for Child and Family Policy at Duke University, described how engaging in peer review helps draw disparate fields together to better reflect and understand the complexities of educational phenomena.
Another function of peer review is its role as a buffer, creating a privileged space for researchers to make judgments largely apart from political considerations (Hackett and Chubin, 2003). While political considerations drive funding levels and can impact statements of priority areas (National Research Council, 2002), peer review is used to remove decisions about the funding of individual projects from the influence of special interests or other political groups and agendas. Thus, the peer review process offers a space for researchers to apply scientific principles, debate and identify promising lines of inquiry, and offer crucial advice to decision makers that draws on their expertise to advance research-based knowledge.
Workshop discussions also highlighted the role of peer review as a tool for professional development—for proposers, reviewers, and agency staff—to promote a professional culture of inquiry and rigor among researchers. This culture includes an ethos steeped in self-reflection and integrity, as well as a commitment to working toward shared standards of practice (Shulman, 1999; National Research Council, 2002; Feuer, Towne, and Shavelson, 2002). Many workshop participants pointed to the broad “educative” function of peer review to mentor an incoming generation of scholars, to train investigators to review the scholarly quality of proposals, to produce higher quality proposals in the future, and to strengthen connections throughout the field of education research.
Although rarely made explicit, peer review is often expected to meet these and many other purposes equally well. It is therefore not surprising that the process can come under fire for not serving any one of them fully. Designing peer review systems, improving existing ones, and assessing their effectiveness require cognizance of these expectations and the selection of process options accordingly.
In addition to serving multiple purposes, peer review systems are also designed to serve a set of values, like those of the agency and the fields it supports. These values are sometimes in tension, and they always require a careful balancing act in choosing a course of action. For example, peer review is expected to uphold the value of effectiveness—“to recommend
projects that would benefit the field and confer some greater social benefit, to offer advice to proposers, to circulate ideas within a community, and more. Peer review is also asked to be efficient, to do all of this at very low cost, with cost measured in terms of dollars spent on reviews (infrastructure, travel, reviewer compensation) and in hours expended by proposal writers and reviewers” (Hackett and Chubin, 2003, p. 15).
Another example of these value tensions is the trade-off between risk and tradition. Hackett and Chubin (2003) argue that this tension in peer review is a reflection of the tension in scientific communities more generally: research is expected to chart new progress, but to do so systematically and within the broad parameters set by existing knowledge and standards of rigor. During her presentation, Peterson argued that peer review systems ought to “create opportunities for risk-taking and innovative education research.”
The tensions between efficiency and effectiveness and between risk and tradition are just two examples of the many kinds of values to be balanced—explicitly or implicitly—by peer review systems (see Hackett and Chubin, 2003, for a more complete treatment). The multiple purposes and competing values inherent to peer review, coupled with the complex nature of education and education research, are reflected in a high degree of variability in peer review systems among the many agencies that fund education research. Culture, tradition, and the mission of the agency also exert a powerful influence over the nature of peer review practices. Indeed, it is clear that no single model could suit all purposes, situations, and fields equally well.
Whether a particular practice will work well depends in large part on the specifics of the situation and the purposes the system is intended to serve. To guide our analysis of peer review practices, we first articulate two broad purposes best served by peer review systems in federal agencies that support education research.
KEY OBJECTIVES OF PEER REVIEW FOR EDUCATION RESEARCH
Taking our cue from this discussion of multiple purposes, we conclude that two broad objectives ought to guide the design of peer review systems in federal agencies: the identification and support of high-quality education research and the professional development of the field.
The first objective of using peer review as a process to achieve quality
research has been front and center in federal agencies that have funded education research for some time (although it is a matter of debate how well various agencies have done so in the past). We strongly endorse explicit attention to education research quality as well as redoubled efforts to strengthen peer review systems for this purpose. Rigorous studies of educational phenomena can provide important insights into policy and practice (and have—see National Research Council, 2002, for examples). But poor research is in many ways worse than no research at all, because it is wasteful and promotes flawed models for effective knowledge generation. Quality is of the essence, and having leaders in the field carefully scrutinizing and screening proposed work is one essential way to promote it.
Although what is meant by quality with respect to education research is a matter of some debate in the field, attending to the rigor and relevance of education research is essential to its health. Peer review systems in federal agencies offer a natural place to engage the field in the contested but crucial task of developing and applying high standards for evaluating the merits of proposed research. Strict rules are not advisable given the interdisciplinary nature of education and the prospective nature of research proposals. However, broad standards, consistently applied in peer review settings, are needed to ensure quality.
Moreover, the current enthusiasm for, and debates surrounding, calls for “scientifically based research” in education and references to the use of peer review provide opportunities for a stronger and more consistent focus on peer review as the means to promote research quality. By defining and upholding high standards of quality in the peer review process, researchers can exert a powerful influence on questions of what counts as high-quality research in particular contexts—providing input directly from the scholarly communities with respect to the implementation of policies stemming from the now numerous definitions of quality research that appear in federal education law (e.g., the No Child Left Behind Act of 2001, the Education Sciences Reform Act of 2002, and bills pending to reauthorize the Individuals with Disabilities Education Act of 1997 and parts of the Higher Education Act of 1965, P.L. 89-329). The insulation of peer review from the political process is important for facilitating this goal.
In our view, the second objective that should guide peer review in federal agencies that support education research is to contribute to the further development of a culture of inquiry in the field. Peer review has not historically been designed to promote such professional development in the federal agencies that support education research. We think it deserves
far more attention. As the authors of Scientific Research in Education (National Research Council, 2002) argue, it is a professional responsibility of education researchers to participate in peer review in federal agencies, and the field ought to harness this system to promote the development of the profession.
Federal education research policy makers also bear major responsibility for organizing peer review in ways that foster growth among education researchers. If deliberately developed with this objective in mind, peer review systems can promote such growth among the many players in the education research field. In the context of peer review, these players can usefully be categorized as applicants (people who are seeking agency funds to initiate new work), reviewers (people who review the merits of the proposals for new work), and staff (people who work in the research agencies).
All three of these categories of people are members of the research community, operating in the broader public domain. In the ideal, peer review systems foster enriching interactions, and each group serves both a teaching and learning function to their own benefit and that of others. Chubin and Hackett (1990) argue that this dynamic can improve understanding among all members of the community, enhancing the capacity of the field as a whole.
For example, an applicant can communicate to reviewers cutting-edge ideas in an area of study, stimulating thinking among a broader set of researchers on potential new directions for a field or subfield. In much the same way, the feedback that reviewers provide to applicants often signals areas of contention about new ideas or techniques, preparing the ground for broader scrutiny and consideration of where and how to push the knowledge base and its application. Agency staff teach and learn as well: they familiarize reviewers with relevant agency priorities, goals, review criteria, process specifics, and the particular objectives held in a research competition for advancing the field. In managing and participating in the review process, the staff often gain a significant breadth of understanding and knowledge in a field by reading proposals and listening to reviewers’ dialogue about the status of the field and the quality of the batch of proposals under review across and within panels. In some cases, agency staff are themselves accomplished researchers who are serving in temporary posts in research agencies. Overall, knowledgeable staff sharpen internal thinking about how to shape and run future competitions.
Having described and justified our choice for the two objectives we
hold as most salient for shaping peer review of education research proposals, we now analyze several design features of peer review systems described at the workshop with respect to how likely they are to promote these objectives. Other purposes, including those mentioned in this report, may be relevant to promote in particular contexts and at particular points in the evolution of a line of inquiry in education research. Our intent in setting forth these two objectives is to identify explicitly the purposes we see as most relevant for organizing peer review systems in federal agencies, as well as to provide a structure for analyzing various aspects of peer review systems. Since some peer review practices serve more than one purpose, there is some overlap in the discussion of peer review practices and considerations between the two main sections that follow. In some of these cases, we highlight the tensions that arise and the trade-offs that are often required when peer review attempts to serve multiple purposes.
IDENTIFYING AND SUPPORTING HIGH-QUALITY RESEARCH
The formal review of education research proposals by professional peers must be designed to identify and support high-quality research. There are many decisions and practices that undergird this critical function, most of which can be categorized into two areas: the people in the process—Who counts as a peer?—and the criteria by which quality is judged—How is research quality defined? Within each, we take up a set of peer review practices described at the workshop that relates to them most directly.
Who Reviews: Identifying Peers
Deciding who counts as a peer is the very crux of the matter: the peer review process, no matter how well designed, is only as good as the people involved. Judging the competence of peers in any research field is a complex task requiring assessment on a number of levels. In education research, it is particularly difficult because the field is so diverse (e.g., with respect to disciplinary training and background, epistemological orientation) and diffuse (e.g., housed in various university departments and research institutions, working on a wide range of education problems and issues). The workshop discussions brought out several related issues and illustrated the difficulties in, and disagreements associated with, assembling the right people for the job.
What are the required skills, experiences, and knowledge for peer reviewers to perform their duties? Workshop participants answered this question in a number of ways. In their presentation of the main findings from an evaluation of the peer review system at the former Office of Educational Research and Improvement (OERI) during the mid-1990s, Diane August, senior research scientist, Center for Applied Linguistics, and Penelope Peterson reported on an analysis of the fit between the expertise of reviewers and the competitions they reviewed for. Using the standards for peer reviewers that were in place at the time, they focused on the extent to which each reviewer had content, theory, and methodological expertise. They found a number of disconnects, including a relatively low level of fit on the methodological aspects of the research proposals under review (August and Muraskin, 1998).
Expertise is required in three main areas to identify high-quality education research in the review process: the content areas of the proposed work, the methods and analytic techniques proposed to address the research questions, and the practice and policy contexts in which the work is situated.
At one level, it is self-evident that reviewers need to know something about content to review education research proposals. But “education” is a term covering a vast territory of potential areas of study. Some competitions for research dollars are cast quite broadly (e.g., early childhood development), while others carve out a well-defined subtopic (e.g., effectiveness of pre-K curriculum on school readiness). Content expertise, then, is defined by the research priorities in the competition itself. Even in relatively circumscribed competitions, a wide range of content knowledge is typically required to adequately judge the merits of a set of proposals. Furthermore, knowledge of content as it applies to the teaching and learning of that content is important. Referencing Shulman (1986), Borko made this point at the workshop, asserting that in order “to review proposals about mathematics teaching and learning, [reviewers] really do need to know about mathematics, and … teaching and learning. Pedagogical content knowledge is kind of the nexus of those aspects of knowledge.”
Another dimension of expertise necessary for peer review of education research proposals is knowledge of relevant methodological and analytic techniques. As in any profession, familiarity and facility with the tools of the trade are an essential part of the job. Reviewers must possess a solid grounding in the methodological approaches best suited for studying the particular problems or topics reflected in the competition. Competent peer review of the quality of research must be conducted by groups of researchers who are together familiar with both general standards (like those outlined in Scientific Research in Education, National Research Council, 2002) and specific standards (relative to particular subfields) and who practice these standards in their own research studies (National Research Council, 1992; Chubin and Hackett, 1990; Cole, 1979).
Finally, reviewers must be grounded in the overarching practice and policy contexts associated with the area under consideration. This foundation is necessary to place the potential contribution of new work in the context of current issues and problems facing education policy makers and practitioners, as well as to consider the kinds of expertise that might be required to carry out the work effectively.
Do all reviewers need to have each kind of expertise to participate effectively? Most workshop participants agreed not only that it was nearly impossible to find people with such breadth and depth of experience and expertise, but also that it was not necessary. Rather, we agree with most participants that it is the combined expertise of the group that matters. That is, constructing panels with appropriate expertise requires ensuring that the group as a whole reflects appropriate coverage. Hackett made this point most directly, arguing that it is the “distributed” expertise on a peer review panel that is relevant.
Beyond these three broad areas of competence that we view as essential for peer review panels, additional kinds of expertise relevant to the process surfaced in workshop discussions. For example, Robert Sternberg, director of the Yale Center for the Psychology of Abilities, Competencies and Expertise and the president of the American Psychological Association, suggested that creativity is an undervalued yet critical talent for assessing research quality.2
Teresa Levitin, director, Office of Extramural Affairs, speaking from her experience running panels at the National Institute on Drug Abuse at the National Institutes of Health (NIH), referred to a number of personal qualities that make for effective reviewers. Such people listen respectfully and are intellectually open to genres of research outside their realm of expertise. They neither dominate nor acquiesce during face-to-face deliberations about proposals under review. Although we deem these traits secondary to the three dimensions of expertise we describe here, they are some of the intangibles that influence the success of the peer review process in a very real way and therefore must be considered in vetting reviewer candidates.
Conflicts of Interest and Bias
For peer review to be an effective tool for identifying and supporting high-quality research, it must be credible. Essential to the integrity and legitimacy of the process is ensuring that reviewers do not have a vested interest in the outcomes of the competition that could introduce criteria other than quality into the process. Thus, it is essential to vet potential reviewers for whether they would have a conflict of interest that would prevent them from fairly judging a proposal or set of proposals. At one level, it is the responsibility of agency staff to probe these potential problems. But it is also a critical part of an ethical code of conduct among investigators to be forthcoming about their relationships to the proposed work. As Levitin put it: “the integrity of the system really depends on the integrity of the individual reviewers.”
Conflicts of interest may arise in situations in which there is a possibility, or a perceived possibility, that a reviewer, or his or her associates, might gain from a decision about funding. Agencies deal with these issues in different ways. Steve Breckler, of the Social, Behavioral and Economic Sciences Directorate at the NSF, referenced a “complex array of conflict of interest rules” that applies to peer review of research proposals submitted to the NSF. Brent Stanfield, deputy director, NIH’s Center for Scientific Review, mentioned that applicants for funding from the NIH are encouraged to identify “competitors” who they feel would be too influenced by the outcome of the review to serve as fair reviewers, and that panelists with potential conflicts of interest on a particular proposal would recuse themselves from the discussion of its merits. Louis Danielson, director of the Research to Practice Division, Office of Special Education Programs (OSEP), described the interpretation of these and related rules by the U.S. Department of Education that precludes the participation of reviewers with particular affiliations.
A related but distinct idea that shapes the vetting of panelists is bias. Biases are preferences that may influence the degree to which proposals are judged fairly. Everyone has preferences, and researchers are no exception:
their own work and participation in a field frames the way they view the world. The danger comes when these preferences preclude a careful and open-minded reading of approaches that diverge from a reviewer’s personal viewpoint.
As important as it may seem to identify and eliminate conflicts of interest and biases in the peer review process, the goal of enhancing the likelihood that the system identifies and supports high-quality research renders the absolute pursuit of these aims both unattainable and inadvisable. In deciding whom to include on panels, agencies will find that many top-flight investigators predictably have potential conflicts or biases: they are likely to be very familiar with each other, and they may have collaborated on projects, critiqued each other’s work, coauthored papers, or mentored or taught an applicant. At a minimum, they are likely to have already formed views on each other’s work. These biases reflect the preferences that investigators have for certain theoretical and methodological practices and their ideas of what the cutting edge in a field is or should be, and they therefore affect the ways in which proposals are viewed from the outset.
The existence of these relationships and viewpoints raises questions about whether reviewers can judge the merits of a proposal impartially, questions that must be addressed in vetting investigators for participation on panels. However, if a decision rule regarding conflicts of interest is applied too stringently, the pool of competent reviewers will dwindle significantly. Making conflicts of interest public is essential, but eliminating them altogether is not feasible. And while conflicts of interest should be minimized, agency personnel often need the flexibility to exercise judgment in balancing the imperative of involving top experts in the process against the need to guard against reviews based on considerations outside the merits of the proposals themselves.
With respect to bias, however, the issue for assembling panels is to achieve a balance of perspectives and biases. The goal is not to minimize biases—as they are inherent in every reviewer—but rather to ensure that no single paradigm or perspective dominates the review panel. As we argue in the section that follows, engaging a range of perspectives sharpens thinking about, and opens avenues for considering, quality in the research that is funded. And as we discuss in the section on quality, so long as reviewers can agree on basic standards of quality, these divergent preferences can be accommodated in the peer review process and indeed can strengthen its outcomes. Without this common framework, however, there is no basis for negotiating differences in productive ways.
Two broad types of diversity are relevant to assembling high-quality panels and to promoting education research quality through peer review: diversity of disciplinary and methodological perspectives and diversity of groups traditionally underrepresented in education research. Actively pursuing diversity along both of these dimensions in an agency’s peer review system can serve a number of important functions, including lending the process legitimacy, enhancing and extending learning opportunities in peer review deliberations, and promoting the identification and support of high-quality research. We take up the first two of these functions in later sections of the report, focusing on the discussion of quality in this section.
Engaging peers with a range of scholarly perspectives is important for assessing quality in any field, including education research. Redish, drawing on his experience in physics research as well as in physics education research, cautioned that peer review systems can have a narrowing effect on a field too quickly. He argued that peer review systems ought to reflect an ethos of scientific “pluralism,” especially in a field like education research that is multidisciplinary and still emerging as an area of scientific inquiry.
Assembling diverse panels with respect to groups traditionally underrepresented in education research—like racial and ethnic minorities—is also an important consideration that surfaced a number of times in workshop discussions and is especially relevant to education research, as it often grapples explicitly with issues involving diversity. One important aspect of research quality across many of the agencies discussed at the workshop is the relevance or significance to educational problems of the proposed work. Assembling panels with a range of personal backgrounds and experiences can foster an environment in which questions are provoked and issues raised that otherwise might not have surfaced, and help ground the review in the cultural and social contexts in which the work is proposed to be conducted and expected to have an impact.
Vinetta Jones, dean of the Howard University School of Education, made this point directly in posing questions to Grover (Russ) Whitehurst about the diversity of peer review panels at the Institute of Education Sciences (IES). She argued that pursuing excellence in, and specifically ensuring the relevance of, education research projects and programs requires an inclusive approach to the composition of panel membership with respect to racial and ethnic diversity, gender, and other background characteristics.
Whitehurst responded by relaying his personal experience reviewing the publication records of potential peer reviewers, noting that their racial and ethnic backgrounds were rarely evident. He agreed that it is essential to ensure that deep knowledge of the populations and contexts in which education research would be conducted is represented in peer review deliberations, but said that he seeks to ensure that peer reviewers have this knowledge by reviewing the focus of their previous publications.
These differing viewpoints and strategies underscore the complexities associated with the relationship between quality and group membership on peer review panels. While expertise and the personal background characteristics and experiences of panelists are different constructs, they are often related, at least at this point in history. For this reason, in the long run, we think it is likely that socially and culturally diverse peer review panels will result in a more expansive set of perspectives on the assessment of relevance and significance, thereby improving the overall quality of the research over time. Since quality in the peer review of education research proposals includes both technical and relevance criteria, ensuring a diverse set of panelists who collectively bring the expertise and experience necessary to judge both well should always be the goal.
Practitioners and Community Members as Peers
Should practitioners—for example, state school officers, superintendents, principals, teachers, curriculum developers—be peer reviewers? This is a hotly contested question in many domains of research, one that also pertains to the diversity of perspectives on peer review panels. Various countries and institutions have approached this question in different ways. For example, the Dutch Technology Foundation includes “lay citizens” in its reviews (Hackett and Chubin, 2003). Other institutions have devised innovative ways to involve community members in their work outside the peer review process itself. For example, Harold Varmus, former NIH director, tried to bridge gaps between researchers and community members by setting up a Director’s Council of Public Representatives. The council brings together representatives from various groups with an interest in medical research, such as patients and their families, health care professionals, and patient advocacy groups, to advise and make recommendations to NIH on issues and concerns that are important to the broad development of NIH programmatic and research priorities. If one aspect of the quality of education research—as we have argued—is its connection and relevance to policy
and practice, then it would stand to reason that those closest to the practice of education ought to bring their expertise directly to the task.
Following this logic, the former OERI and the EHR Directorate at NSF have historically tapped the expertise of practitioners and other stakeholders (e.g., parents) by including them as peers alongside researchers in reviewing the merits of education research proposals. Many workshop participants, however, questioned the implementation of this strategy, and the experience of several committee members led them to raise concerns during the event as well. For example, in their evaluation of peer review panels in the former OERI described at the workshop, August and Muraskin (1998) found that while most of the reviewers in their sample had conducted research in education, a sizable minority had not. In his remarks, Dodge warned that asking individuals without research expertise to evaluate scientific quality “discredits the process.” And Hackett, while arguing that peer review in education is a natural place to help bridge policy and practice, acknowledged that participation on review panels by practitioners without research expertise could undermine attempts to develop a strong sense of professional culture in the field. It may also serve to introduce political criteria into the review of merit if, for example, advocates participate on panels.
As Hackett suggested, however, there are also benefits associated with engaging the viewpoints of practitioners and stakeholders in peer review panels. Practitioners and stakeholders are typically well qualified to discuss the relevance of a particular proposal and its potential contributions to practice. They may also have comments about the application of an intervention proposed for evaluation. Although they are less likely to have expertise on specific technical aspects of the proposal, such as the design, statistics, and sampling plan, they may provide insights about relevant feasibility concerns.
Moreover, in one of the few studies of the impact of research consumers and advocates on peer review panels, Andejeski et al. (2002) reported that both researchers and consumers found it highly valuable to include consumers (in this case, survivors of breast cancer) on peer review panels for the Department of Defense research program on breast cancer. However, in contrast to the way in which practitioners and stakeholders have often been incorporated into education research panels, the ratio of scientists to lay reviewers was high (averaging about 7:1), consumers were trained on the criteria and the process, and they were assigned specifically to review the applications for the importance and applicability of the research and
issues related to human volunteers, such as the burden on the participant. The consumers made their comments after the scientists’ review.
Overall, workshop participants and the committee agreed that the participation of practitioners in the education research and review process was critical; whether and how agencies involved practitioners in peer review panels to accomplish that goal varied considerably across and within agencies. For example, NIH has a two-tiered model. First, study sections (most often convened by the Center for Scientific Review), consisting of scientific expert reviewers, judge the scientific merit of proposals. The result of the review is a score and a written summary of the evaluation. Second, institute-specific advisory councils, composed of both scientists and other stakeholders, consider the relevance of the proposals, and in view of both the scientific merit and the potential impact, make recommendations about which proposals should receive funding.
Still another way to systematically engage practitioners in reviewing research is through an approach used by OSEP, which assembles peer review panels of stakeholders to retrospectively assess the value of the agency’s portfolio of research in addressing practical ends. This structure, when coupled with peer review by researchers, captures the expertise of both but does not involve practitioners in judging the merits of research proposals directly.
Finally, several agencies include practitioners on priority-setting oversight boards. While separate from the peer review process itself, the identification of areas ripe for research shapes the content of the research competitions and the proposals received in response, indirectly but significantly influencing the policy and practical grounding of the research. For example, the former National Educational Research Policy and Priorities Board and the new National Board for Education Sciences are both modeled on this idea.
How Quality Is Judged
Evaluation criteria—how potential research quality is operationalized for the purpose of peer review—focus the review on specific dimensions of quality. The criteria used to judge research proposals vary across agencies and sometimes across competitions within agencies. All include some assessment of technical quality or scientific excellence (“intellectual merit,” “quality of design,” “approach”) and typically its relevance (“significance,” “broader impacts”). Agencies commonly weight and quantify these criteria
to ensure that no proposal would get a high total rating if it scored low on either. As Breckler put it at the workshop, technical merit is necessary but not sufficient; similarly, relevance is necessary but not sufficient. Some agencies also consider the quality of the personnel and management plan (e.g., for larger projects like research centers that include multiple investigators and institutions). Other systems include an overall judgment of quality as well. For example, Danielson, in describing the peer review process at OSEP, said that reviewers score proposals on a 100-point scale, but they are also asked to give an additional recommendation of “approved, disapproved, or conditionally approved.” Similarly, the NIH study section assessments include “approval” or “disapproval” as well as overall judgments of quality (e.g., “outstanding” or “excellent”).
In most agencies, peer reviewers are asked to assess each proposal against these criteria, to assign corresponding scores as appropriate, and to provide written comments to support their scores and describe strengths and weaknesses in each proposal. Peers discuss their views and scores as a group, and reviewers are given the opportunity to change their scores based on the discussion. Once final scores are assigned, staff average the scores, create a slate of proposals ranked from highest average score to lowest, and forward the slate to the head of the agency for final sign-off and funding decisions.
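The mechanics just described can be sketched in a short Python example. The two criteria, their weights, the gating rule that prevents a high total when either criterion scores low, and the ratings themselves are all hypothetical illustrations; they do not reproduce any agency’s actual formula.

```python
# Hypothetical sketch of panel scoring: reviewers rate each proposal on two
# criteria (1 = poor ... 5 = excellent), criterion scores are combined so that
# a low score on either caps the total, per-reviewer totals are averaged, and
# proposals are ranked into a slate. Illustrative only.
from statistics import mean


def combined_score(technical, relevance):
    """Weighted sum, gated so a low score on either criterion caps the total."""
    weighted = 0.6 * technical + 0.4 * relevance
    return min(weighted, 2 * min(technical, relevance))


def rank_slate(reviews):
    """reviews: {proposal: [(technical, relevance), ...]} -> ranked slate."""
    averages = {
        proposal: mean(combined_score(t, r) for t, r in ratings)
        for proposal, ratings in reviews.items()
    }
    # Slate ordered from highest average score to lowest
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


# Invented ratings from a three-person panel
reviews = {
    "A": [(5, 4), (4, 5), (5, 5)],  # strong on both criteria
    "B": [(5, 1), (5, 2), (4, 1)],  # technically strong, low relevance
    "C": [(3, 3), (4, 3), (3, 4)],  # solid on both
}
slate = rank_slate(reviews)
# Proposal B ranks last despite its technical strength, because the gating
# rule keeps a low relevance score from being washed out by the weighted sum.
```

The gating rule is one simple way to encode the “necessary but not sufficient” logic Breckler described; agencies could equally use multiplicative combinations or explicit score floors.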
Ensuring quality along the dimensions used by an agency suggests the need to create measures that are both reliable and valid. Reliability in this context refers to the extent to which a research proposal would receive the same ratings, funding outcome, and feedback across multiple independent review panels. Ensuring high reliability is important because it helps to quell fears that the ratings are an anomaly or just a function of the particular group assessing them. Even if ratings are perfectly reliable, however, they may not reflect the intended evaluation criteria—that is, they may not be valid. Reliability does not ensure validity, but without reliability, results and feedback will be inconsistent and almost surely not valid.
At the workshop, Domenic Cicchetti, statistician and methodologist at Yale University and author of seminal publications on the topic of reliability in peer review, provided an overview of his work on reliability in the evaluation of both journal submissions and grant proposals, based on an annotated presentation he prepared for the workshop (Cicchetti, 2003). Analyzing agreement statistics across individual judges involved in peer review of manuscripts submitted to journals for publication, he concluded that reliability was generally low.
How to think about and promote reliability in any form of peer review is a topic of considerable controversy and commentary (see, e.g., extensive commentary on Cicchetti’s foundational work in this area in an issue of Behavioral and Brain Sciences, 1991). In our view, examining agreement among individual judges is reasonably appropriate for assessing reliability in journal submissions, because reviews are typically conducted independently by mail and then simply averaged. For our purposes in considering research proposal reviews, however, the review process more typically involves group discussion among panelists with different types of expertise and is designed to promote consensus. Since the process necessarily involves interaction and argument, the individual ratings among panelists are not independent. In fact, a panel with diverse content and methodological expertise will be likely to produce a more complete review even though initial ratings by individual panel members may vary widely (that is, be inconsistent with one another). To the extent that the consensus-building processes are effective, analyses of initial independent ratings may underestimate the reliability of group results—that is, they may be poor indicators of the reliability of the group consensus on quality as reflected by group expertise. However, the reliability of panels as a whole, while a more useful construct, is difficult to measure because agencies overseeing reviews of research proposals never have the luxury of convening multiple panels to review the same proposals and then comparing the results across the independent panels.
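The notion of agreement among independent judges can be made concrete with one very simple statistic: the Pearson correlation between two reviewers’ ratings of the same proposals. (Cicchetti’s analyses rely on more refined measures such as the intraclass correlation; this is only a sketch of the underlying idea, and the ratings below are invented.)

```python
# Minimal illustration of inter-rater agreement: Pearson correlation between
# two reviewers' independent ratings of the same proposals. Invented data.
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Two hypothetical reviewers independently rate six proposals (1-5 scale)
reviewer_1 = [4, 2, 5, 3, 1, 4]
reviewer_2 = [3, 4, 4, 2, 3, 2]
r = pearson(reviewer_1, reviewer_2)  # near zero: essentially no agreement
```

A statistic like this captures agreement between independent raters; as the paragraph above notes, it is a poor proxy for the reliability of a deliberative panel, whose members revise their ratings through discussion and are therefore not independent.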
Validity, as applied to the results of peer review, refers to the extent to which inferences made from the resulting ratings and specific feedback are warranted given the information provided in proposals for research funding (Messick, 1989). It is possible for results to be reliable (consistently repeatable) but still not support valid judgments of the merits and deficiencies of a proposal. In general, validity is considerably more difficult to assess than reliability, and there have been very few studies of the validity of peer review results.
Evidence for validity will vary across the different priorities and evaluation criteria established by different agencies. NSF programs, for example, use two separate criteria: intellectual merit and broader impacts. Measures of validity for intellectual merit ratings might include the extent to which ratings reflect how well relevant theoretical constructs are characterized in a proposal, or the appropriateness of applying a particular statistical test for analyzing the data that will be collected. Assessing the validity of impact
ratings might involve examining whether they predict the actual participation of traditionally underrepresented groups in funded projects to a useful extent. Few agencies have the time or resources to invest in true validity studies. The difficulty of establishing the validity of peer review results empirically is, in fact, the major reason why the use of expert judgment is the single best option for proposal evaluation.
Finally, there is an element of quality considerations in peer review that relates to risk. Some agencies incorporate the idea of originality or innovation into the criteria used to assess quality. Indeed, in a recent study, Guetzkow, Lamont, and Mallard (2004) found that multidisciplinary social science peer review panelists often viewed originality as what distinguished worthy from less worthy academic work. Although this idea was not explored in much depth at the workshop, it is an important consideration. Risk can be thought of as a dimension of quality with respect to the broad education research portfolio in an agency. If agencies never support work that strikes off in a new direction, develops new methods or analytic tools, challenges core ideas, or approaches a problem from a novel perspective, the potential for significant progress, or even breakthroughs, will be substantially curtailed. However, peer review tends to reward proposals that rely on established assumptions, models, and techniques. Risk-taking, therefore, may have to be supported through other funding mechanisms, but so long as it is undertaken to strategically invest in highly innovative work, it can be an important element of federal education research portfolios.
Workshop discussions about research quality analyzed both short-term and long-term aspects of quality, and many participants argued that peer review systems ought to be designed to attend to both. Peer review is typically designed to identify high-quality proposals for a given agency competition. But quality can also be viewed as a long-term prospect. Both Redish and Borko explicitly highlighted the potential for peer review to upgrade the future quality of research. Indeed, it could well be that none of the proposals submitted in a particular competition will lead to research of the highest quality. In this case, the only way to improve the quality of education research is to get authors to improve the quality of their proposals. Even when research is funded, feedback on issues requiring additional attention can provide constructive suggestions on how to upgrade future submissions.
FURTHER DEVELOPING A PROFESSIONAL CULTURE OF INQUIRY
Peer review of education research proposals also ought to be designed to support the development of the field of education research. In this section, we analyze facets of peer review that relate most directly to upholding this objective: diversity of perspectives and backgrounds, standing panels, feedback, the role of staff, and training.
Several workshop participants suggested that since peer review can and should serve an educative function, efforts to involve a diversity of research perspectives as well as the participation of people from traditionally underrepresented populations in the process were imperative. In response to a question about how agencies ensure diverse perspectives on peer review panels, Steven Breckler told the group that NSF program officers spend a significant amount of time trying to identify people and places that “ordinarily are not plugged into the NSF review process.” He also pointed to the NSF criteria for reviewing research applications, which require reviewers to assess, under the “broader impacts” criterion, the extent to which the proposed activity will broaden the participation of such groups. According to Stanfield, NIH also pays close attention to these issues, relying on a number of mechanisms to promote broader participation, including the use of discretionary funding to support research among underrepresented groups and institutions.
In terms of this professional development goal, workshop discussions also focused on the role of peer review for developing junior scholars, another way to view diversity in the composition of panels. At the workshop, Peterson argued that a critical function of peer review in education research was to promote learning opportunities and growth among early career researchers. Borko made a similar argument, suggesting that peer review be used to “mentor the next generation of researchers.” Agency representatives offered examples of how this goal is pursued in practice. For example, Sloane noted that in his work, “we make an effort to have about 20 to 25 percent of our panels be people who are not tenured.”
Panelists can be assembled once to review a single set of proposals (ad hoc panel) or on a regular basis to meet over a predetermined length of time and consider a particular area of research (standing panel). There are strengths and weaknesses of both approaches. Ad hoc panels may be prudent when efficiency must be maximized; the review of small, exploratory grants may also be best served by assembling one-time groups.
To promote professional development and capacity building in the field, standing panels are a very attractive mechanism. Since education researchers come from so many fields and orientations, panels focused on particular issues or problems in education can promote a collective expertise that builds interdisciplinary bridges and facilitates the integration of knowledge across domains. Hackett, drawing from his own experience participating on NSF peer review panels, asserted that establishing interdisciplinary trust is difficult when panels are ad hoc. In contrast, he argued that standing panels that convene groups of investigators regularly around issues or problems can be quite promising in this regard. Standing panels provide a context for researchers to build relationships with scholars they might not otherwise know. Panel members can carry these experiences into their own work and that of their colleagues, forging broader disciplinary connections among more and more researchers studying common phenomena and questions but approaching them from different perspectives.
The use of standing panels is also likely to encourage the participation of top-flight investigators, as these longer term experiences are more attractive as professional learning opportunities than short-term panels. This benefit is particularly needed in education. In their evaluation of OERI, Diane August and Lana Muraskin reported that many former panelists do not view peer review as worthwhile for their career development and trajectory (August and Muraskin, 1998). Although there are surely many factors that lead to this sentiment, it is worth noting that peer review panels at OERI were always ad hoc.
Standing panels can also provide the kind of stability and institutional knowledge that can facilitate positive outcomes in resubmitted proposals. Not all agencies have standard resubmission policies—that is, formal procedures that unsuccessful proposers can follow to respond to the reviews of the proposal and potentially receive funding at a future date. Such processes can identify promising projects in need of further development for funding and provide concrete direction for improvements in specific areas. When an (improved) application is resubmitted, the panel members know
the history of its development and can more knowledgeably evaluate it on how well the proposers have responded to specific critiques rendered during its initial review.
In addition, when groups of scholars meet regularly in peer review, they provide continuity of vision to programs of research—lines of inquiry in particular areas that together point to new insights, raise new questions, and suggest future directions for agency competitions. Over time, panelists acquire an understanding of the roles and relationships between the field and the agency, enhancing mutual understanding and reinforcing the norms of the culture in the context of the agency’s operations. It is the continuity that standing panels bring to an agency’s peer review system that is the basis for fostering powerful learning among proposers, reviewers, and staff.
Although well-suited as a professional development tool, standing panels have their drawbacks. Retaining the same people over time can have a narrowing effect on the advice given to agency leadership, which is why many standing panels have term limits. Standing groups develop a consensus view of the field and its needs, which can result in neglecting potentially important lines of inquiry, methodological approaches, or contextual factors. Worse, they can institutionalize the biases the members bring to the work. The potential for these negative consequences is heightened if the members are not explicitly and carefully selected to represent a range of perspectives, if they do not approach their work with a willingness to listen and to consider differences of opinion and approach thoughtfully, and if their biases are not declared, considered, and balanced.
Most peer review systems are designed in one way or another to provide substantive feedback to proposers (or would-be proposers) on the strengths and weaknesses of their plans. The mode of feedback can take any number of forms. At the Office of Naval Research (ONR), for example, program officers spend substantial amounts of time working directly with investigators before they write a formal proposal for funding consideration. At many other agencies (e.g., NIH, NSF, and OSEP), the primary feedback mechanism is the provision of written products from the proposal review process—forms completed by reviewers that detail strengths and weaknesses for each evaluation criterion.
Substantive feedback—as well as clear guidelines for resubmission of rejected proposals—can play a vital role in promoting peer review’s educative function. At the workshop we learned that a major finding of the OERI evaluation by August and Muraskin (1998) was that the written reviews of proposals were cursory and often merely descriptive summaries of the content of the proposals themselves (rather than analysis of the content with respect to the review criteria). Both Borko and Peterson emphasized the value of feedback in the process and the need to upgrade its use in current systems. At the same time, agency staff from OSEP cited persistent problems getting reviewers to fully document their comments and to clearly justify their ratings, and August and Muraskin (1998) noted this problem in their evaluation of the former OERI’s peer review system as well. If peer review is to serve a professional development function effectively, agency staff and reviewers should take these responsibilities seriously and invest the time to fulfill them.
Yet another issue aired at the workshop showed how difficult establishing high-quality feedback can be. Both representatives from NIH described difficulties the agency encountered in recent years because investigators bristled at what they perceived to be inappropriate directives from reviewers. In response, then-director Harold Varmus determined that summary statements emanating from reviews should evaluate the proposed research according to established review criteria, but they should not be tutorials telling investigators how to do their research. In this context, there was considerable discussion about the appropriate level of detail that ought to be part of reviews: How do reviewers and staff balance the need to justify ratings and to communicate effectively with applicants while respecting the professional judgment of applicants? Danielson also raised the issue of resource constraints in this context, suggesting that if the agency were to provide detailed feedback on each of the roughly 4,500 applications they receive each year, they would have to contract the work out due to limited staff resources. We support erring on the side of more detailed information and critique, as this documentation is a key component of a feedback loop that can lead to future improvements in a field.
In addition to reviewers’ written feedback, agency staff can also interact with members of the research community—at professional association meetings, workshops convened specifically for principal investigators and future principal investigators, and other such venues—to orient investigators to the agency’s priorities and processes. The level of detail, approach, and other such particulars associated with the content and format of proposals is not the same across or even within federal research agencies, and the more familiar proposers and reviewers are with these important process
mechanisms, the better the review and, most importantly, the better the products of the review. Explicit training on the nature of feedback should also be provided to reviewers; we take up such training issues in a later section.
Role of Staff
Another key feature of peer review systems is the role of staff in the process. Agency staff are part of the human resources of the research field, playing both teaching and learning roles. There are very real trade-offs associated with the various models of staff involvement in practice today. Three of the agencies represented at the workshop—NIH, NSF, and ONR—nicely illustrate two models at opposite extremes and a hybrid approach to staff involvement. At NIH, the system is very deliberately built to erect a clear separation (sometimes called a firewall) between the staff who write the grant announcements soliciting proposals and develop scientific programs and the staff who select and interact with peers in the review of proposals received in response to those solicitations. In contrast, at ONR, a single staff person (sometimes called a strong manager) performs all of these functions. The system at NSF falls somewhere in between—endowing program officers with a fair degree of authority to shape competitions and to select peers, while creating checks and balances in the system to guard against improprieties.
The benefit of the ONR approach is in continuity of expertise. Knowledgeable staff can follow the process from beginning to end, substantively interacting with members of the field in ways that facilitate learning on both sides and result in work with tight alignment to agency goals. As Susan Chipman, director of the Cognitive Science Program at ONR, described the process, “ONR staff are the peers—they review proposals and make recommendations for funding.” Program officers at ONR often use multiple internal peers to judge research proposals, including potential consumers of the work, since ONR’s work is very applied and mission-oriented. Program managers like Chipman actively develop research programs based on the needs of their agency.
The trade-off is that this kind of participation across all parts of the peer review process can result in a loss of external legitimacy. Whitehurst, in describing his plans for peer review at the IES, articulated this downside. In the former OERI, program officers who developed solicitations also selected the peers to review proposals. He acknowledged that this continuity
is beneficial because that person becomes expert in all aspects of the competition. The problem, as he described it, is that having responsibility for both kinds of tasks raises the possibility of infusing bias into the system, thereby weakening its overall legitimacy. As he put it, investigators might reasonably wonder: Is everyone getting a fair shake, or are those researchers who are chummy with the program officer getting an unfair advantage? The NIH model, with its built-in firewall, creates a clear boundary and, as Dodge put it, this “keeps it pure.”
Describing the NSF process as it relates to these two models, Breckler argued that their hybrid approach taps the best of both worlds by relying on external panels of experts while allowing program officers substantive involvement. He asserted that the tenets of social psychology suggest that the best way to get people to act responsibly is to make them identifiable and responsible for what they are doing, supporting the kinds of roles that staff are authorized to serve: crafting program announcements, selecting peers for review, and settling on a slate to pass on for funding decisions. This approach, he suggested, allows one person to go against the group tendency to be conservative—that is, to reject innovative ideas. And a high level of responsibility helps to attract high-quality officers to the agency.
Responding to questions about the potential abuses of such a system, Breckler argued that the system is rarely compromised because the process is open. The agency mandates extensive documentation of peer review panels, requiring program officers to certify that they have completed parts of the process to the best of their ability and in concordance with relevant policy. To address charges that some investigators may not get the fair shake to which Whitehurst referred, Breckler pointed to a complex array of conflict of interest rules for program officers. Furthermore, NSF has a long-standing tradition of instituting a final check in the process by engaging a committee of visitors to periodically and comprehensively assess research programs on a host of dimensions, including whether such conflict of interest rules were followed. The researchers who are called on to serve this function are asked to carefully scrutinize all aspects of the process to assess its fairness and legitimacy, and the results of the assessment are made publicly available.
For peer review to fulfill a professional development function, explicit training for reviewers, proposers, and staff must be part of the process. But
workshop participants revealed that in-depth training is the rare exception rather than the rule in practice.
Training reviewers was raised repeatedly at the workshop as an important element of the peer review process, and agency participants discussed strategies and identified impediments to facilitating successful training. Stanfield described ways NIH helps to familiarize reviewers with the peer review process, including brokering meetings prior to the panel discussion and setting its tone by beginning with experienced reviewers. Breckler asserted that providing model reviews to reviewers would be a helpful strategy, lamenting that this practice is not permitted at NSF. Chipman agreed, suggesting that the use of model reviews could help strengthen a tradition of high-quality reviews in peer review settings for education research.
In describing some training techniques she has used for reviewers at the National Institute on Drug Abuse at NIH, Levitin highlighted several potentially helpful strategies. She suggested that training starts well before the first meeting of the group, is both formal and informal, and is grounded in “general principles and policies.” Levitin suggested that if reviewers are well versed in a “few fundamental” ideas, they will be able to provide a fair review. She made clear that there cannot be hard and fast rules for every circumstance, given the very complex nature of review, different types of applications, and other factors, but that there are policies and procedures to guide review in making fair judgments. One key area of training she described relates to teaching reviewers how to apply the review criteria. At NIH, ratings range from 1 to 5, with 1 being the most meritorious. Levitin also stressed that it is important to communicate to reviewers how to provide balanced and thorough reviews, so that the strengths and weaknesses of every application are described and only the stated review criteria are used to assess them.
Training potential applicants was also an area discussed at the workshop. The agencies represented at the workshop relied on a range of largely informal strategies to promote better proposals—such as program officers talking with junior scholars about the grant-writing process—and the degree to which this issue was addressed varied quite a bit. The resubmission process at NIH was the most formal procedure described: with clear and comprehensive written feedback on the weaknesses of a submission, proposers gain insights into how to improve their future proposals to the agency and are informed of specific guidelines for resubmitting a revised application in a future grant cycle. Milton Hakel, an industrial and organizational psychologist from Bowling Green State University, suggested that
the ability to write rejoinders to reviews could also be instructive. In many respects, the opportunity to revise an application in response to peer review provides this type of opportunity. One-time submission policies—which lack explicit requirements for identifying a proposal as a resubmission and explaining how it has been revised—miss valuable opportunities for the professional development of researchers.
Finally, the training of staff is similarly important, but no one at the workshop mentioned any kind of professional development for staff involved in peer review systems. Notably, however, August and Muraskin (1998) did recommend staff training as a strategy for improving the peer review process at OERI. How to develop training for agency staff would depend on the specific tasks the staff are expected to perform and the skills and knowledge needed to accomplish them effectively.
AGENCY MANAGEMENT AND INFRASTRUCTURE
Like any system, the peer review process must be effectively managed. Negative experiences of many reviewers of education research proposals—especially in the competitions studied in the evaluation of the former OERI by August and Muraskin (1998) and in testimony about peer review at OSEP to the President’s Commission on Excellence in Special Education (2002)—in large part derived from poor logistics. Active, careful attention to logistical arrangements enables a smooth peer review process that encourages participation and improves its outcomes. For example, lead time is critical to engaging top scholars in the process. Last-minute planning (often deriving from either legislative or executive branch delays) invariably leads to conflicts with previous commitments, seriously reducing the likelihood of tapping top talent to participate. It also leaves little time for substantive reflection on proposals, leading to cursory and incomplete feedback and, in extreme cases, poor advice to decision makers about funding priorities. Infrequent and inconsistent announcements can set off a “now or never” mentality among researchers, ensuring a high rate of rejection given scarce resources and depleting the pool of potential reviewers. Active proposal management—through triage processes that involve an initial cut through the proposals and assignment of only promising projects to reviewers—can minimize workloads, focusing attention on high-priority areas and making participation manageable for reviewers.
Despite the many anecdotes of how important peer review is to the field and to individual research careers, agency representatives consistently
pointed to increasing difficulties in recruiting reviewers. Stanfield identified the logistical hurdles involved in convening face-to-face meetings as particularly problematic: “It is very difficult to get very busy scientists to come to Washington three times a year for four years.” Similarly, Breckler stated, “it is difficult to get people who are going to dedicate themselves to do peer review” and “it is getting increasingly difficult.” Incentives for scholars to serve as peer reviewers derive from a number of sources and compel individuals to behave in a variety of ways. Many of these sources are outside the control of any given federal agency (e.g., whether service on peer review panels is recognized in promotion and tenure decisions). Agencies can do their part to enable the recruitment of top-flight investigators by ensuring that their systems are managed effectively and reviewer workloads are minimized to the extent possible. For example, the August and Muraskin (1998) evaluation reported that many reviewers at the former OERI spent far longer reviewing than the time commitment agency staff had estimated for them.
FLAWS AND ALTERNATIVES
To this point, we have not taken on what might be considered the threshold question: What are the drawbacks to peer review as a mechanism for informing resource allocation of federal research dollars, and are there viable alternatives? There are indeed problems with peer review, some of them significant (see Finn, 2002; Horrobin, 2001; McCutchen, 1997). And there are other ways that research dollars have been and are distributed.
The workshop discussions did not address these questions in any detail. Hackett and Chubin’s paper (2003), however, does provide an overview of some of these issues. To set the stage for the committee’s recommendations in Chapter 3, and drawing on Hackett and Chubin’s analysis, we acknowledge and describe some of the most worrisome weaknesses of peer review. We also identify some of the alternatives they describe for allocating federal education research dollars, ultimately concluding that, despite its flaws, peer review is nonetheless the best available mechanism for allocating scarce education research dollars.
A persistent complaint about the peer review process is the possibility of cronyism—that is, that engaging peers predisposes outcomes to benefit friends or colleagues with no or little regard for the actual merit of a given proposal (Kostoff, 1994). This situation can lead to a kind of protectionism
that repeatedly rewards an elite few, narrowing the breadth of perspectives and ideas that is so critical to scientific progress and stunting potentially promising lines of inquiry.
The peer review process can also inhibit innovation. Arguably, peer review is expected to draw the line “between sound innovation and reckless speculation.” As Hackett and Chubin (2003, p. 17) argue, “a review system at one extreme could reward novelty, risk-taking, originality, and bold excursions in a field … [or] it could sustain the research trajectory established by the body of accepted knowledge by imposing skeptical restraint on new ideas.” The closer to the latter pole a system becomes, the more easily it could reject promising ideas as implausible. Current practice is often criticized for being too conservative—a well-known example recounted by Hakel at the workshop is that when the original manuscript describing the double-helix structure of DNA was submitted for publication, it was subjected to peer review and rejected.
What about other ways to allocate research dollars? As Hackett and Chubin (2003) report, Congress has the prerogative to allocate funds through direct appropriation (also termed “earmarking” or “pork barreling”). In fiscal year 2002, Congress earmarked $1.8 billion for projects at colleges and universities. While not all of this money is for research, earmarks for academia are a useful indicator of the exercise of direct appropriation. And while $1.8 billion is a relatively small amount compared with the roughly $100 billion federal investment in research and development, it looms larger against the roughly $25 billion federal budget for basic research (all data from analysis by the American Association for the Advancement of Science of the R&D budget; http://www.aaas.org/spp/rd/guihist.htm).
The main deficiency of earmarking is that it circumvents technical expertise, jettisoning altogether the principle that scientific quality ought to be the primary basis for the allocation of research dollars. It also has a corrosive effect on the development of the research profession: without a clear link between rewards (continued funding) and performance (quality of proposals for future work), the core values of science would be eroded significantly (Hackett and Chubin, 2003).
Another alternative is to rely on a single, so-called strong manager who makes decisions on behalf of the agency according to his or her best judgment (as is done in ONR). As Hackett and Chubin (2003, p. 5) observe, “In effect, this is peer review with one peer, so this steward had better be on a par (intellectually and in stature within the field) with those applying for support … [and] should understand the field and its needs (which should
be clear and widely shared) to ensure that decisions and allocations are wise, legitimate, and effective.”
The arguments offered to support the strong manager arrangement include that it is flexible and responsive, and an efficient way to distribute relatively small pots of money. It may also be appealing because the manager is held accountable for performance outcomes (e.g., research-based products that benefit the Navy). However, it would be nearly impossible to scale this approach up to the size of NIH (about $27 billion in fiscal year 2003), and it would face similar difficulties in mid-sized research agencies. More importantly, such concentrated power limits the breadth and depth of expertise that can be brought to bear on proposals and invites serious questions of bias and partiality.
Hackett and Chubin (2003) discuss a third funding alternative—using a formula to allocate resources. Funds may be allocated to states or universities or institutes, then suballocated to groups or individuals according to a variety of additional criteria. Or formulas may be devised based on the past performance of individual scientists, with funds awarded accordingly. Some measure of current need or potential payoff may factor into the equation, as well as the number of researchers at a university or residents in a state. Fair and effective formulas would be hard to devise, and the relative merits of various options endlessly debated.
None of these options for allocating research dollars is perfect, including peer review. When peer review is compared with these alternatives, however, it emerges as the mechanism best suited to promote merit-based decisions about research quality and to enhance the development of the field. This statement does not preclude some type of blended approach in making decisions about what research to fund, however. Indeed, maintaining a variety of funding mechanisms can help offset the weaknesses of peer review. And there are additional design features that can be used in peer review to minimize potential problems. For example, the role of a peer review panel should always be to rank proposals, not to recommend particular decisions about what should be funded, as empowering panelists to make direct recommendations can more easily lead to questions about cronyism and conflict of interest. Term limits, blended expertise on panels, and attention to systematic evaluation of peer review processes and outcomes are additional examples of the kind we address in Chapter 3 that can and should be used to counterbalance the flaws of peer review systems.
In short, peer review as a system for vetting education research proposals in federal agencies is worth preserving and improving. So the question for us is how to strengthen it—a topic we address in the next chapter.