In recent years there have been increasing efforts by the federal government and the states to devise systems that make students, teachers, principals, or whole school systems accountable for how much students learn. Large-scale tests are usually a key component of such systems. The No Child Left Behind (NCLB) Act of 2001, a prominent example of such efforts, is the continuation of a steady trend toward greater test-based accountability that has been going on for decades. The use of high school exit exams by many states as a requirement for receiving a diploma is another example. Still another example is the widespread interest in using student test scores as a way of rating and rewarding teachers and principals. Test-based accountability systems provide policy makers with potentially powerful but blunt tools to influence what happens in local schools and classrooms. These policies attach consequences to assessments by holding educators and students accountable for achieving at certain levels on tests. When schools, teachers, or students score below performance cutoffs on tests, they often face sanctions, and when they perform well, they are sometimes rewarded. After reviewing policy and practice, Richard Elmore (2004) concluded that test-based accountability has been more enduring than any other policy in the field of education for at least the past 50 years and that it is unlikely to recede in the foreseeable future. Test-based accountability continues to dominate the policy agenda at the federal, state, and local levels—“a remarkable accomplishment in a political environment where reform agendas typically have shifted from year to year” according to Michael Feuer (2008, p. 274).
The test-based accountability movement in education can be seen as part of a broader movement for government reform and accountability over the past few decades that has sought to measure and publicize government performance as a way to improve it. The Government Performance and Results Act of 1993 is an example of the more general trend in the United States, and there are similar examples in many other countries.
While the broad objectives of these reforms to promote more “effective, efficient, and responsive government” are the same as those of reforms introduced more than a century ago, what is new are the increasing scope, sophistication, and external visibility of performance measurement activities, impelled by legislative requirements aimed at holding governments accountable for outcomes. (Heinrich, 2003, p. 25)
In education, accountability systems in the United States have attached ever-stronger incentives to tests over time. Tests for accountability purposes emerged under Title I of the Elementary and Secondary Education Act (ESEA) of 1965 and the start of the National Assessment of Educational Progress (NAEP). However, the original form of these national requirements for testing did not include explicit incentives linked to test results (Koretz and Hamilton, 2006; Shepard, 2008). In the 1970s, the minimum competency movement led to greater consequences being attached to the results of tests for students, with graduation and promotion decisions in some states being tied to test results. The 1988 reauthorization of ESEA required Title I schools with stagnant or declining test scores to file improvement plans with their districts.
The standards-based reform movement of the early 1990s led to the requirement in the 1994 ESEA reauthorization for states to create rigorous content and performance standards and report student test results in terms of the standards (National Research Council, 1997, p. 25). This was followed by the requirements of the 2001 reauthorization (NCLB) for schools and districts to show progress in the proportion of students reaching proficiency or to face the possibility of restructuring. The emergence of value-added modeling led to increasing interest in the use of test results for evaluating and rewarding individual teachers and principals (National Research Council and National Academy of Education, 2010).
This brief sketch of test-based accountability in education over a 50-year period condenses a complicated and fitful history into a few pivotal points. In some cases changes at the national level were preceded by changes in individual states, and over the decades there were periodic waves of concern about education that included the reaction to Sputnik in 1957, the publication of A Nation at Risk (National Commission on Excellence in Education, 1983), and responses to the U.S. position on the international comparative tests that became available in the late 1990s and 2000s.
This report does not attempt to provide a detailed history of the growing use of explicit incentives that are attached to tests. Rather, it reviews what social and behavioral scientists have learned about motivation and incentives over the same period that test-based incentives have spread. In response to the charge to the committee, the goal of the report is to inform education policy makers about the use of such incentives and to recommend ways that their use in test-based accountability systems can be improved.
The Committee on Incentives and Test-Based Accountability in Public Education was established by the National Research Council (NRC) with support from the Carnegie Corporation of New York and the William and Flora Hewlett Foundation. The committee’s charge was to review and synthesize research about how incentives affect behavior that would have implications for educational accountability systems that attach incentives to test results.
The project originated in the recognition that there is important research about what happens when incentives are attached to measures of performance. Much of this research has been conducted outside the field of education and so is unlikely to be familiar to education policy makers. As they increasingly turn to the use of incentives in test-based accountability systems, their efforts should be informed by the findings from that research.
The goals of the committee’s study are to (1) help identify circumstances in which test-based incentives may have a positive or a negative impact on student learning, (2) recommend ways to improve the use of test-based incentives in current accountability policies, and (3) highlight the most important directions for further research about the use of test-based incentives in education.
To make the study feasible, the committee had to narrow its approach to the charge in three respects: how we would consider incentives, accountability, and recent research about the use of test-based incentives in education.
Incentives. The committee focused on research related to incentives in which an explicit consequence is attached to a measure of performance. Although it can be difficult in some cases to draw a precise line between consequences that are explicit and those that are not, this rough contrast provided a practical way to focus the study in the current policy environment, where there is substantial interest in test-based incentives that clearly have explicit consequences. We did not use a broader interpretation of the term “incentive,” which could have encompassed all determinants of behavior and required a literature review that included all fields in the social and behavioral sciences.
Accountability. The committee focused on research related to the use of test-based incentives for education accountability. We excluded both other types of accountability in education and a conceptual approach for contrasting those other approaches with test-based accountability.
Recent Research on Test-Based Incentives in Education. The committee focused on two kinds of research: (1) basic research that has been conducted in the social and behavioral sciences with potential application to many different settings, including education, and (2) research on test-based incentives in education. For both kinds of work, we focused primarily on research that allows us to draw causal inferences about the overall effect of test-based incentives.
The committee’s entire effort could have been consumed by a broader approach to any one of these three elements. Only by judiciously limiting the focus on each one could we appropriately address our overall charge, which is to make policy makers aware of key findings about the use of incentives and the potential implications of these findings for the design of test-based accountability systems in education.
We note that our focus on incentives that involve the attachment of explicit consequences to test results specifically excludes the broader role that test results can play in informing educators and the public about the performance of the educational system and thereby providing stimulus for improvement. We understand that some readers would have wanted us to broaden our treatment of “explicit consequences” to include the publication of test results, with its potential both to motivate educators to improve and to drive policy pressure for reform. In the end, we did not have the capacity to adequately broaden the study in this way, which would have required a much richer treatment of incentive effects, types of accountability, and methods of research about education. We are sympathetic with the arguments that the information from test results is likely to affect both teachers and policy makers. However, we note that there have been many arguments and proposed policies over the past decade or two that have taken as their starting point a conclusion that mere information has been insufficient to drive educational improvement (e.g., National Research Council, 1996). The result has been a strong focus in education policy on the importance of attaching explicit consequences to test results. That is the type of test-based incentive that our study examines.
In addition, we note that our literature review is necessarily limited by the types of incentive programs that have been implemented and studied. Given the intense interest in the use of incentives over the past decade, there are incentive programs that are too new to have been evaluated by researchers, and there are interesting proposals for incentive programs that have not yet been implemented. We mention some of these new programs and proposals throughout the report, but we obviously cannot draw any conclusions about their effectiveness at this time.
It has been more than a decade since the landmark National Research Council (1999) report, High Stakes: Testing for Tracking, Promotion, and Graduation, was issued. That report contains a number of cautions about the use of student tests for making high-stakes decisions for students, with notable recommendations about the importance of using multiple sources of information for any important decision about students and the necessity of providing adequate instructional support before high-stakes tests are given. High Stakes cited a “strong need for better evidence on the intended benefits and unintended negative consequences of using high-stakes tests to make decisions about individuals,” particularly with respect to evidence about “whether the consequences of a particular test use are educationally beneficial for students—for example, by increasing academic achievement or reducing dropout rates” (p. 8). In the years since High Stakes was published, the use of test-based incentives has continued to grow, and researchers have made important advances in their evaluations of those incentives. This report looks at what we have learned as a result.
Chapter 2 reviews findings from two complementary areas of research in the behavioral and social sciences about the operation of incentives: theoretical work from economics about using performance-based incentives and experimental results from psychology on motivation and external rewards. Chapter 3 looks at the use of tests as performance measures that have incentives attached to them, considering some key ways the effect of incentives is influenced by the characteristics of the tests and the performance measures that are constructed from test results. Chapter 4 reviews research about the use of test-based incentives within education, specifically looking at accountability policies with consequences for schools, teachers, and students. Chapter 5 concludes with the committee’s recommendations for policy and research.
It is important to note two aspects of the context for our work, although they may seem obvious. First, throughout the report, we focus on one part—the incentives—of a test-based accountability system, which is itself only one part of the larger education system. Our focus was driven by our charge, not because incentives are the only important part of a test-based accountability system or the only important part of the education system. Researchers have proposed a number of elements that are likely to be needed for a test-based accountability system to work effectively in the overall education system (see, e.g., Baker and Linn, 2003; Feuer, 2008; Fuhrman, 2004; Haertel and Herman, 2005; O’Day, 2004). In addition to the role played by incentives themselves, researchers have noted the importance of clear goals, appropriate educational standards, tests aligned to the standards and suitable for accountability purposes, helpful test reporting, available alternative actions and teaching methods to improve student learning, and the capacity of educators to apply those alternative actions and teaching methods. Although we note at some points the importance of these elements in allowing test-based incentives to change behavior in ways that will improve student learning, at many points in the report the importance of these other elements is left unstated and should be inferred by the reader.
Second, this study was conducted at a time of widespread interest in NCLB, which is currently the most visible education accountability system in the United States. As a result, NCLB forms a backdrop for much of the policy interest in the effects of incentives, and readers may at some point view this report as a critique of that law. However, the study was not intended or conducted as a critique or evaluation of NCLB. As noted above, NCLB is a continuation of a broader trend toward the use of stronger test-based incentives that has been going on for decades. This study is focused on evidence related to that broader trend, not on particular aspects of a specific law. In particular, we view our report as a resource for policy makers looking to the future of accountability, not as an evaluation of any particular past practice or program.