The preceding chapters have synthesized our key findings and conclusions from the basic research about the way that incentives operate and from the applied research about the results of implementing test-based incentive policies in education. In this chapter, the committee recommends ways to improve current test-based incentive policies and highlights important directions for further research. We discuss the use of test-based incentives, the design of test-based incentive programs, and the research that is needed about those programs.
As discussed in Chapter 4, there have been a number of careful efforts to use test-based incentives to improve education. They have included broadly implemented government policies—notably, state high school exit exams and the school-level requirements of NCLB and its predecessors—as well as experimental programs. A number of these programs have been carefully studied, using research designs that allow some level of causal conclusions about their effects. We conclude (see Chapter 4) that the available evidence does not give strong support for the use of test-based incentives to improve education and provides only minimal guidance about which incentive designs may be effective. However, basic research related to the design of incentives and the practical experience from implementing the first generation of incentive programs suggest more sophisticated approaches to designing incentive programs that are promising and should be investigated. As a result, we recommend that policy makers continue to support the development of new approaches to test-based incentives but with a realistic understanding of the limited knowledge about how to design such programs so that they will be effective.
Recommendation 1: Despite using them for several decades, policy makers and educators do not yet know how to use test-based incentives to consistently generate positive effects on achievement and to improve education. Policy makers should support the development and evaluation of promising new models that use test-based incentives in more sophisticated ways as one aspect of a richer accountability and improvement process. However, the modest success of incentive programs to date means that all use of test-based incentives should be carefully studied to help determine which forms of incentives are successful in education and which are not. Continued experimentation with test-based incentives should not displace investment in the development of other aspects of the education system that are important complements to the incentives themselves and likely to be necessary for incentives to be effective in improving education.
Only continued, careful research on test-based incentive programs will make it possible to understand how they can be designed more effectively. The small or nonexistent benefits that have been demonstrated to date suggest that incentives need to be carefully designed and combined with other elements of the educational system to be effective. Much additional work will be required to learn whether and how test-based incentives can be used to produce consistent improvements in education. The available evidence does not justify a single-minded focus on test-based incentives as a primary tool of education policy without a complementary focus on other aspects of the system.
The general lack of guidance coming from existing studies of test-based incentive programs in education suggests that future policy experimentation with test-based incentives should be guided by the key contrasts that emerge from basic research about how incentives operate.
Recommendation 2: Policy makers and researchers should design and evaluate new test-based incentive programs in ways that provide information about alternative approaches to incentives and accountability. This should include exploration of the effects of key features suggested by basic research, such as who is targeted for incentives; what performance measures are used; what consequences are attached to the performance measures and how frequently they are used; what additional support and options are provided to schools, teachers, and students in their efforts to improve; and how incentives are framed and communicated. Choices among the options for some or all of these features are likely to be critical in determining which—if any—incentive programs are successful.
In general, the design of test-based incentives should begin with a clear description and delineation of the most valued educational goals that the incentive program is meant to promote, as well as recognition of the tradeoffs among these goals. Those goals should shape the features of the incentive program, even though experience shows that the effects of a program may not always occur in the ways intended.
The performance measures used in an incentive system are likely to be critical. The tests and indicators used for performance measures should be designed to reflect the most valued educational goals, and their relative weights in the incentive system should reflect the tradeoffs across educational goals that designers of the system are prepared to accept. Although any test will necessarily be incomplete, it should be designed to emphasize the most important learning goals in the subject domain and to measure students’ attainment of the goals through the use of various test item formats.
A test that asks very similar questions from year to year and uses a limited set of item formats will become predictable and encourage narrow teaching to the test. The test scores are likely to become distorted as a result, even if they were initially an excellent measure. To reduce teachers’ inclination to teach inappropriately to high-stakes tests, the tests themselves should be designed to sample the subject domain broadly and include continually changing content and item formats. Test items should be reused only rarely and unpredictably.
Performance targets should be challenging but attainable, and data should be used to determine which targets are attainable. Psychological research shows that unrealistically high goals undermine motivation. Ideal goals provide optimal challenge: they encourage people to stretch themselves yet remain attainable with effort.
The indicators used to summarize test results should match the goals of the test-based incentives policy, both in terms of the level of student achievement expected and the students or subgroups that are the focus of attention. Because any system of tests and indicators is necessarily incomplete, the system should be designed to emphasize the most important goals, and progress toward those goals should be measured in varied and diverse ways. Policy makers should recognize that goals that are not measured are likely to be deemphasized during instruction. Test-based incentive systems should be dynamic, responding to current goals as well as to indications of whether incentives are aligned to these goals in practice.
Given that tests are necessarily incomplete measures of valued educational goals, designers of incentive systems should recognize the potential problems inherent in having strong consequences based on test scores alone and should experiment with the use of systems of multiple measures that reflect desired outcomes. One way of incorporating multiple measures would be to use the results of large-scale tests as triggers for more focused evaluation of struggling schools and teachers, rather than as final evaluations on their own.
It is possible that the weak effects of the test-based incentive programs we reviewed may be due in part to the use of performance measures based primarily on tests that encourage narrow test preparation rather than broader instruction that can produce more general learning gains that are not tied to a particular test. We note, however, that the one program we reviewed that used multiple measures—the Teacher Advancement Program, which uses classroom observations in addition to test scores in evaluating teachers—produced a near-zero average effect with a number of negative effects in the upper grades. Again, this result underlines how much is still unknown about using test-based incentives effectively.
The nature of the support provided in conjunction with a test-based incentives system is also likely to prove important to success. If the capacity to bring about change is limited, successful implementation will require that the incentives system include provisions to promote the development of that capacity. In any system of incentives—whether focused on schools, teachers, or students—the people who are most in need of improvement and therefore usually the focus of the incentives are often specifically those who lack the capacity to bring about change on their own. The research to date does not suggest what kinds of support could be paired with test-based incentives to increase program effectiveness.
It is beyond the committee’s charge to suggest how to build capacity in school systems, but there is a growing literature on resources that are most useful in helping schools improve. Some of that work is brought together in two reports from the National Research Council, Engaging Schools: Fostering High School Students’ Motivation to Learn (National Research Council and Institute of Medicine, 2004) and America’s Lab Report: Investigations in High School Science (2006a). A recent report by the Center on Reinventing Public Education (Hill et al., 2008) suggests new approaches to finance, governance, and accountability that would foster the kinds of competitive experimentation that could produce empirically grounded understandings of what works under what circumstances and for different groups.
Substantial research needs to be conducted in order to understand the effects of test-based incentives well enough for policies to be designed that will consistently result in meaningful educational improvement. The committee recognizes that it is difficult and time-consuming to conduct definitive—or even credible—studies of the effects of test-based incentives in educational settings. However, there is a strong initial body of work that can serve as a foundation. Chapter 4 provides examples of the kind of research that will be needed to identify successful ways of designing test-based incentive policies.
Recommendation 3: Research about the effects of incentive programs should fully document the structure of each program and should evaluate a broad range of outcomes. To avoid having their results determined by the score inflation that occurs in the high-stakes tests attached to the incentives, researchers should use low-stakes tests that do not mimic the high-stakes tests to evaluate how test-based incentives affect achievement. Other outcomes, such as later performance in education or work and dispositions related to education, are also important to study. To help explain why test-based incentives sometimes produce negative effects on achievement, researchers should collect data on changes in educational practice by the people who are affected by the incentives.
The committee offers priorities for rigorous research, presented as questions, in four areas: behavioral responses to incentives, validity of test score gains, incentive system outcomes, and incentive system improvements.
Behavioral Responses to Incentives
• What types of incentives do different types of performance measures and indicators create for educators and students?
• What is the range of effects—not just the average—of different types of incentives on teachers’ and students’ behavior and motivation?
• How does the complexity of an incentives system affect the ability of educators, parents, and students to understand the intended signals and respond to them?
Validity of Test Score Gains
• What is the relationship between the responses of teachers and others in the school system to test-based incentives and the validity of the gains in test scores? What measures of responses to accountability should be used to understand these relationships?
• What is the relationship between test-based incentives and external criteria, such as employment and wages? Are there relative wage and employment increases among the people for whom test scores rose?
• What characteristics of students, schools, and test-based incentives predict score inflation?
• What are some practical auditing methods, that is, cost-effective ways to monitor test score gains overall and at the school level?
Incentives System Outcomes
• What are the effects of test-based incentives on school and classroom practices? What changes occur in school policies, curriculum, instruction, and nonacademic activities, and are they consistent with community goals and priorities?
• What are the verifiable effects on student learning that can be attributed to the expectation of being accountable or to the subsequent use of data?
• How do test-based incentives affect the labor market for teachers, including recruitment, hiring, retention, placement, and mobility?
• How do stakeholders—students, parents, educators, policy makers, elected officials—affect the design and effects of test-based incentives?
Incentives System Improvements
• How can subjective measures of teaching practices be used to improve test-based incentives?
• How can large-scale tests be used as triggers to identify schools that need more focused, in-depth evaluation?
• What role should value-added analyses play in developing indicators for test-based incentives?
• What are the points of leverage in the education system for improvement? What are the policy and administrative levers for effecting change?
The charge to the committee pointed out the contradiction between many economists’ optimism and most psychologists’ pessimism about the potential for test-based incentives to alter academic performance. Our review of the literature and our deliberations did not resolve the contradiction. Our review of the evidence uncovered reasons to expect positive results from incentive programs and reasons to be skeptical of apparent gains. Our recommendations, accordingly, call for policy makers to support experimentation with rigorous evaluation and to allow midcourse correction of policies when evaluation suggests such correction is needed.
Our call for more research may seem like a hackneyed response, but we believe it is essential with regard to incentives. In calling for more evaluation, we draw attention to the fact that the frequent question, “Do incentives work?” is too broad and vague to be answerable. Most reforms using test-based incentives attempt to change student performance in many grades and many subjects. When ambitions are so broad, it is not surprising that the results are varied and unclear. Broad and major reforms do not succeed or fail all at once and altogether. Outcomes usually mix small successes and failures that add up to either modest improvements or disappointments. Our call for more focused evaluations is a call to examine the expected successes and failures. We call on researchers, policy makers, and educators to examine the evidence in detail and not to reduce it to a simple thumbs-up or thumbs-down verdict. The school reform effort will move forward to the extent that everyone, from policy makers to parents, learns from a thorough and balanced analysis of each success and each failure.