tives and accountability. This should include exploring the effects of key design features suggested by basic research. These features include who is targeted for incentives; what performance measures are used; what consequences are attached to those measures and how frequently the consequences are applied; what additional support and options are provided to schools, teachers, and students in their efforts to improve; and how incentives are framed and communicated. The choices made for some or all of these features are likely to be critical in determining which incentive programs, if any, succeed.
In general, the design of test-based incentives should begin with a clear statement of the most valued educational goals that the incentive program is meant to promote. It should also explicitly recognize the tradeoffs among those goals. These goals should shape the features of the incentive program, even though experience shows that a program's effects do not always occur as intended.
The performance measures used in an incentive system are likely to be critical. The tests and indicators used as performance measures should be designed to reflect the most valued educational goals. Their relative weights in the incentive system should reflect the tradeoffs among those goals that the system's designers are prepared to accept. Any test will necessarily be incomplete. Even so, it should be designed to emphasize the most important learning goals in the subject domain and to measure students’ attainment of those goals using a variety of item formats.
A test that asks very similar questions from year to year and uses a limited set of item formats becomes predictable and encourages narrow teaching to the test. As a result, its scores are likely to become distorted, even if they were initially an excellent measure. To reduce teachers' inclination to teach inappropriately to high-stakes tests, the tests themselves should sample the subject domain broadly, with content and item formats that change continually. Test items should also be reused only rarely and unpredictably.
Performance targets should be challenging but attainable, and data should be used to determine what levels of performance are attainable. Psychological research shows that unrealistically high goals undermine motivation. The ideal goals provide optimal challenge: they encourage people to stretch themselves yet remain attainable with effort.
The indicators used to summarize test results should match the goals of the test-based incentives policy, both in terms of the level of student achievement expected and the students or subgroups that are the focus of attention. Because any system of tests and indicators is necessarily incom-