proposal asserts that effective management requires that a system take into account the many sources of educational performance, some of which are not the responsibility of the school, and maintains that schools should therefore be held accountable only for value-added.
To evaluate proposals of this sort and to decide how best to translate them into practice, it is essential to examine both the logic of achievement testing and the evidence pertaining to assessment-based accountability. At first, the logic seems simple and compelling: student achievement is the primary goal of education, and holding educators accountable for the amount of learning they induce can only focus and intensify their efforts. In practice, however, assessment-based accountability poses serious difficulties.
Despite the long history of assessment-based accountability, hard evidence about its effects is surprisingly sparse, and the little evidence that is available is not encouraging. There is evidence that effects are diverse, vary from one type of testing program to another, and may be both positive and negative. The large positive effects assumed by advocates, however, are often not substantiated by hard evidence, and closer scrutiny has shown that test-based accountability can degrade instruction and generate spurious gains, thereby creating illusory accountability and distorting estimates of program effectiveness. One source of these problems is the limitations of tests themselves, and a primary emphasis in the current reform movement is on the development of innovative, less limited assessments. A second source is the structure of the data in which test scores are typically embedded; assessment databases are rarely of a form that would permit accurate measurement of value-added or of program effectiveness. A third source is behavioral responses to accountability: holding educators accountable for students' test scores can encourage undesirable practices that inflate scores and may undermine learning.
This chapter sketches the recent history of assessment-based accountability and then describes some of the most important problems it entails. The final sections address the potential of innovations in testing and suggest some implications for policy. I do not wish to discourage the use of tests in accountability systems but rather want to encourage reformers to use tests in ways that take their problems into account and that are therefore more likely to improve student learning.
Between World War II and the 1960s, achievement testing in the United States was "low stakes," without serious consequences for most students or teachers. Some tests did have serious consequences—for example, college admissions tests and tests used to place students in special education—but they were the exception rather than the rule (Goslin, 1963, 1967; Goslin et al., 1965).
The functions of testing began to change in the 1960s. The Elementary and