Secondary Education Act enacted in 1965 mandated achievement testing as a primary mechanism for monitoring and evaluating the new federal compensatory education program, Title I. Because Title I services are provided in the great majority of elementary schools, this requirement had a major influence on testing throughout the K-12 education system (Airasian, 1987; Roeber, 1988). Another important step in the transformation of testing was the establishment of the National Assessment of Educational Progress (NAEP) in the late 1960s as an ongoing program of testing to monitor achievement of our nation's youth.
Another stage in the evolution of testing was the minimum competency testing (MCT) movement in the 1970s (Jaeger, 1982). MCT programs were intended not only to measure performance but also to spur its improvement by attaching high stakes for students. Indeed, some of its proponents called MCT "measurement-driven instruction" (e.g., Popham et al., 1985). MCT programs relied on criterion-referenced rather than norm-referenced tests. That is, they used tests designed to determine whether students had reached a predetermined standard of achievement, rather than to locate students' performance within a distribution, such as a national distribution for students in a given grade. As the term "minimum competency" suggests, the standards were low, designed only to identify students who failed to reach a level judged to be minimally acceptable. In many states MCTs were used as an "exit exam" for high school graduation. A smaller number of states used MCTs to set "promotional gates," governing promotion between certain grades (Jaeger, 1982).
In the 1980s test-based accountability received another boost with the "reform movement" that followed the publication of A Nation at Risk (National Commission on Excellence in Education, 1983). The reforms of the 1980s varied from state to state, but one of the most common elements was greater reliance on testing as a policy tool. Pipho (1985) noted that "nearly every large education reform effort of the past few years has either mandated a new form of testing or expanded [the] uses of existing testing." Ambach (1987) asserted that the nation had entered a period of not only measurement-driven instruction but also "measurement-driven educational policy." Much of the new testing had high stakes, but the nature of the consequences began to change, shifting from stakes for students toward evaluations of educators or systems (Koretz, 1992).
The testing of the 1980s reform movement fell into disfavor surprisingly soon. Confidence in the reforms was so high at the outset that few programs were evaluated realistically. By the end of the decade, however, that confidence had given way to widespread suspicion that the reforms had often degraded instruction and inflated test scores through inappropriate teaching to the test. Some of the evidence relevant to those negative conclusions is described below.
Despite increasing skepticism about the effects of the programs of the 1980s, few reformers questioned the basic premise that test-based accountability could be the primary impetus for better education. Rather, a growing number of reformers called for a "second wave" of programs that would continue heavy