Although periodic testing is a critical part of any education reform, some of the movement toward increased testing may be fueled by a misguided assumption that more frequent testing, in and of itself, will improve education. At the same time, criticism of test policies may be predicated on an equally misguided assumption that testing, in and of itself, is responsible for most of the problems in education. A more realistic approach is to address education problems neither by stepping up the amount of testing nor by abandoning assessments entirely, but by refashioning assessments to meet current and future needs for quality information. Even very well-designed assessments, however, cannot by themselves improve learning. Improvements in learning will depend on how well assessment, curriculum, and instruction are aligned and reinforce a common set of learning goals, and on whether instruction shifts in response to the information gained from assessments.
With so much depending on large-scale assessment results, it is more crucial than ever that the scores be reliable in a technical sense and that the inferences drawn from the results be valid and fair. It is just as important, however, that the assessments actually measure the kinds of competencies students need to develop to keep pace with the societal, economic, and technological changes discussed above, and that they promote the kinds of teaching and learning that effectively build those competencies. By these criteria, the heavy demands placed on many current assessments generally exceed their capabilities.
Current assessment practices are the cumulative product of theories of learning and models of measurement that were developed to fulfill the social and educational needs of a different time. This evolutionary process is described in more detail in Chapters 3 and 4. As Mislevy (1993, p. 19) has noted, “It is only a slight exaggeration to describe the test theory that dominates educational measurement today as the application of 20th century statistics to 19th century psychology.” Although the core concepts of prior theories and models are still useful for certain purposes, they need to be augmented or supplanted to deal with newer assessment needs.
Early standardized tests were developed at a time when enrollments in public schools were burgeoning, and administrators sought tools to help them educate the rapidly growing student populations more efficiently. As described in Testing in American Schools (U.S. Congress, Office of Technology Assessment, 1992), the first reported standardized written achievement exam was administered in Massachusetts in the mid-19th century and intended to serve two purposes: to enable external authorities to monitor school systems and to make it possible to classify children in pursuit of more efficient learning. Thus it was believed that the same tests used to monitor