Evaluation is an important feedback mechanism for the education system and must be an integral element in that system. The state must continually monitor and periodically evaluate the effectiveness of the education system as a whole, as well as the effects and effectiveness of each of its components—including the assessment system. The state will need to make sure not only that each component is functioning well independently, but also that the education system as a whole is operating as intended.

The chapter begins with an overview of professional and other standards for assessment quality, goes on to discuss the consequences and uses of assessment systems, and then looks at ways to incorporate evaluation throughout the assessment system.

EVALUATING THE TECHNICAL QUALITY OF ASSESSMENT INFORMATION

Any assessment system must, above all, provide accurate information. Users expect the information to be trustworthy and to provide a sound basis for action. Validity, the term measurement experts use for this essential quality of an assessment or an assessment system, refers to the extent to which an assessment’s results support meaningful inferences for intended purposes. The validity of such inferences rests on evidence that the assessment measures the constructs it was intended to measure and that the scores provide the information they were intended to provide. Thus, particular assessments cannot be classified as either valid or invalid in any absolute sense; it is the uses to which assessment results are put that are valid to a greater or lesser degree. An assessment that is valid for one purpose, such as providing a general indicator of tested students’ understanding of equilibrium, may be invalid for another purpose, such as providing details of students’ alternate conceptions about equilibrium that could be used to guide instruction. The validity issues that arise when multiple measures are used to assess individuals also arise in evaluating assessment systems that produce a variety of information from multiple measures, although the available methodologies have to be adapted for that purpose.

As discussed earlier, available evidence suggests that the science standards in many states are too vague to represent a clear target for assessment development or for curriculum and instruction (Cross, Rebarber, and Torres, 2004). Yet the federal requirements do not ask states to revisit or refine their standards. The NCLB Peer Review Guidance (U.S. Department of Education, 2004) asks for evidence that states are improving the alignment of their assessments and standards over time and that they are filling gaps in their coverage of content domains. However, if a state’s standards are insufficiently clear for the purpose of determining with any degree of precision whether elements of the system are adequately aligned to them, or for the purpose of estab-


