sors and the classroom functions of assessment, including improving classroom practice, planning curricula, developing self-directed learners, reporting student progress, and researching teaching practices (National Research Council, 1996).
Responding to the escalating use of assessment for accountability purposes, and concerned about the validity of the systems being created, researchers at the Center for Research on Evaluation, Standards, and Student Testing (CRESST), a consortium of university-based experts in educational measurement, have advanced the idea of standards for accountability systems (Baker, Linn, Herman, and Koretz, 2002). They advocate that attention be paid to the system as a whole and not just to individual assessments. Drawing on the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999), as well as on their own knowledge, experience, and ethical considerations, the developers of the CRESST standards stress that accountability systems should be evaluated on the basis of multiple forms of evidence. Specifically, systems should be supported by rich and varied evidence of the validity of inferences based on assessment results, evidence that all elements of the system are aligned, and evidence that assessment is sensitive to instruction (that is, that good instruction yields higher performance on the assessment than does poor instruction). Standards are presented in five areas (system components, testing standards, stakes, public reporting, and evaluation), and for each area the standards provide dimensions against which accountability systems could be evaluated. With regard to evaluation, the CRESST standards propose that (Baker et al., 2002, p. 4):
Longitudinal studies should be planned, implemented, and reported evaluating effects of the accountability program. Minimally, questions should determine the degree to which the system:
builds capacity of staff;
affects resource allocation;
supports high-quality instruction;
promotes student-equitable access to education;
affects teacher quality, recruitment, and retention; and
produces unanticipated outcomes.
The validity of test-based inferences should be subject to ongoing evaluation. In particular, evaluation should address:
aggregate gains in performance over time; and
impact on identifiable student and personnel groups.