Before assessments become operational, it is important to ensure that detailed specifications have been met and that the resulting tests are indeed aligned with standards. Various methods for determining alignment have been developed, and all share a similar set of procedures. An independent panel of teachers and subject-matter experts is convened and asked to examine each item or task, rate its content focus and level of cognitive demand, and note any extraneous issues—such as language difficulty—that could affect a student’s ability to respond. Taking into account the number of items needed to meet minimum measurement criteria, the results are then summarized to show the extent of coverage of the standards in question and the balance of that coverage (Porter, 2002; Webb, 1997a, 1997b, 1999, 2001, 2002).
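The summary step described above can be sketched in code. The following is a minimal illustration, not any of the cited methods in full: the item ratings, standard labels, and the minimum-items threshold are all hypothetical, chosen only to show how panel ratings might be aggregated into coverage and balance figures.

```python
from collections import Counter

# Hypothetical panel ratings: each item is matched by reviewers to a
# standard and assigned a cognitive-demand level. Labels are illustrative.
ratings = [
    {"item": 1, "standard": "LS-1", "demand": 2},
    {"item": 2, "standard": "LS-1", "demand": 1},
    {"item": 3, "standard": "LS-2", "demand": 3},
    {"item": 4, "standard": "PS-1", "demand": 2},
    {"item": 5, "standard": "LS-1", "demand": 2},
]

def summarize_alignment(ratings, standards, min_items=2):
    """Coverage: the share of standards measured by at least `min_items`
    items (a stand-in for a minimum measurement criterion).
    Balance: how the items are distributed across the standards."""
    counts = Counter(r["standard"] for r in ratings)
    covered = [s for s in standards if counts[s] >= min_items]
    coverage = len(covered) / len(standards)
    balance = {s: counts[s] for s in standards}
    return coverage, balance

coverage, balance = summarize_alignment(
    ratings, ["LS-1", "LS-2", "PS-1", "PS-2"]
)
```

In this toy example only one of the four standards meets the two-item threshold, and the balance tally shows the items clustering on a single standard—exactly the kind of pattern an alignment study is designed to surface.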
Because it is difficult to assess alignment if standards are not clearly articulated and focused, and because alignment studies make clear the limits of what can be assessed in a finite assessment, the results of alignment studies may indicate a need to modify the standards or to take other steps to improve the alignment between standards and assessments. Furthermore, because the committee advocates a system of assessments that supports student learning and development over time, alignment studies will need to address all of the assessments and sources of data that are intended to be part of the system, as well as the alignment of assessments with learning expectations across grades. Methodologies will be needed to judge the alignment of a multilevel system.
Moreover, as states change their assessments, or portions of them, from year to year or within years, evidence must be collected to show the extent to which the different test forms are comparable and that the equating from one form to the next has been done correctly. Without this evidence, scores cannot be compared from one administration to the next, because any differences may reflect differences in the difficulty of the two forms or in the constructs measured, rather than changes in student performance.
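To make the equating concern concrete, the sketch below shows one of the simplest equating procedures, linear (mean–sigma) equating, which places Form X scores on the Form Y scale by matching the two score distributions' means and standard deviations. The score data are invented for illustration; operational equating programs use more elaborate designs and methods.

```python
import statistics

def linear_equate(x_scores, y_scores):
    """Return a function mapping raw scores on Form X onto the Form Y
    scale by matching means and standard deviations (linear equating)."""
    mx, my = statistics.mean(x_scores), statistics.mean(y_scores)
    sx, sy = statistics.pstdev(x_scores), statistics.pstdev(y_scores)
    return lambda x: my + (sy / sx) * (x - mx)

# Illustrative data: Form X was harder (lower mean score), so raw
# scores on X must be adjusted upward to be comparable on the Y scale.
form_x = [10, 12, 14, 16, 18]
form_y = [14, 16, 18, 20, 22]
to_y_scale = linear_equate(form_x, form_y)
```

Here a raw score of 14 on the harder Form X corresponds to 18 on the Form Y scale; comparing the raw scores directly would have understated the Form X examinees' performance, which is precisely the error that equating evidence guards against.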
Like other aspects of test development, the plan for reporting test results requires monitoring, and methods of reporting should be field-tested with each intended audience—parents, administrators, and teachers—to ensure that reports are clear and comprehensible to users, that users are likely to interpret the information appropriately, and that the information is useful. Similarly, standard-setting processes should be monitored to ensure that appropriate stakeholders were included, that the process took into account both empirical data on test performance and qualitative judgments about the kinds and levels of performance that can be expected of minimally proficient students, and that there is evidence of the validity and accuracy of proficiency classifications based on the standards. Moreover, methodologies will be needed to ensure that performance standards take into account the results of a system of assessments, some derived from statewide assessments and others from classroom assessments.
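As one concrete instance of combining judgment with data in standard setting, the sketch below illustrates a modified Angoff procedure, in which panelists estimate, for each item, the probability that a minimally proficient student would answer correctly; the cut score is the sum of the mean estimates across items. The panelist values are hypothetical, and this is only one of several standard-setting methods a monitoring effort might examine.

```python
def angoff_cut_score(judgments):
    """Modified Angoff standard setting: `judgments` holds, per panelist,
    a list of estimated probabilities that a minimally proficient student
    answers each item correctly. The recommended cut score is the sum,
    over items, of the mean estimate across panelists."""
    n_panelists = len(judgments)
    n_items = len(judgments[0])
    return sum(
        sum(panelist[i] for panelist in judgments) / n_panelists
        for i in range(n_items)
    )

# Three hypothetical panelists judging a four-item test
judgments = [
    [0.8, 0.6, 0.5, 0.7],
    [0.7, 0.5, 0.5, 0.6],
    [0.9, 0.7, 0.5, 0.8],
]
cut_score = angoff_cut_score(judgments)
```

Monitoring such a process would then compare the judged cut score against empirical item data (for example, observed proportions correct) and check the consistency of the resulting proficiency classifications.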