As noted above, the CRESST accountability standards (Baker et al., 2002) highlight the need for longitudinal studies to examine the effects of any accountability system. If the primary purpose of NCLB science assessments is to improve student achievement overall and to close the achievement gap between high- and low-achieving students, then studies should examine the extent to which the intended benefits are realized. The CRESST researchers suggest that among the intended benefits that should be investigated are the extent to which the system does the following:
builds the capacity of staff to enable students to reach standards;
builds teacher assessment capacity;
influences the way resources are allocated to ensure that students will achieve standards;
supports high-quality instruction aligned with standards; and
supports equity in students’ access to quality education.
The accountability standards also note potential unintended consequences that should be investigated. These include the possible corruption of test scores; adverse effects on teacher quality, recruitment, or retention; and increases in dropout rates. All of these unintended outcomes have been associated with high-stakes assessments (Klein, Hamilton, McCaffrey, and Stecher, 2000; Madaus, 1998).
The feasibility of the assessment system also merits inquiry. An assessment program may place new burdens on teachers, principals, and districts, and it may raise questions about opportunity costs, cost-effectiveness, and the feasibility of performance targets. Any targets set for school performance and progress must therefore be evaluated for feasibility as part of the process. For example, Linn (2003) uses historical data to suggest that current goals for adequate yearly progress in reading and mathematics represent a level of improvement well beyond what the most successful schools have actually achieved.
As noted earlier, when new high-stakes state assessments are put into place, scores typically rise over the first several years. But as Koretz (2005) has noted, such gains may be spurious. One way to examine the extent to which gains in test scores represent real improvements in learning, rather than effective test preparation, is to compare the gains shown on the high-stakes test with those shown on other, independent measures of the same or a similar construct.
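The comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not a method taken from the cited studies. It assumes that scores on each measure are available for an earlier and a later cohort, and that expressing each gain in pooled standard-deviation units is an acceptable way to put the two measures on a common scale. The function names and all score values are hypothetical.

```python
from statistics import mean, stdev


def standardized_gain(earlier, later):
    """Gain in mean score, expressed in pooled standard-deviation units."""
    pooled_sd = ((stdev(earlier) ** 2 + stdev(later) ** 2) / 2) ** 0.5
    return (mean(later) - mean(earlier)) / pooled_sd


def gain_gap(high_stakes, independent):
    """Standardized gain on the high-stakes test minus the gain on the
    independent measure. Each argument is an (earlier, later) pair of
    score lists."""
    return standardized_gain(*high_stakes) - standardized_gain(*independent)


# Hypothetical scores, invented purely for illustration.
state_test = ([48, 52, 55, 60, 65], [58, 62, 65, 70, 75])
audit_test = ([48, 52, 55, 60, 65], [50, 54, 57, 62, 67])

gap = gain_gap(state_test, audit_test)
print(f"Gain on high-stakes test exceeds independent gain by {gap:.2f} SD")
```

A large positive gap would be consistent with gains that do not generalize beyond the high-stakes test, but it would not establish that by itself. The comparison is informative only to the extent that the two measures target the same or a similar construct and were administered to comparable groups of students.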
Another study shows the importance of one of the CRESST evaluation recommendations—that the impact of accountability and assessment on subgroups of the student population be monitored (Klein et al., 2000). Reducing the achievement gap in science between historically underachieving minorities and their more privileged peers is an explicit purpose of NCLB. Just as the law requires that