results be disaggregated by subgroup, so, too, should studies of the effects of testing look for differential effects on population subgroups. Such effects may point to different conclusions than those drawn from overall aggregate performance alone. It is thus important to look not only at multiyear trends in performance, overall and by subgroup, but also at students' longitudinal growth, using advanced statistical models and individual-level data. For example, Choi, Seltzer, Herman, and Yamachiro (2004) found that schools with similar overall growth patterns could be differentially effective with students of differing initial ability. In some schools the gap between high-ability and low-ability students could be widening, while in others with similar overall growth the gap could be narrowing.
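Simple arithmetic shows how an aggregate can hide opposite subgroup patterns. The sketch below uses invented scores for two hypothetical schools, not data from Choi et al. (2004), and it is a plain gain-score comparison rather than the multilevel growth models such studies use. It also assumes the two ability groups are equal in size, so the overall gain is the unweighted mean of the group gains.

```python
from statistics import mean

# Hypothetical (year 1, year 2) mean scores by ability group; illustrative only.
schools = {
    "School A": {"high": (60, 75), "low": (40, 45)},
    "School B": {"high": (60, 65), "low": (40, 55)},
}

def summarize(groups):
    """Return (overall gain, change in high-low gap), assuming equal group sizes."""
    gains = {g: y2 - y1 for g, (y1, y2) in groups.items()}
    overall_gain = mean(gains.values())
    gap_year1 = groups["high"][0] - groups["low"][0]
    gap_year2 = groups["high"][1] - groups["low"][1]
    return overall_gain, gap_year2 - gap_year1

for name, groups in schools.items():
    gain, gap_change = summarize(groups)
    print(f"{name}: overall gain {gain}, change in gap {gap_change:+}")
```

Both hypothetical schools show the same overall gain of 10 points, yet the gap widens by 10 points in School A and narrows by 10 points in School B. Only the disaggregated view reveals the difference.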
An additional concern is the utility and use of assessment results. A primary purpose of state assessment systems is to provide evidence that will improve decision making and enable states, districts, and schools to better understand and improve science learning. Stakeholders at each level of the educational hierarchy (state departments of education, school districts, schools, and classrooms) need to monitor student performance and take appropriate action to improve it. For example, a district or state may observe trends in student performance in biology, discover that students are performing relatively poorly on particular science concepts, and use these data to institute a new professional development program that builds teachers' capacity to teach and assess understanding of key biology concepts. At the classroom level, a teacher who uses a classroom assessment to gain detailed knowledge of students' understanding of a particular concept, such as buoyancy, can use that information to give students immediate feedback or to recognize that they need additional lab work to overcome their misconceptions. Thus, the consequences and uses of the assessment system at each level need to be evaluated. This analysis should ask whether and how the data are actually used, with an eye to both intended and unintended consequences. Surveys, focus groups, observations, and the collection of artifacts are all means of acquiring this kind of information.
This report outlines ambitious goals for assessment systems that go beyond current practice in supporting both accountability and student learning, although we recognize that experience with the design requirements of effective standards-based systems is still developing. For example, the committee has stressed, and NCLB requires, that the elements of an assessment and accountability system should be both coherent and aligned with standards. However, the methodology for developing and ensuring such alignment is still evolving, and there is only a