nected at best. Tensions arise when strong instructional programs and accountability assessments are at odds. Better aligning assessments and tying all of them firmly to the theoretical and empirical knowledge base are widely regarded as important for improving learning outcomes. In mathematics, SERP affords a unique opportunity to develop an integrated assessment system with the three critical characteristics described in Chapter 1: comprehensiveness, coherence, and continuity (National Research Council, 2001c). Constructing such a system is a major research, development, and implementation agenda, one that would require the stability, longevity, and support that SERP intends as its hallmark.

The work should be pursued as a collaborative effort involving teachers, content area specialists, cognitive scientists, and psychometricians. The effort could take as its starting point well-established mathematics standards (e.g., those of the National Council of Teachers of Mathematics), standards-based curricular resources, and rigorous research on content learning. These would be used to identify and define what students should know in early mathematics, how they might be expected to show what they know, and how to interpret their performance appropriately. For formative assessment, this extends to understanding what the evidence implies for subsequent instruction. For summative assessment, it means understanding what student performance implies about mastery of core concepts and principles and about growth over time.

While there are several possible approaches to developing such a system of student assessments in early mathematics, one obvious starting point is a review of the formative and summative assessment materials in existing widely used and exemplary curricular programs, as well as the state and national tests used for policy making and accountability. These materials can be reviewed in light of cognitive theories of mathematical understanding and of empirical data on the validity of specific assessments. Research should focus on evidence of how effectively specific assessments capture the range of student knowledge and proficiency for particular mathematical constructs and operations. A related line of inquiry should address assessment scoring and reliability, particularly ease of scoring, consistency of scoring within

