practice (Black, 1997; Frederiksen, 1984; Smith & Rottenberg, 1991). Teachers are left facing serious dilemmas.
The foundations for a standards-based summative assessment system are assessments that are systemically valid: aligned to the recommendations of the national standards, grounded in the educational system, and congruent with the educational goals for students. Alignment of assessment to curriculum and standards ensures that the assessments match the learning goals embodied in the standards and enables students, parents, teachers, and the public to determine student progress toward the standards (NRC, 1999b).
Assessment and accountability systems cannot be isolated from their purpose: to improve the quality of instruction and ultimately the learning of students (NRC, 1999b). They also must be well understood by the interested parties and based on standards acceptable to all (Stecher & Herman, 1997).
An effective system will provide students with the opportunity to demonstrate their understanding and skills in a variety of ways and formats. The form the assessment takes must follow its purpose. Multiple-choice tests are easy to grade and can quickly assess some forms of science-content knowledge. Other areas may be better tapped through open-ended questions or performance-based assessments, in which students demonstrate their abilities and understandings, for example through an actual hands-on investigation (Shavelson & Ruiz-Primo, 1999). Assessing inquiry skills may require extended investigations, and the work can be documented through portfolios as it unfolds.
When considering, implementing, and evaluating large-scale assessment systems, educators need to be cautious, deliberate, and aware of the strong influence of high-stakes, external tests on classroom practice, particularly on instructional emphasis and its assessment (Frederiksen, 1984; Gifford & O'Connor, 1992; Goodlad, 1984; Popham, 1992; Resnick & Resnick, 1991; Rothman, 1995; Shepard, 1995; Smith et al., 1992; Wolf et al., 1991). No assessment form is immune from negative influences. Messick (1994) concludes:
It is not just that some aspects of multiple-choice testing may have adverse consequences for teaching and learning, but that some aspects of all testing, even performance testing, may have adverse as well as beneficial educational consequences. And if both positive and negative aspects, whether intended or unintended, are not meaningfully addressed in the validation process, then the concept of validity loses its force as a social value. (p. 22)