educational equity, by creating a receptive political climate and perhaps by providing judicially manageable standards that courts had a difficult time identifying in the past.

Nevertheless, the tortuous path to standards-based reform is also a cautionary tale about how long and difficult the process can be. As a recent National Research Council committee pointed out, it is not yet clear whether the guiding assumptions of standards-based reform are correct or whether policies built on them will have their desired effect (National Research Council, 1997:33–46). The rhetoric that "all students can learn to high standards" leaves unresolved important philosophical and logistical issues, such as balancing high, uniform standards with students' unique educational needs and abilities. The extent to which consensus can be reached on curriculum and performance standards is unclear. Controversies over the specification of outcomes (e.g., in Pennsylvania), the content of curriculum frameworks and performance assessments (e.g., in California), and the content of voluntary subject-matter standards (e.g., the lopsided U.S. Senate vote condemning voluntary national standards in history) "suggest that consensus dissolves once the public moves beyond a general belief in the need for standards and assessments to questions about what those standards should be and how students should be taught and tested" (National Research Council, 1997:38). Moreover, major uncertainties remain about whether student performance can be measured validly and reliably and whether instruction consistent with the standards can be implemented in individual schools and classrooms.

The technical challenges in measuring student performance become increasingly important as the concept of adequacy shifts the focus of attention to the outcomes of education. A key issue is whether existing tests define and measure achievement in ways consistent with standards-based reform or other statements of desired educational outcomes. Large-scale, standardized tests[8] are tools for determining what students know and can do in specified domains. No large-scale assessment measures all aspects of student achievement. Moreover, many tests currently in use do not capture critical differences in students' levels of understanding, do not adequately reflect higher-order thinking skills called for by many new education standards, and do not reflect more comprehensive goals for student achievement that go beyond subject-matter knowledge to other valued skills and abilities. For example, a recent National Research Council (NRC)


 "Large-scale" tests are those administered to students from many schools. "Standardized" tests are similar tests given to many students under uniform conditions. This latter point is often misunderstood; it is common for people to assume that standardized tests must use a multiple-choice format. In fact, ''even a written examination, one that is scored by teachers or other human judges and not by machine, is considered standardized if all students respond to the same (or nearly the same) questions and take the examination under similar conditions" (National Research Council, 1999b:29).

