It is also essential to recognize that the domains current assessments are designed to test are themselves only a subset of the desired outcomes of schooling. The argument that measured achievement is only one of the goals of schooling is often interpreted as an attempt to avoid hard-nosed accountability, and often it may be. Nonetheless, an accountability system that fails to address outcomes beyond those typically tested is likely to be insufficient. Much of what many individuals need to learn will come after they leave public school: in later education, in the workplace, and in civic life. Their success in this later learning may depend in substantial part on the knowledge and skills they have at graduation, much of which can be tested. But it is also likely to depend on attitudes and habits that achievement tests do not typically measure. Examples include an attitude that mathematical problems are interesting and tractable, or an interest in and willingness to weigh carefully the conflicting evidence and competing positions underlying political arguments. Thus, an accountability system that produces high scores on tests at the price of poor performance on unmeasured outcomes may be a poor bargain (Haney and Raczek, 1994).

Typical Test Databases

To understand the uses and limits of test data it is also imperative to consider the types of databases in which assessment data are typically embedded. Three attributes of these databases are particularly important.

First, large-scale assessment data are usually cross-sectional. Some districts and states can track longitudinally the progress of students who remain in the jurisdiction (see, e.g., Clotfelter and Ladd, 1995), but few assessment programs are designed to do so. More typical are systems like NAEP and the assessment programs in Kentucky and Maryland, in which students in various grades are tested in a variety of subjects but scores are not linked across grades. In many instances these cross-sections are limited to a few grades. NAEP usually tests only in grades 4, 8, and 12 (in some instances, grade 11 rather than 12); Kentucky limits most parts of its accountability-oriented testing to grades 4, 8, and 11; and Maryland's performance assessment program is administered in grades 3, 5, and 8. Because cross-sectional data do not follow the same students over time, they are, of course, very poorly suited to measuring value added, and they afford less opportunity to account statistically for noneducational determinants of achievement such as family background.

Some systems, such as the Kentucky accountability program, use repeated cross-sections to measure only changes in schools' scores, thereby removing some of the confounding between the effects of schooling and the effects of students' backgrounds. A thorough examination of this approach is beyond the scope of this chapter, but it is important to note that it has serious limitations. One is simple imprecision: test scores provide only an error-prone estimate of a school's performance for a given year because of the limited information provided by the



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.