systems that contain information on student test scores and student, family, and community characteristics.

Annual testing at each grade level is desirable for at least three reasons. First, it maximizes accountability by localizing school performance to the most natural unit of accountability: the grade level or classroom. Second, it yields up-to-date information on school performance. Finally, it limits the amount of data lost because of student mobility. As the time interval between tests increases, the problems that annual testing avoids become more acute. Indeed, for time intervals of more than two years, it may be impossible to construct valid and reliable value-added indicators for schools with high mobility rates. Mobile students generally must be excluded from the data used to construct value-added and gain indicators, since both indicators require pre- and posttest data. In schools with high student mobility, infrequent testing diminishes the prospect of ending up with student data that are both representative of the school population as a whole and large enough to yield statistically reliable estimates of school performance. [28] Less frequent testing (for example, in kindergarten [29] and grades 4, 8, and 12) might be acceptable for national purposes, since student mobility is less prevalent at the national level; but to evaluate local school performance, frequent testing is highly desirable.
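The effect of the testing interval on usable data can be illustrated with a small calculation. The sketch below is not taken from the chapter: it assumes, purely for illustration, that a constant fraction of students leaves a school each year and that departures are independent from year to year. The function name `matched_share` and the mobility rates are hypothetical.

```python
def matched_share(annual_mobility: float, years_between_tests: int) -> float:
    """Share of pretested students still enrolled at the posttest.

    Assumption (not from the chapter): a constant fraction of students
    leaves each year, so the retained share compounds over the interval.
    """
    if not 0.0 <= annual_mobility <= 1.0:
        raise ValueError("annual_mobility must be between 0 and 1")
    if years_between_tests < 1:
        raise ValueError("years_between_tests must be at least 1")
    return (1.0 - annual_mobility) ** years_between_tests


if __name__ == "__main__":
    # Hypothetical mobility rates for a low- and a high-mobility school.
    for rate in (0.05, 0.20):
        for years in (1, 2, 4):
            share = matched_share(rate, years)
            print(f"mobility={rate:.0%} interval={years}y matched={share:.1%}")
```

Under these assumptions, a school that loses 20 percent of its students each year keeps pre- and posttest data for 80 percent of a cohort with annual testing, but for only about 41 percent with a four-year interval. The students who remain may also differ systematically from those who left, which this simple calculation does not capture.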

Conclusions and Recommendations

Average test scores, one of the most commonly used indicators in American education, are an unreliable indicator of school performance. They fail to localize school performance to the classroom or grade level, aggregate information on school performance that tends to be grossly out of date, are contaminated by student mobility, and fail to separate the value-added contribution of schools to growth in student achievement from the contributions of student, family, and community factors. As a result, average test scores are a weak, if not counterproductive, instrument of public accountability.
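A toy comparison shows how an average score and a growth-based indicator can rank the same schools in opposite order. The sketch below uses invented scores for two hypothetical schools and a simple mean gain (posttest minus pretest) as a crude stand-in for growth. It is not the chapter's value-added model, which also controls for student, family, and community characteristics.

```python
from statistics import mean

# Hypothetical (pretest, posttest) pairs for students tested at both points.
SCHOOLS = {
    "A": [(70, 73), (75, 77), (80, 83)],  # high-scoring intake, small gains
    "B": [(40, 50), (45, 56), (50, 59)],  # low-scoring intake, large gains
}


def average_score(pairs):
    """Average posttest score: the conventional indicator."""
    return mean(post for _, post in pairs)


def average_gain(pairs):
    """Mean posttest-minus-pretest gain for matched students."""
    return mean(post - pre for pre, post in pairs)


def rank_by(indicator, schools):
    """School names ordered from highest to lowest on the given indicator."""
    return sorted(schools, key=lambda name: indicator(schools[name]), reverse=True)


if __name__ == "__main__":
    print("By average score:", rank_by(average_score, SCHOOLS))
    print("By average gain: ", rank_by(average_gain, SCHOOLS))
```

In this invented example, school A ranks first on average score only because its students entered with higher scores, while school B ranks first on gain. A gain indicator still leaves student, family, and community influences on growth uncontrolled, which is the gap the value-added indicator is meant to close.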

The value-added indicator is a conceptually appropriate indicator for measuring school performance. This chapter presents two basic types of value-added indicators: the total school performance indicator, which is appropriate for purposes of school choice, and the intrinsic performance indicator, which is appropriate for purposes of school accountability. The quality of these indicators is determined by the frequency with which students are tested, the quality and appropriateness of the tests, the adequacy of the control variables included in the


28. In schools with extremely high rates of student mobility, it might be necessary to test students more than once a year.


29. A kindergarten test is needed so that growth in student achievement in grades 1 through 4 can be monitored. The NAEP and recent proposals for national testing in grades 4, 8, and 12 are seriously flawed by their failure to include a test at the kindergarten or first-grade level.

Copyright © National Academy of Sciences. All rights reserved.