the reading and mathematics tests were set at what were perceived to be relatively reasonable levels (2.8 for third graders, 5.3 for sixth graders, and 7.2 for eighth graders) with the intention of raising them later (interview with Joe Hahn, director, Research, Assessment, and Quality Review, Chicago school district).11

"We decided to be credible to the public," Chicago Chief Accountability Officer Philip Hansen told the committee. "If that 3rd grader doesn't get a 2.8 in reading, the public and the press and everyone understands clearly, more clearly than educators, that, gee, that's a problem, so they can see why that child needs to be given extra help …. Our problem comes with explaining it to educators as to why we don't use other indicators."

Educational Outcomes

As mentioned earlier, tests used for student promotion are usually thought to measure mastery of material already taught, but a promotion test may also be interpreted as indicating a student's readiness for the next grade. In the latter case, it would be relevant to obtain evidence that the test score is related to pertinent criterion measures at the next grade level.12

For example, in the case of a reading test, it may be useful to demonstrate that there is a relationship between students' scores on the promotion test and their scores on a reading test taken at the end of the next school year. Such evidence of predictive validity would, however, not usually be enough to justify use of the test for making promotion decisions. Additional evidence would be needed that the alternative treatments (i.e., promotion, retention in grade, or some other intervention)


11. These cutscores are strictly adhered to; failure to reach them results in mandatory summer school. Upon completion of the six-week summer program, students may take a different form of the test. If they reach the cutscore, they go on to the next grade. Some students who come close to the cutscore (e.g., passing in one subject and coming very close in another) may be retested in January (Chicago Public Schools, 1997a).
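The decision rule described in this note can be sketched in a few lines of Python. This is only an illustration, not the district's actual procedure: the function names are hypothetical, the requirement that a student reach the cutscore in both reading and mathematics is an assumption inferred from the "passing in one subject" example, and the outcome for students who miss the cutscore after summer school is not spelled out in the text.

```python
# Cutscores by grade as reported in the text (grades 3, 6, and 8).
CUTSCORES = {3: 2.8, 6: 5.3, 8: 7.2}


def meets_cutscore(grade: int, reading: float, mathematics: float) -> bool:
    """Return True if both subject scores reach the grade's cutscore.

    Assumption: the cutscore must be reached in both subjects.
    """
    cut = CUTSCORES[grade]
    return reading >= cut and mathematics >= cut


def spring_decision(grade: int, reading: float, mathematics: float) -> str:
    """Decision after the regular spring test administration."""
    if meets_cutscore(grade, reading, mathematics):
        return "promote"
    return "mandatory summer school"


def post_summer_decision(grade: int, reading: float, mathematics: float) -> str:
    """Decision after the retest on a different form at the end of summer school."""
    if meets_cutscore(grade, reading, mathematics):
        return "promote"
    # Assumption: the text does not state this outcome explicitly. It notes
    # only that some near-miss students may be retested in January.
    return "not promoted"


if __name__ == "__main__":
    print(spring_decision(3, reading=2.9, mathematics=2.6))
    print(post_summer_decision(3, reading=2.9, mathematics=2.8))
```

The sketch deliberately leaves out the January retest, because the text does not say how close to the cutscore a student must come to qualify.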


12. Evidence about relations to other variables can also be used to investigate questions of differential prediction for groups. A finding that the relation of test scores to a relevant criterion variable differs from one group to another may imply that the meaning of the scores is not the same for members of the different groups, perhaps due to construct underrepresentation or construct-irrelevant variables (American Educational Research Association et al., 1985, 1998).
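One simple way to look for differential prediction is to regress the criterion measure on the test score separately for each group and compare the fitted lines. The following is a minimal Python sketch under stated assumptions: the data are made up purely for illustration, the group labels "A" and "B" and all function names are hypothetical, and an ordinary least-squares line stands in for whatever model an actual validity study would use.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation between test scores and the criterion measure."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def lines_by_group(records):
    """Fit a separate prediction line for each group.

    records: iterable of (group, promotion_test_score, next_year_score).
    Returns a dict mapping group -> (slope, intercept).
    """
    grouped = {}
    for group, score, criterion in records:
        xs, ys = grouped.setdefault(group, ([], []))
        xs.append(score)
        ys.append(criterion)
    return {g: fit_line(xs, ys) for g, (xs, ys) in grouped.items()}


# Hypothetical, made-up data for illustration only; not from any real study.
RECORDS = [
    ("A", 2.0, 3.0), ("A", 2.5, 3.5), ("A", 3.0, 4.0), ("A", 3.5, 4.5),
    ("B", 2.0, 3.0), ("B", 2.5, 3.25), ("B", 3.0, 3.5), ("B", 3.5, 3.75),
]

if __name__ == "__main__":
    scores = [r[1] for r in RECORDS]
    criteria = [r[2] for r in RECORDS]
    print("overall correlation:", round(pearson_r(scores, criteria), 3))
    for group, (slope, intercept) in sorted(lines_by_group(RECORDS).items()):
        print(group, "slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

If the slopes or intercepts differ noticeably between groups, the same test score predicts different criterion performance depending on group membership, which is the situation this note describes. A real analysis would also need a test of whether the difference exceeds sampling error, which this sketch omits.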

Copyright © National Academy of Sciences. All rights reserved.