Box 8-2 Test Score Flagging
Flagging is a concern when a nonstandard administration of an assessment—for example, providing accommodations such as extra time or a reader—may have compromised the validity of inferences based on the student's score. Flagging warns the user that the meaning of the score is uncertain.
Flagged scores are typically not accompanied by any descriptive detail about the individual, or even about the nature of the accommodations offered. Flagging therefore may not help users interpret scores more appropriately. It does, however, confront them with a decision: Should the score be ignored or discounted because of the possibility that accommodations have created unknown distortions? In the case of scores reported for individual students, flagging identifies the individual as having a disability, raising concerns about confidentiality and possible stigma.
When testing technology is able to ensure that accommodations do not confound the measurement of underlying constructs, score notations will be unnecessary. Until then, however, flagging should be used only with the understanding that the need to protect the public and policymakers from misleading information must be weighed against the equally important need to protect student confidentiality and prevent discriminatory uses of testing information.
SOURCE: National Research Council (1997).
disabilities, is the way in which performance levels are set. A number of new large-scale assessments use only a few performance levels, with the lowest level set high relative to the distribution of student performance. Consequently, they provide very little information about modest gains by the lowest-performing students, including some students with disabilities. This kind of reporting rubric may also signal that modest improvements are not important unless they bring students above the performance standard. To enable participation of students with disabilities, high-stakes tests should represent performance accurately at all points across a broad continuum. This implies not only breadth in difficulty and in the content assessed, but also reporting methods that provide adequate information about all levels of student performance.
New assessment systems are relying heavily on performance assessments, which may decrease the reliability of information about low-achieving