parison to what is considered a reliable measure of that performance. The benchmark might be the ratings on one or more in-depth interviews that thoroughly explore and rate students’ comprehension of a text. Questions of practicality must also be investigated. Measuring comprehension could be made more reliable if testing time and scoring time were not constraints. However, both are valuable resources, and balancing quality and practicality will require attention. Box 2.6 gives an example from the recently revised SAT. It provides a measure of deep comprehension in the very practical (for scoring purposes) multiple-choice format.
But perhaps most importantly for improving educational outcomes, the instructional validity of the assessment must be investigated: Does it provide information that can be used to productively shape an understanding of the student’s instructional needs? Can it help guide the teacher’s instructional decisions? The SAT question posed in Box 2.6 provides insight into the student’s ability to understand the literary use of a word in context, a task that requires fairly sophisticated comprehension. But
In its recent revision of the SAT, the College Board includes the item below in the verbal section:
Dinosaurs have such a powerful grip on the public consciousness that it is easy to forget just how recently scientists became aware of them. A 2-year-old child today may be able to rattle off three dinosaur names, but in 1824, there was only one known dinosaur. Period. The word “dinosaur” didn’t even exist in 1841. Indeed, in those early years, the world was baffled by the discovery of these absurdly enormous reptiles.
The statement “Period” in the middle of the paragraph primarily serves to emphasize the:
SOURCE: Education Week (2002).