Finally, an assessment has instructional validity if the content matches what was actually taught. Questions concerning these different forms of validity need to be addressed independently, although they are often related. Messick (1989) offers another perspective on validity. His definition begins with an examination of the uses of an assessment and from there derives the technical requirements. Validity, as he defines it, is “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” [italics added] (p. 13). Thus, validity in his view is a property of consequences and use rather than of the assessment itself. Messick's (1994) use of validity stresses the importance of weighing social consequences: “Test validity and social values are intertwined and that evaluation of intended and unintended consequences of any testing is integral to the validations of test, interpretation and use” (p. 19). Validity, he argued, needs evidentiary grounding, including evidence of what happens as a result of the assessment. Moss (1996) urges that both the actions taken based on interpretations of assessment data and the consequences of those actions be considered as evidence to warrant validity.

Attention to issues of validity is important in the type of ongoing classroom assessment discussed thus far in this chapter. It is important to keep in mind the guideline that assessments should match their purpose. When gathering data, teachers and students need to consider whether the information accurately represents what they wish to summarize and corresponds with the subject matter taught, and whether the assessment has any unintended social consequences. Invalid formative assessment can lead to the wrong corrective action, or to the neglect of action where it is needed. Issues relating to validity are discussed further in Chapter 4.

Reliability refers to generalizability across tasks. Usually, it is a necessary but not sufficient condition for validity. Moss (1996) makes a case that reliability is not a necessity for classroom assessment. She argues for the value of classroom teachers' special contextualized knowledge and the integrative interpretations they can make. The research literature acknowledges that over time, in the context of numerous performances, concerns about replicability and generalizability become less of an issue (Linn & Burton, 1994; Messick, 1994). Messick states that dropping reliability as a prerequisite for validity may be “feasible in assessment for instructional improvement occurring frequently throughout the course of teaching or in appraisals of extensive portfolios” (p. 15).
