As noted in Chapter 6, validity research is a vital component of any high-quality assessment program. Validity concerns what an examination measures, the meaning that can be drawn from its scores, and the actions that follow from those interpretations (AERA/APA/NCME, 1999; Cronbach, 1971). For each purpose for which the scores are used, there must be evidence to support the inferences drawn. In the case of AP, for example, the College Board wants users to draw the inference that students’ performance on AP examinations is indicative of their mastery of material taught in a typical college course in the subject. The IBO wants users to draw the inference that students who earn an IB Diploma through their performance on six IB examinations3 are adequately prepared for postsecondary work in many countries.
“The process of validation involves accumulating evidence to provide a sound scientific basis for the proposed score interpretations” (AERA/APA/NCME, 1999, p. 9). The AP Technical Manual describes two common interpretations of AP scores: (1) a good AP grade indicates that the student would benefit from entering a course more advanced than the usual first-year course, and (2) an AP grade indicates that the student should receive credit for a college course that he or she has not taken.4 The IBO does not describe the appropriate interpretations of IB grades other than to say that they reflect students’ mastery of course content that is designed to prepare them for postsecondary learning.
Given these desired interpretations, validation studies for the AP and IB assessments should systematically evaluate such factors as whether the right skills and knowledge are being measured, and in the right balance; whether the cognitive processes required by the test are representative of the ways knowledge is used in the discipline; the extent to which the test measures students’ knowledge of the broader construct that is the target of instruction, as opposed to their knowledge of specific test items; whether the scoring guidelines focus on student understanding; and whether the test scores accurately represent different levels and kinds of understanding.
The committee’s analyses of the test items and the course syllabi on which the tests are based yielded information about content coverage. However, no data were available for evaluating whether the tests measure important cognitive skills. This is because neither program has systematically gathered data to document that test items on its examinations measure the skills they purport to measure. In making determinations about the validity of