on professional guidelines and their own work, these individuals have suggested criteria for evaluating teacher licensure tests. The committee used the three sets of published guidelines and the six commissioned papers to develop a framework for evaluating teacher licensure tests. This framework, which relies heavily on the Crocker paper, suggests criteria for test development and evaluation. The framework includes criteria for stating the purposes of testing; deciding on the competencies to test; developing the test; field testing the test and analyzing the results; administering and scoring tests; protecting tests against corruption; setting standards; attending to reliability and related issues; reporting scores and providing documentation; conducting validation studies; determining feasibility and costs; and studying the long-term consequences of the broader licensure program. These criteria are discussed below, following a discussion of validity, which is an overriding concern in all evaluations of tests.
The 1999 standards say that “validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (American Educational Research Association et al., 1999:9) and that the primary purpose of licensure testing is “to ensure that those licensed possess knowledge and skills in sufficient degree to perform important occupational activities safely and effectively” (p. 156). The standards explain that the type of evidence needed to establish a test’s validity is a matter of professional judgment: “Professional judgment guides decisions regarding the specific forms of evidence that can best support the intended interpretation and use” of test scores (p. 11).
The 1999 standards note that validity research on licensure tests currently focuses “mainly on content-related evidence, often in the form of judgments that the test adequately represents the content domain of the occupation” (p. 157). Typically, validity evidence for employment and credentialing tests includes a clear definition of the occupation or specialty, a clear and defensible delineation of the nature and requirements of the job, and expert judgments on the fit between test content and the job’s requirements. Procedurally, test sponsors conduct job analyses to define occupations and to develop test specifications (blueprints) for licensure tests. Job analyses are studies of the knowledge, skills, abilities, and dispositions needed to perform job duties and tasks. Studies of content relevance are then conducted to determine whether the knowledge and skills examined by the tests are relevant to the job and are represented in the test specifications. These data are generally obtained by having subject matter experts rate items on how well they reflect the test specifications, testing objectives, and responsibilities of the job (Impara, 1995; Smith and Hambleton, 1990; Sireci, 1998; Sireci and Green, 2000). Typically, sensitivity reviews also are conducted to determine if irrelevant characteristics of test questions or test forms are likely to provide unfair advantages or disadvantages to particular groups of