The Development and Quality of Licensure Tests
The Standards for Educational and Psychological Testing (American Educational Research Association et al., 1985, 1999), the Principles for the Validation and Use of Personnel Selection Procedures: Third Edition (Society for Industrial and Organizational Psychology, 1987), and the Uniform Guidelines for Employee Selection Procedures (U.S. Equal Employment Opportunity Commission et al., 29 C.F.R. 1607, 1978 ed.) provide guidelines for developing educational, psychological, and employment tests and for gathering validity evidence about their uses. They also outline criteria for evaluating tests and testing practices.
TEST DEVELOPMENT
As noted above, licensure tests are designed to distinguish between candidates who meet minimum professional standards and those who do not. Developers of basic skills and content knowledge tests often begin the design process by conducting analyses to determine the knowledge and skills that beginning teachers need to demonstrate before they should be allowed to practice without direct supervision (Pearlman, 1999). These analyses rely on data that range from informed judgments to formal surveys.
For basic skills tests, rudimentary literacy and mathematics skills are identified. For subject-matter and pedagogical knowledge tests, the analyses draw on national disciplinary standards, such as the mathematics standards developed by the National Council of Teachers of Mathematics (National Council of Teachers of Mathematics, 1989) and science education standards developed by the National Research Council (National Research Council, 1996); state standards for students and teachers; and the research literature. For some subject-matter and pedagogical knowledge tests, the knowledge and skill listings go out for public review and comment before test content is defined; for others, developers survey teachers to determine which of these competencies are important (Porter et al., in press). Using this information, test developers then construct test specifications that describe the content of teacher licensure tests and the ways the content will be assessed. Though there are commonalities, test specifications differ from state to state in accordance with state judgments about the knowledge and skills needed for beginning teachers.
Test developers, often with the assistance of practitioners, write questions that meet the specifications. These questions typically are then reviewed for accuracy, clarity, and fairness. In many cases, field trials of questions are conducted before the final tests are constructed.
Once tests are built, developers and state licensing officials set passing standards on them. Generally, teachers are asked to estimate the level of performance on the test that minimally qualified candidates would be expected to achieve. Often, panels of educators are asked to judge whether adequately prepared beginning teachers would answer particular questions correctly; these question-by-question judgments are compiled to derive a recommended overall passing score. Alternatively, teachers examine entire test booklets to estimate the lowest score a candidate could earn and still be considered minimally qualified.
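The compilation of question-by-question judgments into a recommended passing score can be illustrated with a minimal sketch. It assumes one common way of aggregating such judgments (summing each panelist's per-question judgments and averaging the totals across panelists), which resembles what the measurement literature often calls an Angoff-type procedure; the text does not name the method actually used, and the function name and panel data below are hypothetical.

```python
from statistics import mean


def recommended_passing_score(judgments):
    """Aggregate panel judgments into a recommended raw passing score.

    `judgments` holds one list per panelist. Each inner list has one entry
    per question: 1 if the panelist judges that an adequately prepared
    beginning teacher would answer correctly, 0 otherwise. (Probabilities
    between 0 and 1 would also work under the same aggregation.)
    """
    if not judgments:
        raise ValueError("at least one panelist is required")
    n_questions = len(judgments[0])
    if any(len(panelist) != n_questions for panelist in judgments):
        raise ValueError("every panelist must judge every question")
    # Each panelist's implied passing score is the sum of their judgments.
    panelist_totals = [sum(panelist) for panelist in judgments]
    # The recommendation is the average of those implied scores.
    return mean(panelist_totals)


# Hypothetical panel: three educators judging a six-question test.
panel = [
    [1, 1, 0, 1, 0, 0],  # implies a passing score of 3
    [1, 1, 1, 1, 0, 0],  # implies a passing score of 4
    [1, 1, 1, 1, 1, 0],  # implies a passing score of 5
]
print(recommended_passing_score(panel))
```

The result is only a recommendation: the figure a panel arrives at in this way is one input among several to the final passing score.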
The final determination of passing scores usually is made by each state's board of education, based on the panels' recommendations and other information, such as the estimated effect of different passing scores on passing rates and the number of licensed teachers. As noted above, even when the same test is used, passing scores vary by state, depending, in part, on differing views of minimum standards for teachers.
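The estimated effect of different passing scores on passing rates can be shown with a small worked example. This is a sketch under stated assumptions: the scores, the candidate passing scores, and the function names are all hypothetical, and a candidate is assumed to pass when the score is at or above the passing score.

```python
def passing_rate(scores, passing_score):
    """Return the fraction of examinees scoring at or above `passing_score`."""
    if not scores:
        raise ValueError("at least one score is required")
    passed = sum(1 for score in scores if score >= passing_score)
    return passed / len(scores)


def impact_table(scores, candidate_passing_scores):
    """Map each candidate passing score to the passing rate it would yield."""
    return {cut: passing_rate(scores, cut) for cut in candidate_passing_scores}


# Hypothetical raw scores for ten examinees on the same test.
scores = [52, 61, 58, 70, 45, 66, 73, 59, 64, 68]

# Three hypothetical passing scores, as different states might set them.
for cut, rate in impact_table(scores, [55, 60, 65]).items():
    print(f"passing score {cut}: {rate:.0%} of examinees pass")
```

With these invented data, raising the passing score from 55 to 65 halves the passing rate, which illustrates why the same test can license very different shares of candidates in different states.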
The test development and standard-setting procedures described here are generally consistent with professional guidelines and are used by the Educational Testing Service and several state licensing agencies. Other developers and states have taken different approaches to constructing teacher licensure tests and setting passing scores. In addition, there are differences in the composition and backgrounds of question writers and reviewers, and in the makeup of standard-setting panels. For some tests, public documentation is insufficient to judge the quality of test development efforts. Some tests have been criticized for failing to adhere to professional test development guidelines, but the committee has not reviewed the validity of these criticisms.
VALIDITY EVIDENCE
In addition to providing information on test development procedures, test developers are also expected to provide evidence of the validity of test score interpretation. Most of the validity evidence currently available for teacher licensure tests is based on judgments about whether the test is likely to assess the knowledge and skills it was intended to measure and whether such knowledge and skills are necessary for beginning teachers to possess (Educational Testing Service, 1999; Mehrens, 1990; Popham, 1990). For basic skills tests, this evidence is based on judgments about the literacy and mathematics abilities beginning teachers should demonstrate. That is, state panels describe the reading, writing, and mathematics skills all teachers should have, and they judge whether the tests are likely to measure such knowledge. For subject-matter and pedagogical tests, the validity evidence rests on judgments about what beginning teachers should know about curriculum and instruction and whether given test items cover that information.
Developers and state sponsors often collect this evidence by convening panels of educators to make judgments about whether the knowledge and skills the items appear to measure match the test specifications and whether the knowledge is important for entry-level teachers to demonstrate. In some states these data are collected as part of the test adoption process.
These judgments about the importance of the knowledge and skills tested (and the appropriateness of passing scores) are used as indicators of test quality. Validity evidence based on test content helps provide assurance to policy makers, teacher candidates, and the public that the test measures what it purports to measure and that test results indicate the extent to which teachers are likely to possess the knowledge and skills considered necessary for teaching. Such evidence also has been used to uphold teacher licensure tests in at least two legal challenges (Association of Mexican American Educators v. California, 183 F.3d 1055, 1070-1071, 9th Cir., 1999; United States v. South Carolina, 445 F. Supp. 1094, D. S.C., 1977). Conversely, evidence that test developers did not follow professional standards has been used to bar the use of teacher tests (Richardson v. Lamar County Board of Education et al., 729 F. Supp. 806, 820-21, M.D. Ala., 1989, aff'd 935 F.2d 1240, 11th Cir.).
It is important to note that the tests used for initial licensure (basic skills, subject-matter knowledge, and pedagogical knowledge) are not designed to measure effective teaching. Effective teaching requires many skills and types of knowledge. A given test that is used in the teacher licensing process may measure only some of these. Thus, passing such a test will not ensure that a teacher will be effective in the classroom. For example, a state may determine that all teachers must be able to read at a particular level, or that all teachers must know some basic mathematics, regardless of whether their reading and computing skills are correlated with their overall effectiveness as teachers. While this information may be deemed necessary, it is not sufficient for determining whether a candidate will be a successful teacher.
Currently, there is little research to show the relationship between candidates' scores on teacher licensing tests and their performance in the classroom. In part, the data are scant because it is methodologically difficult to investigate these relationships. Some of the many obstacles include the difficulty of measuring teachers' effectiveness in the classroom and the lack of a commonly accepted valid and fair measure of effective teaching. In addition, the research is hampered by the difficulty in accurately distinguishing minimally competent from minimally incompetent classroom practice, the absence of job performance information for some unlicensed examinees, and the fact that some good teaching practice is context-specific; that is, it varies by student population, educational goals, school organization, community characteristics, and other factors.
Although it is difficult to examine the relationship between scores on teacher licensure tests and job performance, studying that relationship is both possible and important. Analyses of the relationship between scores on teacher licensure tests and effectiveness in the classroom would provide useful evidence about the validity of teacher licensure tests and could provide a better understanding of what the tests do and do not measure.
Several current test development efforts respond to the limited evidentiary base for, and the possible limitations of, past and current teacher tests. One such effort is the work of the Interstate New Teacher Assessment and Support Consortium (INTASC), a group of 32 states that are developing standards for beginning teachers and related assessments. In part, INTASC's work is directed at establishing a broad consensus on knowledge and skill standards for beginning teachers and at achieving a better representation and measurement of those important teacher competencies (Porter et al., 2000).
Even under the best of circumstances, tests cannot be expected to measure everything that is important for success in the classroom, just as licensure tests in medicine and law do not measure all the qualities required for success in those professions. Teaching quality depends on many things. Obviously, teachers must be knowledgeable and know how to teach, but good teachers can also explain ideas so that different students understand them; they are compassionate, resourceful, committed, honest, and persistent in their efforts to help children learn. All of these things are important to teaching but difficult to measure. A single test or set of tests can measure only some of the characteristics associated with competent teaching. Nevertheless, this difficulty does not negate the value of assessing basic skills, subject-matter knowledge, and pedagogical knowledge.