Improving Teacher Licensure Testing
As described in Chapter 1, many states have adopted or are developing challenging new teacher standards. These standards define teacher performance in terms of the competencies teachers should demonstrate; they focus on the teacher’s ability to engage students in rigorous, meaningful activities that foster academic learning. As part of their reforms in teacher education and licensure, states are attempting to align newly developed teaching standards with their licensure requirements. Traditionally, teacher licensure has required candidates to complete an approved teacher preparation program and to pass the state’s required tests. As noted in Chapter 3, however, the tests currently in wide use for beginning teacher licensure measure only a subset of the knowledge and skills important for practice.
Many in the profession have recently called for systematic collection of evidence about the actual teaching performance of prospective and new teachers (Darling-Hammond et al., 1999; Klein and Stecher, 1991; Tell, 2001). They have asked for evaluations that examine what teachers actually do in the classroom and in planning for instruction. They have called for assessments that require prospective and new teachers to perform in situations that are both more lifelike and more complex than those posed by paper-and-pencil tests. They argue that performance assessments are used in other fields to license practitioners. In licensing lawyers, for example, the Multistate Performance Test includes performance assessments that pose tasks beginning lawyers would be expected to accomplish, including writing persuasive memos, briefs, witness examination plans, and closing arguments <www.ncbex.org>. In medicine, computer-based simulations are used for the third stage of licensure testing <www.usmle.org>, and an assessment
that uses standardized patients1 is being piloted for use during the second stage of licensure testing (Swanson et al., 1995).
States are now beginning to explore performance-based assessment of prospective and beginning teachers’ competence. For example, as noted in Chapter 3, 10 states are participating in the Interstate New Teacher Assessment and Support Consortium’s (INTASC) Performance Assessment Development Project to develop prototypical classroom performance assessments that can be used to evaluate new teachers’ performance against INTASC’s professional standards <www.ccsso.org>. In California, prospective teachers can take the Reading Instruction Competence Assessment as a video-based performance assessment <www.ctc.ca.gov/profserv/examinfo/ricaexam.html>. In Connecticut, beginning teachers prepare discipline-specific teaching portfolios for second-stage licensure. In this chapter the committee describes assessments that are used and assessment research that is under way nationally and in several states.
It is important to clarify certain terms used in this chapter. Readers are reminded of the distinction made earlier between testing and assessment. The committee defines tests as paper-and-pencil measures of knowledge and skill; performance on them is evaluated and scored using a standardized process. Assessment is considered to be a broader term and a broader enterprise than testing. While assessment encompasses standardized paper-and-pencil measures, it also refers to other kinds of performance-based evidence. Moreover, it may describe a collection of different kinds of evidence, systematically gathered over time, to inform particular decisions or interpretations.
Selecting Assessment Cases
The committee’s initial search for performance-based assessment identified only one state that used performance assessments operationally to make initial licensure decisions. The committee then expanded its search to include performance assessments used by teacher education programs to warrant teacher education students’ competence and performance assessments used in second-stage licensure and certification. The committee selected four cases to illustrate different relevant assessments of teacher performance. They and the systems in which they have been developed and implemented are described here.
The committee chose to study the assessments of the National Board for Professional Teaching Standards (NBPTS) because its subject-specific portfolio and assessment center exercises are among the most established and prominent
of performance-based teacher assessments. These assessments are used to certify the accomplishment of experienced teachers. They provide an existence proof that performance-based assessment can be implemented on a large scale for assessing teacher competence. The NBPTS assessments have instigated and provide a basis for the work of INTASC and several states.
The committee also examined Connecticut’s subject-specific portfolio assessments. Connecticut’s assessments are of interest because they emulate the NBPTS assessments and were developed in collaboration with INTASC. These assessments are taken by beginning teachers during their second and/or third years of teaching as part of second-stage licensure. Additionally, the committee studied Ohio’s work with the Educational Testing Service (ETS) on the PATHWISE Classroom Induction Program-Praxis III and the Praxis III performance-based assessment. These systems are of interest because they use direct observations of classroom practice and are intended for use by more than one state. Like the Praxis I and II tests, these ETS products theoretically are viable options for many states. These programs are geared toward new teachers, with Praxis III intended for use in second-stage licensure.
Finally, the committee studied Alverno College’s integrated ongoing learning and assessment program for teacher candidates. Alverno’s program is of interest because it provides an example of a system in which a party other than a state or district could warrant teacher competence. In fact, licensing tests are used in Alverno College’s home state (Wisconsin), but the committee wanted to push its thinking about models in which teacher education institutions warrant candidates’ capabilities.
The committee’s intention in studying these cases was to pose multiple viable models for consideration in judging teacher competence.2 The committee wanted to address how these different assessment systems work to support evidence-based decisions about teacher competence and, ultimately, a well-qualified and diverse work force. Given the limited scope of what is assessed by conventional licensure tests, the committee also wanted to encourage further research into the development and evaluation of performance-based assessment systems in teaching.
Not all of these assessment systems are in full operation. For instance, performance-based assessment systems for teachers in Connecticut and Ohio—states that lead the nation in the use of such systems—are still under development, and much of the planned validity evidence is not yet available. Even the NBPTS, which has many assessments fully operationalized, is still developing standards
and assessments in other subject areas and is considering modifications to streamline existing assessments (D. Gitomer, ETS, personal communication, 2000). Thus, although the committee believes these practices may hold promise, currently available evidence does not allow evaluation of them in the same way conventional tests are evaluated in Chapter 5.
To guide its data-gathering efforts and to facilitate cross-case comparisons, the committee developed a list of program features to be described (see Box 8–1). Information about each of these features is included if available and relevant. Each case description is based on documents available from the agency or institution that is the focus of the case and on interviews with key personnel. Wherever possible, supporting documentation, such as in-house research reports or implementation guides, was obtained.3
NEW AND DEVELOPING SYSTEMS
This section presents a synopsis of each case, focusing in particular on relevant performance-based assessments and professional development activities, if any. Complete case descriptions, including references to the full range of data collection and professional development activities undertaken by the state or institution, along with relevant citations, are provided in Appendix F.
The National Board for Professional Teaching Standards

The NBPTS provides an example of a large-scale, high-stakes, performance-based assessment of teaching that draws on portfolio and assessment center exercises. The board has developed a series of performance-based assessments for voluntary certification of accomplished teachers. Certification is available in 30 fields according to the subject taught and the developmental level of the students.
BOX 8–1 Features of Case Studies

Statement of Teaching Qualities
Support for Prospective and Beginning Teachers
Coherence of the System
Validity Research Program

The philosophy behind NBPTS’s conception of accomplished teaching is reflected in its five core “propositions”:

teachers are committed to students and their learning;
teachers know the subjects they teach and how to teach those subjects to students;
teachers are responsible for managing and monitoring student learning;
teachers think systematically about their practice and learn from experience; and
teachers are members of learning communities.
For each certification area, content standards are based on these propositions. The content standards are developed by committees composed of experienced teachers in particular certification areas and others with expertise in relevant areas, such as child development, teacher education, and particular academic disciplines. The content standards guide all aspects of assessment development.
Each assessment consists of a portfolio completed by candidates at their school sites and a set of exercises to be completed at an assessment center. The school-based portfolio consists of two parts: (1) three entries that are classroom based and include two videos that document the candidate’s teaching practice through student work and (2) one entry that combines the candidate’s work with students’ families, the community, and collaboration with other professionals. The six assessment center exercises require candidates to demonstrate their knowledge of subject matter content.
Each assessment task is scored in accordance with a rubric prepared during the development phase and later illustrated with multiple benchmarks (sample responses) at each performance level. The rubrics encompass four levels of performance on a particular task, with the second-highest level designated as meeting the standard. Most exercises are scored independently by two trained assessors. Exercise scores are weighted and summed to form a total. To make certification decisions, the total score is compared to a predetermined passing score that is uniform across certificates. The uniform performance standard was set by NBPTS following a series of empirical studies using different methods of standard setting.
States and local education agencies have their own policies regarding how NBPTS-certified teachers are recognized and rewarded. Although NBPTS’s direct involvement in professional development and support activities is limited, it does encourage and support locally initiated activities.
NBPTS maintains an ongoing program of psychometric research into the technical quality of its assessments. Evidence routinely gathered for each assessment includes documentation of the development process, estimates of reliability and measurement error, expert judgments about the fit between the assessments and the content standards, examination of disparate impact, and evidence of the validity of the scoring process. Additional special studies address such issues as examining potential sources of disparate impact to rule out concerns about bias (Bond, 1998b), considering whether alternative means of assessment (e.g., direct observations and interviews, samples of student work) can reproduce classifications of candidates as certified or not, and comparing the professional activities of teachers who have and those who have not received NBPTS certification
(Bond et al., 2000). Additional information on these studies can be found at the NBPTS’s website <www.nbpts.org> and in professional journals.
Connecticut’s Beginning Teacher Induction Program
Connecticut, working in collaboration with INTASC, exemplifies a state that has implemented a licensing system that relies on performance-based assessments. Connecticut’s Beginning Educator Support and Training Program is a comprehensive three-year induction program that involves both mentoring and support for beginning teachers and a portfolio assessment. The philosophy behind Connecticut’s program is that effective teaching involves mastery of both content and pedagogy. This philosophy is reflected in Connecticut’s Common Core of Teaching, which is intended to present a comprehensive view of the accomplished teacher. The Common Core of Teaching specifies what teachers should know, how they should apply their knowledge, and how they should demonstrate professional responsibility (see Appendix F). The Common Core of Teaching guides state policies related to preservice training, induction, evaluation, and the professional growth of all teachers. The state also has subject-specific standards.
The preservice training requirements for prospective teachers are specified in terms of a set of competencies as distinct from a list of required courses. The competencies encompass the body of knowledge and skills the state believes individuals should develop as they progress through the teacher education program. Prospective teachers must pass Praxis I to enter a teacher preparation program and Praxis II to be recommended for licensure.
During their first three years of teaching, beginning teachers receive support from school- or district-based mentors and through state-sponsored professional development activities. During their second year, teachers must compile and submit a discipline-specific teaching portfolio. In the portfolio, teachers document their methods of lesson planning, teaching, assessment of student learning, and self-reflection in a 7- to 10-day unit of instruction. The portfolio includes information from multiple sources, such as lesson logs, videotapes of teaching, teacher commentaries, examples of student work, and formal and informal assessments. The state offers seminars that instruct teachers in ways to meet professional standards through the portfolio process.
Scorers are trained to evaluate the portfolios using criteria based on content-focused teaching standards. Portfolio scorers receive up to 70 hours of training and must meet a proficiency standard before being eligible to score. If the portfolio does not meet the acceptable standards, the teacher is provided with another opportunity to submit a portfolio during the third year of teaching. If a teacher fails to meet the standard by then, he or she is ineligible to apply for a Connecticut provisional certificate and cannot teach in Connecticut public
schools. To regain eligibility for certification, such an individual must successfully complete a formal program of study approved by the state.
Connecticut has an extensive ongoing program of research that includes job analyses, the collection of content- and construct-related evidence of validity, reliability and generalizability research, an examination of program consequences, and sensitivity reviews and bias analyses for each content area. Studies have included (1) examinations of the relationships between teachers’ performance on the portfolio assessment and other quantitative information about teachers, such as their undergraduate grade point averages, SAT scores, and Praxis I and II scores; (2) on-site case studies of beginning teacher performance; (3) expert review of portfolios; and (4) the relationship between portfolio performance and student achievement in reading, language arts, and mathematics. Surveys of mentors, portfolio assessors, school administrators, principals, beginning teachers, and higher-education faculty are conducted annually to examine program effectiveness and impact. Additional details about the studies conducted by Connecticut appear in Appendix F.
Ohio’s Teacher Induction Program
Ohio exemplifies a state that plans to incorporate a commercially available program into its licensing system, which is being redesigned. By 2002 the state will require beginning teachers to successfully complete an entry-year program of support, including mentoring, provided by the employing district, and to pass a performance-based assessment administered by the Ohio Department of Education in order to obtain a professional teaching license. Ohio’s entry-year induction program is based on the ETS-developed PATHWISE Induction Program-Praxis III Version for mentor training and mentor assistance. Ohio is calling this program the Ohio FIRST Year Program (Formative Induction Results in Stronger Teaching). The state plans to use ETS’s Praxis III Classroom Performance Assessment beginning in 2002. Both Praxis III and the PATHWISE Induction Program draw on the same four teaching domains: organizing content knowledge for student learning, teaching for student learning, creating an environment for student learning, and teacher professionalism. Nineteen criteria have been developed for these domains and serve as the basis for evaluating teachers’ performance (see Appendix F). Many of Ohio’s state institutions have incorporated these domains into their preservice education programs, and the PATHWISE Observation System will be integrated into the preservice program.
The PATHWISE Induction Program-Praxis III Version consists of 10 structured tasks designed to encourage collaboration between beginning teachers and their mentors. One task, for example, requires beginning teachers to gather information from colleagues, journals, and texts about particular aspects of teaching and, with a mentor’s guidance, to use the information to develop an instructional plan, implement it, and reflect on the experience. Another asks for individual growth plans that specify beginning teachers’ plans for learning more about particular teaching practices, school or district initiatives, or other teaching challenges.
Mentors are expected to be experienced teachers who, ideally, teach the same subject or grade level in the same building as the entry-year teacher. Mentors receive training in the PATHWISE Induction Program-Praxis III Version, and their service as mentor teachers can be part of their individual professional development plans that count toward licensure renewal.
The Praxis III assessment, which is designed to be used across content areas, employs three data collection methods: direct observation of classroom practice, written materials prepared by the beginning teacher describing the students and the instructional objectives, and interviews structured around classroom observations. As part of the assessment, the beginning teacher provides written documentation about the general classroom context and the students in the class. During the observation, assessors view the teacher’s practices and decisions in the classroom. Semistructured interviews with the beginning teacher before and after the observation provide assessors with an opportunity to hear the teacher reflect on his or her decisions and teaching practices and to evaluate the teacher’s skill in relating instructional decisions to contextual factors. Observers are trained in observation, interpretation, and scoring of the performance assessment data.
Praxis III has been piloted for seven years in Ohio but has not yet been used for making high-stakes licensure decisions. In January 2000 the State Board of Education set passing scores on Praxis III. Since Praxis III has not yet been used operationally, the available research consists of content-related evidence of validity collected as part of the test development process. The content and knowledge base covered by the PATHWISE Induction Program-Praxis III was identified by ETS through an extensive series of studies that included job analyses, a review of the literature, and a compilation of teacher licensing requirements in all 50 states. Reports on the development work for Praxis III and PATHWISE are available through ETS (Dwyer, 1994; Wesley et al., 1993; Rosenfeld et al., 1992a, 1992b, 1992c).
School districts in Ohio are expected to develop and implement their own plans for entry-year programs in accordance with state guidelines and with financial support provided from the state. Several districts in Ohio have had comprehensive induction programs in place for a number of years. For example, the Cincinnati School District has had a peer review and induction program in place since 1985, although it is undergoing change to adapt to the new state requirements. Cincinnati’s program is mentioned here because it provides an example of a district developing its own induction and evaluation system and because it shares some common features with the system Ohio plans to implement statewide.
Cincinnati’s system has components aimed at teacher preparation, teacher
induction, and teacher evaluation and compensation. The district collaborates with the University of Cincinnati in offering a fifth-year graduate internship program that places teacher candidates in one of seven professional practice schools in Cincinnati. The interns are mentored and evaluated by career and lead teachers. The district’s teacher induction program provides support and ongoing feedback to beginning teachers. The program uses experienced teachers as both mentors and evaluators of new teachers. The example of Cincinnati suggests that in addition to the role played by national, state, and local institutions, districts can play an active role in teacher assessment and licensing.
Performance-Based Teacher Education at Alverno College
In this case study the committee provides a description of teacher education and assessment as practiced at Alverno College, which undertook development of a performance-based baccalaureate degree over 20 years ago (Diez et al., 1998). This change resulted in an overhaul of the college’s curriculum and approach to teaching. The new approach is characterized by publicly articulated learning outcomes, realistic classroom activities and field experiences, and ongoing performance assessments of learning progress. Alverno’s program is of interest because it provides an example of a system in which a party other than a state or district could warrant teacher competence. Thus, the focus here is on Alverno as a working program that can expand the debate about other models for warranting teacher competence.
All students enrolled at Alverno College are expected to demonstrate proficiency in the following eight ability areas: communication, analysis, problem solving, values within decision making, social interaction, global perspectives, effective citizenship, and aesthetic responsiveness (see descriptions in Appendix F). These abilities cut across discipline areas and are subdivided into six developmental levels. The six levels for each ability area represent a developmental sequence that begins with awareness of one’s own performance process for a given ability and that specifies increasingly complex knowledge, skills, and dispositions.
The teacher education program at Alverno College builds on the foundation provided by the eight general education abilities. The program’s performance-based standards require teacher candidates to demonstrate competency in five areas:
Conceptualization: Integrating content knowledge with educational frameworks and a broadly based understanding of the liberal arts in order to plan and implement instruction.
Diagnosis: Relating observed behavior to relevant frameworks in order to determine and implement learning prescriptions.
Coordination: Managing resources effectively to support learning goals.
Communication: Using verbal, nonverbal, and media modes of communication to establish the classroom environment and to structure and reinforce learning.
Integrative interaction: Acting with professional values as a situational decision maker, adapting to the changing needs of the environment in order to develop students as learners.
These teaching abilities refine and extend the general education abilities into the professional teaching context; they define professional levels of proficiency that are required for graduation from any of the teacher education programs. While the professional teaching abilities are introduced in the first year, they receive heavy emphasis during the junior and senior years.
The program places emphasis on using knowledge effectively in a context and describes its approach as “assessment as learning.” Each course is structured around the assessments and learning outcomes that must be demonstrated to claim mastery of the course material. Assessment criteria are made public, and the paths connecting particular concrete activities to general abstract abilities can easily be traced. Evaluations of students are ongoing and handled by means of course-based assessments that require Alverno students to demonstrate what they have learned through activities such as essays, letters, position papers, case study analyses, observations, and simulations. Faculty members evaluate Alverno students’ performances and provide diagnostic feedback; students are also expected to evaluate themselves and reflect on their performance on any given exercise.
Coursework is intentionally sequenced to reflect developmental growth and to provide for cross-course application of concepts. For example, a mathematics methods course assessment might ask teacher candidates to (1) create a mathematics lesson for first graders that incorporates concepts from developmental psychology, (2) teach the lesson, and (3) describe the responses of the learners and the adaptations made.
Alverno’s education program is characterized by extensive opportunities for field experiences. For its education majors, classroom-based field experiences progress from working one on one with students to working with small groups and entire classes. Alverno teacher candidates design lesson plans, teach the lessons, and reflect on the effectiveness of their instruction. Teacher candidates keep logs that require them to reflect on their practices and to make links between theoretical knowledge and practical application, observe processes and environments of learning, translate their content knowledge into short presentations, and begin to translate their philosophy of education into decisions about the instructional process.
Before student teaching, teacher candidates compile a portfolio consisting of samples of written work, lesson plans, videotapes of their interactions with children, and instructional materials. They develop a resume and write an analysis of a videotaped lesson. Portfolios are reviewed by teams of teachers and principals who pose questions on teaching practices to candidates. Readiness for student teaching is judged on the basis of the portfolio as well as performance during the questioning session.
Students at Alverno College are required to meet the licensing requirements of the state of Wisconsin, which currently include a basic skills test and endorsement from the institution conferring the teaching degrees. Beginning in 2004, Wisconsin will require all teacher candidates to compile portfolios that demonstrate their performance in relation to the state’s teaching standards. Candidates who pass the portfolio review will be granted provisional licenses to teach for three to five years while pursuing professional development goals related to the standards.
Alverno routinely conducts internal and external reviews of its assessment instruments and practices in light of its curriculum goals and students’ performance. In addition, research has examined the extent to which Alverno graduates consider themselves prepared for teaching; research has also looked at their job satisfaction and retention in the field and employers’ perceptions of their qualifications (Zeichner, 2000). Research on Alverno’s program is documented and widely distributed and is published in professional journals.
ANALYSIS OF ALTERNATIVES
To conclude this chapter, the committee focuses on what can be learned from a comparison of the design principles underlying the assessments reviewed here. In each of the cases involving prospective or beginning teachers, there is consistent attention to the following features that the committee considers important components of a sound licensure system:
There is a coherent statement about the qualities of teaching valued by the institution or agency.
There is a means for providing evidence of a teacher’s actual performance with his or her students in a classroom (either through videotapes and artifacts or direct observations).
There is a coherent system of assessments that, taken together, cover a broad range of teaching qualities, including evidence about basic skills, content and/or pedagogical knowledge, and teaching performance. The assessments are staged at relevant points in time across a prospective teacher’s preparation and beginning teaching experiences.
There is sustained attention to the professional development and support of prospective or beginning teachers integrated with the qualities and practices covered by the assessments. Support systems draw on the capabilities of experienced teachers, thereby supporting professional development for experienced teachers as well as beginning teachers.
For the programs that have begun or are beginning operational use, there is an ongoing program of research into the validity of these assessments and opportunity for outside professionals to review the practices.
Looking across these case studies of assessment practice, the committee also found instructive differences. While all of the selected systems involve performance-based assessments of teaching, the programs use different measurement methodologies (e.g., school-based portfolios, observation systems, assessment centers) and emphasize different aspects of teaching performance. The agency or institution responsible for implementing performance-based assessments also differs for the various cases, including two states, a professional organization, and a teacher preparation program. The particular decisions supported by the assessments differ, as do the means for combining results with other information to make decisions about teacher competence. Furthermore, the systems in which the performance assessments operate are more or less tightly coupled; in one case the performance-based assessment is a fully integral ongoing part of the learning sequence, whereas in another case the connections between assessment and support are less direct.
Looking more specifically at the performance assessment methodologies, additional differences can be noted. These include differences in the way in which the statements of teacher competence guiding the assessments are developed and characterized (e.g., professional consensus, job analysis survey); in the scope of teaching to which the assessments apply (e.g., content-specific assessments, assessments intended for teaching across content areas); and in the way responses are scored and evidence is combined to inform overall decisions. There are also differences in the way standards are set (e.g., based on actual performances or profiles of scores, embedded in scoring rubrics) and in the nature of support provided to candidates, mentors, and assessors. The balance of contextualized versus standardized forms of evidence about teaching competence also varies across these programs.
The two cases that have been in operation the longest call for special note. The NBPTS focuses on voluntary assessment of experienced teachers. For our purposes, it serves as an existence proof that centrally administered, large-scale, high-stakes assessments of teaching performance can meet professional standards of technical quality for standardized assessments. As such it provides an important model for states, districts, teacher preparation programs, or other organizations developing performance-based assessments. As noted, Connecticut already has drawn on the work of the NBPTS in developing its portfolio assessment (with some unique design features of its own).
The NBPTS’s assessment also serves at least two additional purposes relevant to the committee’s charge. First, it can be used to support reciprocity across states in granting licenses based on evidence of performance, albeit at the advanced practice level. It suggests possibilities for reciprocity in initial licensing.
Second, and perhaps most important, it offers opportunities for assessment and recognition of accomplished teachers, thus providing incentives for professional development and advancement across the span of a career.
The teacher education and student assessment program at Alverno College provides an existence proof of a different sort; it demonstrates that evidence-based decisions about readiness to teach can be based on assessment practices contextualized at the local level as part of a program integrating learning and assessment in a developmental sequence. Whether this sort of contextualized assessment system culminating in high-stakes decisions can be successfully implemented in a broader range of teacher education institutions is a question that deserves further study.
Research is needed to understand the impact of these different assessment choices on decisions about teaching quality. While the committee believes all design choices should be situated in an assessment with a strong program of validity research, it does not believe that these differences necessarily can or should be resolved into a single set of recommendations. The diversity is productive: it provides an important source of alternatives for triangulation, critical review, and revision. As Messick (1989:88) reminds us: “The very recognition of alternative perspectives about the social values to be served, about the criteria to be enhanced, or about the standards to be achieved should be salutary in its own right. This is so because to the extent that alternative perspectives are perceived as legitimate, it is less likely that any one of these perspectives will dominate our assumptions, our methodologies, or our thinking about the validation of test use.”
Some issues of concern to the committee have not been sufficiently addressed by these assessment programs. These concerns, the committee notes, are equally relevant to its review of conventional testing programs. First, very little is known about how any of these different assessments of teacher quality relate to one another or to other indicators. To be fair, the professional testing standards do not explicitly require such evidence before an assessment is put into operational use. To their credit, the NBPTS and the state of Connecticut have undertaken such studies on a small scale with specific assessments. Alverno’s practices also provide routine opportunity for informal triangulation through the multitude of appraisals of any given student. Cross-fertilization among these programs and between these programs and more conventional assessment practices would be fruitful. Examining the relationships among different assessments of teacher competence would contribute relevant validity evidence by documenting commonalities and differences that can and should be explained as part of a program’s overall plan for validity research.
Second, with the possible exception of Alverno, very little is known about the quality of the overall decision about competence or accomplishment. Licensure decisions rest on the combination of information across disparate sources of evidence. In most of the examples the different assessments within a given
system are sequential and conjunctive; they present distinct hurdles that a teacher must pass in order to continue. Each decision has the potential to reduce the pool of prospective teachers. This represents an implicit and underexamined theory about professional development that needs further investigation.
Third, one advantage originally associated with performance-based assessment was its potential as an “antidote for differential performance between majority and minority candidates” (Bond, 1998a:28). However, the experiences of the NBPTS suggest otherwise. Differences between African Americans and whites on the NBPTS assessments mirror those seen for multiple-choice exams. Studies have examined these differences in relation to gender, years of teaching, location of teaching assignment, putative quality of the baccalaureate degree-granting institution, support during preparation for the assessment, assessment exercise type, writing load of the assessment task, assessor training, and assessor ethnicity. None of these factors has been found to fully explain the performance differences observed between African American and white teachers on these assessments (Bond, 1998a, 2000; A. Harman, NBPTS, personal communication, 2001). Bond concludes that the differences “may well be traceable to more systematic factors in U.S. society at large” (1998a:254).
Fourth, none of these assessment programs examine the teaching performances of the same individuals across different contexts of teaching. The different schools and communities in which a teacher is licensed to teach offer unique challenges and opportunities, yet a teacher’s performance is typically evaluated only in a single context. The committee contends that licensure systems should incorporate information about a teacher’s ability to work effectively with students in a variety of settings. For instance, student teaching requirements and the assessment data they yield could be structured to assure multiple contexts that include diverse learners. Also, support and professional development may be needed when teachers (whether beginning or experienced) move into new schools and communities.
In closing, the committee believes that articulating the validity issues that these cases suggest is an important challenge for the field. In Chapter 4 the committee presents an evaluation framework for standardized forms of testing. Some of its criteria apply directly to the assessments described here. The criteria for the purposes of assessment, the competencies to be assessed, and others are meaningful and important to judgments about performance-based assessments of teacher competence. However, the meaning and utility of other evaluation criteria are less immediately clear for these forms of assessment. The developmental nature of the systems in which these assessments reside, the varied ways in which candidates demonstrate their knowledge and skills within assessment forms, the balance between the information value of these assessments and the professional development benefits that accrue to examinees and other participants, and other differences raise issues about the validity evidence needed to support them.
The committee asserts that the evaluation criteria and evidence for these assessments should be just as rigorous as those for conventional teacher licensure tests. The committee suspects, however, that the forms of evidence that will be telling and the criteria that should guide judgments about the soundness and technical quality of the assessments described here may differ somewhat from those outlined in Chapter 4. The committee challenges test developers, practitioners, and researchers to consider its evaluation framework as well as other more conventional evaluation frameworks and to decide which criteria best apply to judgments about newer forms of assessment, which criteria have important and helpful corollaries, and where new criteria may be needed to address the particular validity issues raised by performance-based assessments. For example, consistency and generalizability are important criteria in traditional evaluation frameworks; although they might be instantiated differently, they are potentially important concepts in evaluating alternative assessments as well. The utility of other evaluation criteria that speak to the unique validity issues raised by these assessments also should be considered. Furthermore, careful study of the validity criteria used by these performance-based assessment programs might suggest additional criteria that are relevant to more conventional forms of assessment. It is beyond the committee’s charge to suggest appropriate validity practices for new forms of assessment; it urges the field to do so.
Several new and developing teacher assessment systems use a variety of testing and assessment methods, including assessments of teaching performance. They include multiple measures of candidates’ knowledge, skills, abilities, and dispositions. In these systems, assessments are integrated with professional development and with the ongoing support of prospective or beginning teachers.
Given its analysis of systems that employ performance-based assessments, the committee concludes:
New and developing assessment systems warrant investigation as a means of addressing the limits of current initial teacher licensure tests and of improving teacher licensure. The benefits, costs, and limitations of these systems should be investigated.