2 The Evaluation Framework and Collection of Data

The committee's charge from Congress was to develop a rigorous conceptual and methodological framework for evaluating programs that award advanced-level certification to teachers and to apply that framework to the National Board for Professional Teaching Standards (NBPTS). In particular, Congress asked the committee to use the strongest practical methodologies to consider (1) the impacts on teachers who obtain board certification, teachers who attempt to become certified but are unsuccessful, and teachers who do not apply for such certification; (2) the extent to which board certification makes a difference in the academic achievement of students; and (3) the cost-effectiveness of NBPTS certification as a means for improving teacher quality.

In developing the framework and conducting this evaluation, we relied extensively on the professional standards that guide program and psychometric evaluations. This chapter begins with a discussion of those standards and procedures, particularly as they apply to evaluations of certification assessments. We then turn to the evaluation framework and describe its components and our rationale for including them. The final section focuses on the evidence, discussing the evidence available from existing studies and the information we collected ourselves.

CONDUCTING PROGRAM EVALUATIONS

The committee's charge was to conduct an evaluation of the NBPTS program. Program evaluation is a formalized approach to studying the
goals, processes, and impacts of projects, policies, and programs. Such "systematic investigations of the worth or merits" of a program (Joint Committee on Standards for Educational Evaluation, 1994) often pose questions like these (Rossi, Lipsey, and Freeman, 2004): What is the nature and scope of the problem? Is the particular intervention reaching its target population? Is the intervention being implemented well? Is the intervention effective in attaining the desired goals or benefits? Does the program have important unanticipated consequences? Are the program costs reasonable in relation to its effectiveness and benefits? The evaluation plan is typically organized around the questions posed by those who commissioned the evaluation, but it also should be responsive to the needs of other stakeholders (Standard U3). Program evaluations are expected to address the issues that matter, collect information that is relevant and meaningful for the goals of the evaluation, analyze the information using rigorous and fair methods, and communicate the results in a form that is usable and meaningful to decision makers.

There are two major types of evaluations: (1) those designed to help improve existing programs in order to achieve certain desirable results, and (2) those designed to distinguish worthwhile programs from ineffective ones. The former are often called formative evaluations, and they are conducted to provide information on how a program should be delivered or to furnish information for guiding program improvement (Scriven, 1991). The latter are called summative evaluations, and they are conducted to determine whether a program's expectations are being met and what its consequences are (Scriven, 1991). A summative evaluation is the kind that our charge required.

Summative evaluations generally focus on whether a given program (e.g., a social program, an educational intervention) is effective.
For example, summative evaluations might study such issues as the program's accomplishment of its intended objectives, impacts beyond those that were intended, how effectively resources have been used, the benefits of the program and what it costs to produce these benefits, and alternative interventions that might produce similar benefits. Summative program evaluations usually focus on the effects of a program on outcomes for a client population and consider the extent to which the program changes the outcomes for participants.

For example, the United States has a long history of commissioning evaluations of government-sponsored employment training programs designed to help unemployed workers or workers with relatively few skills find employment. Such evaluations attempt to infer the causal impact of enrolling in the program on the outcomes of interest for the participant, such as the probability of obtaining a job or the level of wages earned.

A particular challenge with this kind of evaluation lies in trying to determine whether a change in outcomes for participants is in fact attributable to the program itself. Events or processes outside the program may be the real cause of the observed changes (in the case of employment training programs, outcomes may be due to changes in the broader economy). Another challenge with this type of evaluation is that the program has an incentive to select candidates with the strongest skills rather than candidates with the greatest need, so that it achieves the best outcomes. Often data are not available that allow the evaluator to clearly isolate the effects of the program on the participants from the effects of extraneous factors, or to separate its effects on the broader population from its effects on a particular subpopulation. We return to these issues in subsequent chapters as we discuss the findings from our evaluation.

Generally, a program evaluation involves collecting a variety of kinds of data using both qualitative and quantitative methodologies. Amassing a wide collection of data helps the evaluator determine the areas of consensus in the results with regard to the effectiveness of a program and the areas in which additional research is needed. Guidelines for conducting program evaluations are documented in The Program Evaluation Standards: How to Assess Evaluations of Educational Programs, 2nd edition (Joint Committee on Standards for Educational Evaluation, 1994). These standards lay out guidelines for accepted practices that represent the consensus opinions endorsed by practitioners in the field of program evaluation.

Evaluating Credentialing Tests

The national board's program consists primarily of a certification assessment, and several sets of standards exist for guiding evaluations of assessment programs.
The best known are Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999), Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology, 2003), and the Standards for the Accreditation of Certification Programs (National Commission for Certifying Agencies, 2004). In addition to the program evaluation standards, we relied on these sets of standards to formulate our framework and to guide our evaluation.

A certification test, such as the national board's assessment, falls into a category of examinations known as credentialing tests. Credentialing tests include those used in the process of initial licensure of new professionals and the voluntary certification of professionals (see Box 2-1 for an explanation of these terms as they are used in this report). Evaluation of these kinds of assessments typically focuses on a review of the processes used to develop the assessment and its psychometric properties. The review includes the methods for determining the content to be assessed and the
appropriateness of this content, the methods for scoring the assessment and the reliability of the resulting scores, and the methods for setting the score required to pass the assessment and the appropriateness of the pass score, and more. Psychometric evaluations also include the collection of validity evidence, that is, evidence examined to ascertain the extent to which the inferences to be made about the test results are reasonable. There are several issues with regard to evaluating the validity of credentialing tests that warrant additional discussion.

BOX 2-1
Terminology: Licensure Versus Certification

This report focuses on credentialing tests, which include those used for licensure or certification. Within the general category of credentialing tests, however, the terms licensure, credentialing, and certification are used in overlapping ways, and for that reason they can be confusing. We focus on certification tests that are designed to identify teachers who have advanced skills, significantly beyond those of entry-level teachers obtaining initial licenses. For the purposes of this report:

• Licensure is the granting of permission to practice a particular occupation or profession by a recognized authority.
• Certification is a voluntary means of establishing that certain individuals have mastered specific sets of advanced skills that come with expertise developed over time.

Thus, for example, beginning teachers are licensed, usually by states; graduates of professional or academic programs, such as medical school or a vocational training program, earn credentials; and practitioners who have developed advanced expertise (often after earning academic credentials and being granted a license), through some combination of training and experience, may be certified as having advanced status in their profession.
Credentialing Tests and Validity Evidence

Although sensitive to the common misunderstanding that there are different "types" of validity, psychometricians have defined several different kinds of validity evidence that can be used to contribute to the question of whether inferences based on the scores from a given test are valid. Content validity evidence examines the extent to which the test covers the intended domain of content and skills. It is usually established through systematic judgments by experts who compare the content of the test with an external set of standards, specifications, or other descriptions of the domain of coverage. Construct validity evidence addresses the extent to which the assessment is measuring the construct (knowledge or skill) it is intended to measure, rather than unrelated skills. For example, a test that is intended to measure only mathematical skills but that includes items that are written in complex language, thus requiring advanced reading skills, may provide poor support for inferences about mathematical skills. A third kind of information is referred to as criterion-related validity evidence, which evaluates the extent to which test performance agrees with some criterion of interest and thus either correlates with some well-established measure of the domain of interest or accurately predicts future performance.

The intended purpose of most licensure and certification tests is to provide assurance that successful candidates have the knowledge, skills, and judgment required in practice. A preliminary case for the validity of this interpretation is typically made on the basis of content-related evidence, showing that critical knowledge, skills, and judgments have been identified (e.g., using a practice analysis or systematic study of the behaviors, knowledge, and practices of professionals in the field being assessed) and that these content areas are adequately sampled by the test. This validity argument is buttressed by a process of first identifying and then refuting challenges to the validity of the proposed interpretation and finally ruling out various potential sources of systematic error (such as the effects of varying test formats or inappropriate scoring standards). Assuming that the proposed interpretation (that a certain score indicates mastery of a domain of critical knowledge and skills) survives attempts to falsify it, the proposed interpretation can be presumed reasonable.
It is rarely possible to provide convincing criterion-related validity evidence for credentialing tests because of the difficulty in obtaining external measures that themselves satisfactorily assess performance across all practice settings. This is reflected in the established standards for the measurement field (e.g., American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999, p. 157), which require credentialing assessments to demonstrate content validity evidence but explicitly do not require criterion-related validity evidence, in part because obtaining valid and reliable criterion measures for credentialing tests, such as on-the-job performance, is generally not feasible. Job performance is difficult to measure reliably and validly, especially for the kinds of professions that require complex decision making, continued self-education, and other complex cognitive capacities. Many characteristics beyond those that can be measured in an assessment program are needed for success, and the circumstances in which the job is performed also have a strong influence on performance. Thus, isolating the effects of the mastery
that was established by passing the certification test from other influences on job performance is very difficult.

Our charge asks that we consider the impact of board certification on student learning. Measures of outcomes for students, such as their academic achievement, do provide a means of evaluating teachers' job performance, but there are some drawbacks to the use of this kind of criterion measure. It is enlightening to consider what this would mean if extrapolated to other fields. For example, this is similar to evaluating the validity of a medical certification test by collecting information about the outcomes for patients of a board-certified physician or evaluating the validity of the bar exam by considering the outcomes for clients of a lawyer who had passed the bar exam and been admitted to the bar. Outcomes for patients reflect many factors other than the skills and knowledge of the physician who provides services, such as the severity of the illness being treated and the degree to which the patient adheres to the professional advice given. Likewise in law, the outcome for the client depends on such factors as the nature of the legal problem, the record of prior legal problems, and the extent to which the client follows the advice. Furthermore, should the outcomes for a high-priced lawyer, who can select his or her clients, be compared to the outcomes for a public defender? While data are available that might be used in such evaluations (e.g., rates of death or guilty verdicts) and several such studies have been conducted (e.g., Norcini et al., 2002; Tamblyn et al., 1998, 2002), many factors can contribute to the outcomes, making interpretation of the relationships very tricky.

The same concerns are present in using students' academic achievement to evaluate the performance of their teachers.
Many factors interact to influence students' achievement, and it is difficult to isolate the contributions of the teachers from those of other factors. As the reader will see, researchers have tried a variety of statistical strategies to make the findings interpretable, but it remains difficult to obtain solid criterion-related validity evidence for this credentialing program. Because impact evidence is key to the broader program evaluation process and was explicitly part of our charge, we defined our framework broadly to encompass the notion of criterion-related validity.

THE EVALUATION FRAMEWORK

We developed the evaluation framework by first theorizing about the ways that an advanced-level certification program for teachers might affect teaching practices and the teaching profession. In laying out our theories, we reviewed the NBPTS founding documents to gain insight into how the founders thought such a program might operate (Chapter 3 describes this in detail). At the same time, we tried to balance the board's broad goals with
those that other programs might have, and we considered how other professions view advanced-level certification. We developed a set of assumptions that capture our thinking about the kinds of impacts an advanced-level certification program might have and, after considering their feasibility, turned them into questions that formed the evaluation framework. Below are the assumptions we laid out.

• A program for offering advanced-level certification is intended to be a means for identifying teachers who possess the knowledge, skills, dispositions, and professional judgment that characterize accomplished practice.
• A program for certifying accomplished teachers is expected to improve teachers' practice in a number of ways.
  o The existence and wide distribution of defined standards and assessments will influence teacher preservice training and professional development.
  o The process of preparing for assessments will improve the practice of teachers who participate.
  o Board-certified teachers will serve as mentors for other teachers and influence their practice.
• A program for offering advanced-level certification is expected to improve job conditions for experienced and highly qualified teachers in a number of ways.
  o The existence of certified teachers will help to professionalize the field of teaching.
  o Board-certified teachers will be rewarded with higher pay.
  o Board-certified teachers will be offered expanded leadership opportunities as teachers, not just as administrators.
  o The recognition offered by board certification will increase teachers' job satisfaction.
• A program for certifying accomplished teachers is expected to improve education systems in a number of ways.
  o The opportunity for advanced-level certification and professionalization of the field will decrease the teacher turnover rate and, in particular, help to keep the most qualified teachers in the profession.
  o The presence of certified teachers will lead to better teaching among other teachers.
• Ultimately, all of these changes to the teaching field will help to improve teacher quality and, in turn, improve student learning.

The committee grouped these assumptions into eight primary questions that form the basis for the evaluation framework. The first two questions
address the technical quality of the assessment. As mentioned earlier, meeting high technical standards is a fundamental criterion for the evaluation of any testing program (and to some, the only relevant criterion). The remaining questions address the various impacts associated with the program, as specified in our charge. For each primary question, we also identified subsidiary questions that lay out the kinds of empirical evidence needed. Box 2-2 displays the full evaluation framework.

The theory on which our eight primary questions are based can be understood as a model for thinking about the potential impact of a certification program for accomplished teachers on teacher quality and student learning, as shown in Figure 2-1. In this figure, rectangular boxes indicate aspects of the model included in our evaluation framework, and the numbers in parentheses indicate the specific framework question. We refer to this figure throughout the report as we develop our evaluation in terms of the eight questions.

COLLECTING EVIDENCE

The second major component of the committee's charge is to apply the general framework in an evaluation of the effectiveness of the national board's approach to certifying accomplished teachers. We began this task by assessing the available evidence. We scanned the ERIC database for articles written about the NBPTS from 1994 onward. This search covered articles in peer-reviewed journals, research reports published by the organization that sponsored the particular study, conference presentations, articles published by the NBPTS itself, dissertations, and books. In total, we identified 135 articles, although the majority consisted of position statements about the advantages or disadvantages of the certification program, how the board can be of use in reforming teacher education or professionalizing teaching, and the like.
The NBPTS maintains its own bibliography of relevant studies (see http://www.nbpts.org/resources/research). As of June 2007, the NBPTS bibliography contained 161 articles. The majority of these articles were technical reports, and the remainder were position papers, advocacy pieces, reports of empirical research conducted by the board and independent researchers, and a set of studies referred to as the "grant-funded studies." The technical reports and the grant-funded studies deserve additional explanation.

NBPTS Technical Reports

The NBPTS bibliography of 161 articles includes 128 that are technical reports, 6 prepared by the current contractor (the Educational Testing Service [ETS]) and 122 in a group of articles referred to as the Technical
BOX 2-2
The Committee's Evaluation Framework

Question 1: To what extent does the certification program for accomplished teachers clearly and accurately specify advanced teaching practices and the characteristics of teachers (the knowledge, skills, dispositions, and judgments) that enable them to carry out advanced practice? Does it do so in a manner that supports the development of a well-aligned test?

a. What processes were used to identify the knowledge, skills, dispositions, and judgments that characterize accomplished teachers? Was the process for establishing the descriptions of these characteristics thoughtful, thorough, and adequately justified? Who was involved in the process? To what extent do the participants represent different perspectives on teaching?
b. Are the identified knowledge, skills, dispositions, and judgments presented in a way that is clear, accurate, reasonable, and complete? What evidence is there that they are relevant to performance?
c. Do the knowledge, skills, dispositions, and judgments that were identified reflect current thinking in the specific field? What is the process for revisiting and refreshing the descriptions of expectations in each field?
d. Are the knowledge, skills, dispositions, and judgments, as well as the teaching practices they imply, effective for all groups of students, regardless of their race and ethnicities, socioeconomic status, and native language status?

Question 2: To what extent do the assessments associated with the certification program for accomplished teachers reliably measure the specified knowledge, skills, dispositions, and judgments of certification candidates and support valid interpretations of the results? To what extent are the performance standards for the assessments and the process for setting them justifiable and reasonable?

a. To what extent does the entire assessment process (including the tasks, scoring rubrics, and scoring mechanisms) yield results that reflect the specified knowledge, skills, dispositions, and judgments?
b. Is the passing score reasonable? What process was used for establishing the passing score? How is the passing score justified? To what extent do pass rates differ for various groups of candidates, and are such differences reflective of bias in the test?
c. To what extent do the scores reflect teacher quality? What evidence is available that board-certified teachers actually practice in ways that are consistent with the knowledge, skills, dispositions, and judgments they demonstrate through the assessment process? Do knowledgeable observers find them to be better teachers than individuals who failed when they attempted to earn board certification?
Question 3: To what extent do teachers participate in the program?

a. How many teachers apply each year for board certification? Have there been changes in application rates over time? How do application rates compare across states and districts? What are the characteristics of teachers who apply compared with those who do not? What are the characteristics of teachers who successfully earn board certification compared with those who do not?
b. Why do teachers choose to participate or not? What do various agencies (the board, states, school districts, teachers unions, etc.) do to encourage participation? How do these actions influence teachers' attitudes toward certification and participation in the process?

Question 4: To what extent does the advanced-level certification program identify teachers who are effective at producing positive student outcomes, such as learning, motivation, school engagement, breadth of achievement, educational attainment, attendance rates, and grade promotion?

a. How does achievement compare for students taught by board-certified and nonboard-certified teachers, after controlling for other factors? Are the differences substantively meaningful? Do students taught by board-certified teachers have higher achievement or achievement gains than those taught by nonboard-certified teachers? Do student gains persist into the future?
b. How do other student outcomes (such as motivation, breadth of achievement, school engagement, attendance rates, promotion rates) compare for students taught by board-certified and nonboard-certified teachers?

Question 5: To what extent do teachers improve their practices and the outcomes of their students by virtue of going through the advanced-level certification process?

a. To what extent do teachers who go through the certification process improve their teaching practices and classroom climate, regardless of whether they become board certified?
b. Do teachers who obtain board certification become more effective at increasing student achievement in ways that are evident in their students' achievement scores?
c. Do teachers have a greater impact on other student outcomes (e.g., higher student motivation, higher promotion rates) after they obtain board certification than they did before they were certified?

Question 6: To what extent and in what ways are the career paths of both successful and unsuccessful candidates affected by their participation in the program?
a. What are the typical career paths for teachers? Does the career path change for those who obtain advanced certification? What are the effects on the career paths of teachers who attempt to become certified but who are unsuccessful?
b. Do departure rates differ for board-certified and nonboard-certified teachers with regard to leaving teaching (attrition), including those who leave classroom teaching for other jobs in schools (transition)?
c. Does the program have any effects on teacher mobility within the teaching field? Does it encourage teacher mobility in ways that are beneficial for lower performing students or in ways that contribute to inequities? For example, do board-certified teachers move out of urban areas to wealthy suburban districts?

Question 7: Beyond its effects on candidates, to what extent and in what ways does the certification program have an impact on the field of teaching, the education system, or both?

a. What are the effects of having one or more board-certified teachers in a school or district?
b. Has the board-certification program had any effects on:
   • the course content, methods of preparation, and assessments used in teacher education programs, or
   • the content of and strategies used in inservice training and professional development for practicing teachers?
c. Has the board-certification program had any effects on the applicant pool for teacher education programs? Since the board came into existence, have there been changes in the numbers of individuals entering teacher education programs or the characteristics of the applicants?
d. Has the existence of board certification had an impact on the allocation of teachers across districts and schools? Has the program been a useful tool for increasing the numbers of accomplished teachers in high-needs schools?
Question 8: To what extent does the advanced-level certification program accomplish its objectives in a cost-effective manner, relative to other approaches intended to improve teacher quality?

a. What are the benefits of the certification program?
b. What are the costs associated with the certification program?
c. What other approaches have been shown to bring about improvement in teacher quality? What are their costs and benefits?
FIGURE 2-1 Hypothesized impacts of an advanced-level certification program for teachers.

Boxes in the figure (framework question numbers in parentheses):
• Effective program to certify advanced-level teachers is developed. (1, 2)
• Recognition is provided to teachers who earn advanced-level certification.
• Standards for advanced practice are communicated to the field.
• Practices and standards are incorporated in preservice training. (7)
• Practices and standards are incorporated in inservice training. (7)
• Increased numbers of highly qualified individuals pursue teaching. (7)
• Improved conditions (i.e., opportunity for leadership roles and for salary increases) increase job satisfaction. (7)
• Teachers participate; advanced-level teachers are identified. (3)
• Certification process reinforces use of effective practice; board-certified teachers improve their effectiveness. (5)
• Board-certified teachers are effective at improving student outcomes. (4)
• Board-certified teachers remain in teaching. (6)
• Board-certified teachers mentor other teachers. (7)
• Board-certified teachers are assigned to high-needs schools. (7)
• Improvements are made in teaching and learning throughout education system.
Analysis Group (TAG) reports. The TAG reports consist of articles that summarize studies conducted as the assessment program was being developed. They explore topics typically investigated during the development phase of a test, such as procedures for developing assessment tasks that evaluate the content standards, methods for scoring the assessment tasks and ways to increase the reliability of the scoring, determining the passing score for the assessment, and studies of adverse impact. Findings from these studies shaped the assessment, its scoring, and operational procedures. As such, these 128 studies were primarily useful for helping us to understand the history of the development of the assessment from a psychometric perspective and to address the first two questions in our framework regarding the psychometric soundness of the assessments and the processes used to produce the assessments.

NBPTS Grant-Funded Studies

The NBPTS bibliography included approximately 18 studies that are the result of a special board research endeavor supported through private grant funding. The board initiated this project in order to subject the program to external scrutiny and allow researchers to evaluate various claims that had been made about the program, both supportive and unsupportive (personal conversation, Ann Harman, former director of research with the NBPTS). In January 2002, the board launched this project by convening a special bidders' conference for prospective grant recipients. During the conference, NBPTS staff members identified the areas in which they sought investigation, including the impact of board-certified teachers on student achievement, the impact on low-performing schools, leadership activities of board-certified teachers, standards-based professional development, adverse impact associated with the assessment, the NBPTS digital edge program, and psychometric/technical issues.
Subsequent to the conference, potential researchers submitted proposals. To ensure objectivity, the board arranged for researchers from the RAND Corporation to review and rate the proposals and make the funding decisions. The schedule called for completed reports to be submitted within a three-year period.

Note: The TAG was a group formed to advise the NBPTS on the development of the assessment. It was based at the University of North Carolina, Greensboro, and headed by Richard Jaeger, Lloyd Bond, and John Hattie (see Chapter 3 for additional details).

The peer review process established by the board for the completed reports is not documented, but we learned about it from NBPTS staff members. According to these staff members, researchers were told that their reports would undergo a peer review process, but it was up to the researcher to decide on the nature of the review. The researcher could obtain a peer review prior to submitting the final paper to the board and submit it as a reviewed product, or the researcher could submit the report as unreviewed, and NBPTS staff would handle the review. Our understanding of this process is that it was neither rigorous nor standardized. We raise this point because we were disappointed in the quality of many of the grant-funded studies. We think that a more rigorous peer review process, and perhaps additional oversight during the course of each study, might have led to a higher quality body of work.

Committee Criteria for Reviewing the Evidence

A comparison of the studies on the NBPTS bibliography and the list we generated identified approximately 44 articles that reported on empirical research related to our framework. The list includes studies that used a variety of quantitative and qualitative methods. To guide our review of the studies, we agreed on a set of standards for judging the quality and validity of the findings. Numerous texts describe the characteristics of sound research and the factors that can jeopardize the integrity of research findings. We relied in particular on the guiding principles identified by the National Academies' Committee on Scientific Principles for Education Research (National Research Council, 2002, available at http://www.nap.edu/catalog.php?record_id=10236#toc) and in the Standards for Reporting on Empirical Social Science Research in AERA Publications (American Educational Research Association, 2006).
While the committee did not attempt to develop its own comprehensive list, we did identify several criteria that were particularly relevant to the body of evidence available regarding the national board program:

• The design of the study (the framing of the question(s) to be investigated, the method of data collection, and the procedures for analyzing the data) is described clearly and in sufficient detail to allow the reader to form independent judgments about its adequacy.
• The methodology is a logical and defensible approach to answering the specified research question(s) and is carried out correctly.
• The approach to classifying the phenomena to be measured in quantifiable terms is adequately explained and justifiable.
• The identification of samples to be studied, as well as the sample selection procedures, are described in detail. They are appropriate to the research questions being addressed and adequately relate to the conclusions drawn.
• The effects of attrition or nonresponse of subjects are addressed in the findings and conclusions.
• The variables measured are appropriate for the specific research question(s) and are measured in a systematic and reliable manner.
• The findings are fully described and conclusions are justifiable, given the methodology and limitations of the study.

In general, we confined our attention to studies that follow the accepted practices and standards for the type of research attempted in the study. We expected studies to be conducted in a systematic fashion, regardless of whether the methods were qualitative or quantitative, and to be fully documented. In total, this body of research was of mixed utility because many of the studies had technical shortcomings that made the findings difficult to interpret. We made use of as many of them as we could and tried to balance our level of confidence in the findings with the methodological shortcomings and the extent of corroborating evidence. In all, we identified 25 studies that met our criteria and were relevant to our evaluation framework. Appendix A summarizes the studies we predominantly relied upon.

This evidence base was decidedly uneven with respect to the eight questions in our evaluation framework. Nearly half of the studies (10) addressed one aspect of Question 4 (comparisons of student achievement for board-certified and nonboard-certified teachers), and there was little or no evidence available for some of the other questions. To help fill in the gaps in the literature base and to help us fully understand the existing evidence, we arranged to collect our own data and conduct our own analyses.

Information Collected by the Committee

In all, the committee held six meetings, of which four included time for presentations intended to focus on specific aspects of the evaluation framework.
We also arranged for a number of meetings outside committee meetings and additional analyses reported in four papers (Ladd, Sass, and Harris, 2007; McCaffrey and Rivkin, 2007; Perda, 2007; Russell, Putka, and Waters, 2007).

To understand the history of the program, its development, and its current operation, we arranged to meet with the following people who currently worked with the NBPTS or its contractor, ETS, or had previously been involved with the program. We obtained information from these individuals both by inviting them to make presentations at our meetings and by conducting structured interviews with them outside the meetings. The individuals we consulted are listed below.

Current NBPTS staff members:
• Joseph Aguerrebere, president and chief executive officer
• Joan Auchter, vice president, standards and assessment
• Mary E. Dilworth, vice president, higher education initiatives and research

ETS staff members:
• Drew Gitomer, distinguished research scientist
• Mari Pearlman, senior vice president; involved with NBPTS development from the outset
• Steve Schreiner, director of scoring for NBPTS assessments

Former NBPTS staff members:
• Chuck Cascio, ETS; former director of test development at NBPTS
• Ann Harman, Harman and Associates; former NBPTS director of research
• Jim Kelly, retired; first president and chief executive officer of NBPTS
• David Mandel, Carnegie-IAS Commission on Mathematics and Science Education; former vice president for policy development at NBPTS
• Sally Mernissi, deceased, January 2006; former vice president and corporate secretary at NBPTS

Representatives from stakeholder groups:
• Joan Baratz-Snowden, American Federation of Teachers; former vice president of assessment and research at NBPTS
• Joshua Boots, American Board for Certification of Teacher Excellence (ABCTE)
• Emerson Elliott, National Council for Accreditation of Teacher Education
• Mary Futrell, George Washington University; former president of the National Education Association and member of the original board of directors
• Kathy Madigan, formerly with ABCTE

NBPTS researchers and consultants:
• Lloyd Bond, Carnegie Foundation; former co-director of the Technical Analysis Group
• Lee Shulman, Carnegie Foundation; former director of the Teacher Assessment Project at Stanford
• Gary Sykes, Michigan State University; former doctoral student of Lee Shulman and consultant to the NBPTS board of directors
• Suzanne Wilson, Michigan State University; former doctoral student of Lee Shulman and consultant to the NBPTS board of directors
We also sought information and insights from teachers with a variety of experiences related to the NBPTS program. The National Research Council has a standing committee of teachers, called the Teacher Advisory Council (TAC). The council includes 11 high school, middle school, and elementary teachers of reading, mathematics, and science who have been recognized for their exceptional work. This group serves as an advisory panel and resource for committees of the National Academies, to help make sure that the perspectives of teachers at the top of their field are taken into account in the conduct of education-related projects and to improve the usefulness, relevance, and communication of research-based findings. At the time, four of the TAC members were board certified. We attended their March 2006 meeting to learn more about their perceptions of the NBPTS and specifically to follow up on some of the findings reported in the research with regard to the influences of board-certified teachers in schools and school systems. In preparation for this discussion, we distributed a set of questions to the TAC members in advance of their meeting, which inquired about their reasons for deciding to pursue (or not pursue) board certification, their impressions of the program, and, for those who were board certified, any ways that it has impacted their practices or career. We then conducted a two-hour structured discussion to hear their responses to these questions.

Note: More information about the Teacher Advisory Council can be found at http://ww7.nationalacademies.org/tac/.

A segment of our third committee meeting (June 2006) was also devoted to hearing firsthand accounts from teachers and teacher educators. The goal of this panel was to learn more about the ways in which the NBPTS has influenced teacher education and professional development. Mary Futrell, dean of education and former member of the NBPTS board of directors, and Maxine Freund, professor of special education, both with the George Washington University School of Education, discussed their perceptions of ways that the NBPTS has affected teacher training at their institutions. (Freund's work in developing mentoring programs for NBPTS applicants is documented in Freund, Russell, and Kavulic, 2005.) In addition, we invited two board-certified teachers with documented involvement in professional development activities in their school systems (see Cohen and Rice, 2005): Sara Eisenhardt, an elementary teacher in Cincinnati, Ohio, and Carol Matern, employed with the Indianapolis public schools and an adjunct faculty member with Indiana University-Purdue University Indianapolis.

The committee focused considerable attention on the body of research related to student outcomes. We listened to presentations by authors of 6 of the 12 studies: Linda Cavaluzzo, Dan Goldhaber, Douglas Harris and Tim Sass, Thomas Kane and Jon Fullerton, Helen Ladd, and William Sanders. Together, the findings from these studies presented a complex set of results.
We asked five researchers (Henry Braun, Paul Holland, Daniel McCaffrey, Steve Raudenbush, and Steven Rivkin) to review these articles and assist the committee in synthesizing the results. McCaffrey and Rivkin assisted the committee by identifying additional analyses that would help clarify the results, and Harris, Sass, and Ladd conducted these studies.

We sought assistance in evaluating the psychometric qualities of the NBPTS assessment, and Teresa Russell helped with this aspect of our work. We also obtained a data set from the NBPTS and had David Perda conduct analyses that helped us understand who participates in the program and how they compare with other teachers on a national basis. Finally, we heard presentations by Carol Cohen and Jennifer King Rice, who discussed their work on estimating the costs of support programs for teachers going through the NBPTS process.