Assessing Accomplished Teaching: Advanced-Level Certification Programs

7
The Impact of Board-Certified Teachers on Student Outcomes

The National Board for Professional Teaching Standards (NBPTS) set out to accomplish a number of broad goals, all intended to transform the teaching profession in this country. By reshaping the teaching field, expanding the opportunities available to teachers, and articulating the standards for accomplished teaching, the national board envisioned having a significant impact on the quality of teachers and teaching and, consequently, on student learning (National Board for Professional Teaching Standards, 1991). The board emphasized that ultimately all of these goals were directed at improving student learning. In keeping with this objective, Congress specifically asked that the committee’s evaluation consider the impact of national board certification on student outcomes. Accordingly, our evaluation framework asks:

Question 4: To what extent does the advanced-level certification program identify teachers who are effective at producing positive student outcomes, such as learning, motivation, school engagement, breadth of achievement, educational attainment, attendance, and grade promotion?

Figure 2-1 shows how this aspect of the evaluation fits within the committee’s framework, displaying our model of the ways a certification program for accomplished teachers could affect the teaching profession and the way our evaluation questions map onto this model. We identified two issues to investigate and to provide evidence of the extent to which board certification has an impact on student learning. Specifically:

a. How does achievement compare for students taught by board-certified and nonboard-certified teachers, after controlling for other factors? Are the differences meaningful? Do students taught by board-certified teachers have higher achievement or achievement gains than those taught by nonboard-certified teachers? Do student gains persist into the future?

b. How do other student outcomes (such as motivation, breadth of achievement, attendance rates, promotion rates) compare for students taught by board-certified and nonboard-certified teachers?

The majority of studies that estimated effects of national board certification focus on student achievement (our Subquestion a). We located only one study that addressed other outcomes such as those listed in Subquestion b. The bulk of this chapter therefore addresses the findings from studies of student achievement. At the end of the chapter, we discuss other types of student outcome measures and propose research that should be considered in future investigations.

This chapter has five sections. In the first, we discuss issues related to using scores on standardized achievement tests as the outcome variable. The studies use sophisticated statistical methods, and in the second section, we discuss issues that bear on this kind of research and provide explanations of some of the technical terminology. The third section reviews studies of the impact of board certification on student achievement and our analyses of this topic. The fourth section describes the only study we located that examined student outcomes other than performance on achievement tests. The chapter closes with a discussion of conclusions that can be drawn from this evidence base and the types of research needed to fill gaps in what is known.
USING ACHIEVEMENT TEST SCORES AS THE OUTCOME VARIABLE

Nearly all the research discussed in this chapter uses student scores on standardized achievement tests as the measure of the impact of board certification. Using test scores in this way has a long history in research, and in the current federal accountability system established under the No Child Left Behind Act, test scores are the primary indicator of whether schools are making “adequate yearly progress.” However, test scores are not universally accepted as measures of student learning. Committee members from different disciplines had differing views about test scores as measures of learning outcomes and the types of inferences that are appropriate from the results.

For example, economists routinely use achievement test scores as indicators of student learning, understanding that the scores are not perfect as indicators of learning but are the best quantitative measures available for statistical analyses (e.g., Rivkin, Hanushek, and Kain, 2005). Achievement test scores, in particular, have been found to be correlated with other outcomes, such as high school completion, college enrollment and completion, job status, future earnings, and other measures of success (e.g., Carnevale, Fry, and Lowell, 2001; Chiswick, Lee, and Miller, 2002; Jencks et al., 1979; McIntosh and Vignoles, 2001; Sewell, Hauser, and Featherman, 1976; Tyler, Murnane, and Willett, 2000).

Psychometricians, who are trained in the processes and methods for developing tests, focus on whether test scores are valid measures of learning and whether interpretations drawn about them are appropriate. In the present context, achievement tests have not been developed for the purpose of evaluating the effectiveness of teachers’ instructional practices. Tests developed specifically to assess teaching could look different from those used for measuring student achievement.

Furthermore, among teachers and teacher educators, test scores are viewed as, at best, only correlated with student learning. Teachers are familiar with curriculum and state and local standards and how tests relate to them, and they are aware that tests capture only a portion of what they teach and what students learn. They know that exceptional students can perform poorly on tests and low-performing students can do well on tests. They know that tests vary in the extent to which they assess critical thinking, problem solving, and higher order thinking skills.
Many of the skills that the national board requires teachers to demonstrate are not reflected in what is evaluated on standardized achievement tests. For example, to become board certified in the middle childhood generalist area, teachers need to demonstrate that they can establish a caring and stimulating learning environment, that they respect individual differences, that they use a rich and varied collection of materials in their teaching, that they provide multiple paths to learning, and that they provide students with situations in which they can apply what they have learned (National Board for Professional Teaching Standards, 2001c). All this is in addition to demonstrating their understanding of the subject matter, the curriculum, and pedagogy.

In evaluating the body of research covered in this chapter, the committee was cognizant of these limitations as well as the different disciplinary perspectives. Throughout this report, we have attempted to portray a balanced perspective of this research and the use of student test scores for these purposes.
METHODOLOGICAL ISSUES

The question of whether board-certified teachers are more effective than nonboard-certified teachers is formally a question of whether a student who had a board-certified teacher learns more than if he or she had a nonboard-certified teacher. Asked in this way, the question cannot be answered, because the same student cannot be taught at the same time by a board-certified teacher and a nonboard-certified teacher. This problem—of not being able to observe results as they would occur under the “counterfactual” situation—commonly arises in studying the effectiveness of interventions. In medical research, for example, individuals cannot simultaneously take a pharmaceutical treatment and take the placebo treatment.

Researchers have devised a variety of methods for addressing the problem of not having a counterfactual situation for comparison purposes. These methods focus on creating a comparison group that is similar to the treatment group with respect to all relevant characteristics, allowing the researchers to infer that posttreatment differences are attributable to the treatment and not to other differences between the groups. The most powerful way to create equivalent groups is random assignment of subjects to treatment and comparison groups. However, random assignment of students to board-certified teachers has rarely been done (although we review the findings from one such study below) because it is difficult to accomplish in real-life situations.
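The logic of random assignment can be illustrated with a small simulation. The sketch below uses entirely hypothetical numbers, not data from any study discussed in this chapter: randomly splitting a pool of students into two groups yields groups with nearly identical average prior achievement, so a later difference between the groups can be credited to the treatment.

```python
import random
import statistics

def simulate_random_assignment(n_students=2000, seed=7):
    """Randomly split students into a treatment and a comparison group
    and return each group's mean prior-achievement score."""
    rng = random.Random(seed)
    # Hypothetical prior-achievement scores: mean 50, standard deviation 10.
    scores = [rng.gauss(50, 10) for _ in range(n_students)]
    rng.shuffle(scores)
    half = n_students // 2
    treatment, comparison = scores[:half], scores[half:]
    return statistics.mean(treatment), statistics.mean(comparison)

t_mean, c_mean = simulate_random_assignment()
# With 1,000 students per group, the two group means typically differ
# by a fraction of a point, even though individual scores vary widely.
```

With nonrandom assignment, no such balance is guaranteed, which is why the statistical adjustments discussed next are needed.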
Three issues arise when random assignment is not used: (1) if students are not randomly assigned to teachers, they may differ systematically; for example, board-certified teachers may have students with higher than average achievement and motivation to learn; (2) if teachers are not randomly assigned to schools, some of the observed achievement differences between board-certified and nonboard-certified teachers may be attributable to differences in their schools; and (3) if teachers are not randomly assigned to pursue board certification, teachers who do and do not receive certification may differ systematically. We explore these three issues below and discuss how various studies have addressed them.

Nonrandom Assignment of Students to Teachers

If students are not randomly assigned to teachers, the effects of board certification may be either underestimated or overestimated to the extent that teachers are systematically assigned students who are below or above average. For example, if board-certified teachers are typically assigned above-average students, comparing test scores may suggest that board-certified teachers are more effective than nonboard-certified teachers when in fact all that is true is that the students they have been assigned have higher achievement levels (and would have them even if taught by a nonboard-certified teacher).

The studies we reviewed used two strategies to estimate the effects of board certification in the absence of random assignment of students to teachers. Some used a covariate adjustment approach, which applies statistical controls for measured preexisting characteristics of students, such as gender, race/ethnicity, economic circumstances, and prior achievement.1 A shortcoming of this approach, however, is that it is limited to the characteristics that have been measured and are available (e.g., in the data set). That is, if measures of characteristics such as gender, race, or economic circumstances are not available, no statistical control for them can be implemented. Thus, covariate adjustment approaches might not control for all preexisting conditions. Even prior achievement may not fully account for all preexisting conditions, because the tests are not perfectly reliable and because other circumstances may change in ways that affect the predictive power of prior achievement.

Other researchers used student fixed effects models to take prior conditions into account. Theoretically, student fixed effects models control for all the time-invariant characteristics of students (i.e., characteristics that are stable over time), whether or not direct measures of those characteristics are available. They do so by subtracting a student’s mean value over time from each of his or her own data points. What remains is the student’s trajectory over time (up or down), which can be related to the student’s changing experiences (for example, whether he or she has a board-certified teacher in one year and not the next). Student fixed effects models thus operate by allowing each student to serve as his or her own control.
In the present context, the model estimates the average achievement for each student and determines the extent to which each student’s achievement in a given year deviates from this expected average achievement.2 These deviations can then be compared when students are taught by board-certified teachers and when they are taught by nonboard-certified teachers.

1. Controls for prior achievement are particularly important in this approach. The studies we reviewed controlled for prior achievement in two different ways. Some included prior achievement as a covariate, and others subtracted prior achievement from subsequent achievement so that the outcome reflects a gain score rather than an achievement level. These approaches make different assumptions about the relation between pretests and posttests (gain scores assume that pretests and posttests have a one-to-one relationship; covariate adjustments do not make this assumption), and both approaches are widely used, with neither clearly preferred over the other. Several studies presented results based on both approaches.

2. This is typically done by creating an “indicator” variable (or “dummy variable”) for each student in the analysis. Further information about this kind of “dummy coding” is provided in Pedhazur (1982, pp. 274-279). Mathematically, this is the same as subtracting each student’s mean over time from the achievement estimate for the current year.
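The demeaning at the heart of a student fixed effects model can be sketched in a few lines. This is an illustrative toy, not code from the committee or any reviewed study; the records, field names, and two-student sample are hypothetical.

```python
from collections import defaultdict

def fixed_effect_deviations(records):
    """Each record is (student_id, score, has_board_certified_teacher).
    Implements a student fixed effect by subtracting each student's own
    mean score across years from each yearly score, then compares the
    average deviation for years with and without a board-certified
    (NBC) teacher."""
    by_student = defaultdict(list)
    for sid, score, _ in records:
        by_student[sid].append(score)
    means = {sid: sum(v) / len(v) for sid, v in by_student.items()}
    dev_nbc, dev_other = [], []
    for sid, score, nbc in records:
        (dev_nbc if nbc else dev_other).append(score - means[sid])
    return sum(dev_nbc) / len(dev_nbc), sum(dev_other) / len(dev_other)

# Two students observed for two years each; both switch teacher type,
# so both contribute information to the fixed effects comparison.
records = [
    ("s1", 52, True), ("s1", 48, False),   # s1: 4 points higher with NBC
    ("s2", 71, True), ("s2", 69, False),   # s2: 2 points higher with NBC
]
nbc_dev, other_dev = fixed_effect_deviations(records)
# nbc_dev - other_dev gives the within-student NBC advantage
# (3 points with these toy numbers), untouched by the fact that s2
# scores far higher than s1 in every year.
```

Note that a student with the same teacher type in every year would contribute zero deviation difference, matching the limitation discussed below.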
Student fixed effects models offer more rigorous adjustments for preexisting conditions than covariate adjustment models because they account for unobserved characteristics (whereas covariate adjustments take into account only observed, measured variables). However, they demand data on students from multiple time points, and they do not take account of unobserved conditions that may change over time. Also, fixed effects models rely on information from students whose experiences change over time; a student who always had a board-certified teacher, or who never had a board-certified teacher, would contribute no information to the fixed effects model.3

Nonrandom Assignment of Teachers to Schools

Schools may differ in their ability to attract or retain board-certified teachers. For example, board-certified teachers may be found more often in schools serving advantaged communities. If so, underlying differences in school characteristics may be confounded with differences arising because of board certification.

Studies we reviewed used two methods for taking into account nonrandom assignment of teachers to schools. Some used school-level covariates to adjust for preexisting conditions likely to be related to achievement, such as the percentage of students participating in the free and reduced-price lunch program, the percentage of minority students, and prior information about student achievement at the school. However, as discussed above, covariate adjustment approaches can control only for preexisting differences in characteristics that have been measured and are available. Other studies included school fixed effects, which are analogous to the student fixed effects model discussed above. School fixed effects models control for all stable characteristics of the school, whether or not direct measures of those characteristics are available.
They do so by subtracting each school’s mean over time from each school’s yearly mean, thus creating a school’s trajectory over time. As with student fixed effects models, this allows each school to serve as its own control and, in the present context, measures the extent to which the school’s achievement in a given year deviates from its average over time.4 For the studies discussed in this chapter, the school fixed effects model essentially compares board-certified and nonboard-certified teachers in the same school. That is, it determines the average achievement trajectory for each school and compares deviations from this trajectory for board-certified teachers and nonboard-certified teachers.

School fixed effects models share the same limitations as student fixed effects models. They control for school characteristics that are stable over time, under the assumption that these characteristics will have the same effects on performance in one year as in a subsequent year. As with student fixed effects models, they cannot control for characteristics that vary over time, such as when school district boundaries are altered so that the composition of the student body changes markedly. School fixed effects models require at least two years of data for each school, and preferably three or more years, and therefore rely on the existence of large-scale longitudinal data sets.

Nonrandom Assignment of Teachers to National Board-Certification Status

The decision to pursue board certification is voluntary, and, as a result, teachers are not randomly assigned to become board certified. Teachers thus “self-select” into board-certified and nonboard-certified status. Simply comparing the achievement of students of board-certified and nonboard-certified teachers can mix differences related to board certification with differences related to characteristics of teachers who chose to become board certified.

The studies we reviewed used two methods for controlling for these preexisting differences among teachers. Some relied on covariate adjustment procedures using teacher-level covariates, such as years of experience, level of education, and teacher-licensure test scores, to control for prior differences among teachers.

3. There is also evidence that student fixed effects models do not rule out biases that may result from unobserved time-varying characteristics of students that may affect the assignment of students to teachers (Rothstein, 2008).

4. As with student fixed effects models, this is typically done by creating an “indicator” variable (or “dummy variable”) for each school in the analysis. Further information about this kind of “dummy coding” is provided in Pedhazur (1982, pp. 274-279). Mathematically, this is the same as subtracting each school’s mean over time from the school’s achievement estimate for the current year.
Again, the downside to the use of covariate adjustment is that it relies on characteristics for which measures are available. Others used teacher fixed effects, which are analogous to the student fixed effects and school fixed effects models described above. Teacher fixed effects models use teachers as their own controls. They estimate the average growth trajectory for each teacher’s students (i.e., the average across all students taught by a given teacher) and analyze the deviations from this average.5 Teacher fixed effects models can be used to examine whether these deviations are associated with the teacher’s board-certification status.

5. As with models that use student fixed effects or school fixed effects, this is typically done by creating an “indicator” variable (or “dummy variable”) for each teacher in the analysis. Further information about this kind of “dummy coding” is provided in Pedhazur (1982, pp. 274-279). Mathematically, this is the same as subtracting each teacher’s mean over time from the teacher’s achievement estimate for the current year.
These models share the shortcomings noted above for school and student fixed effects models. They require at least two years of data for each teacher and thus rely on the existence of large-scale longitudinal data sets.

Most of the studies we reviewed examined whether board certification distinguishes more effective from less effective teachers (often referred to as a signaling effect). This question is important because a major goal of the program is to retain the most effective teachers in the teaching field, and many states offer salary increases to teachers who become board certified. This question can be addressed by knowing each teacher’s board-certification status. Several studies we reviewed also attempted to determine whether board certification makes teachers more effective, an issue that economists refer to as a human capital effect. Addressing this question requires that the data set contain information on teachers before and after they participated in the board-certification process, organized in such a way that the timing of earning board certification can be determined. Four studies had the needed information, making it possible to assess whether going through the certification process increases a teacher’s effectiveness.

Nesting of Students Within Classrooms

Typically, students are grouped in classrooms, and each class is taught by a single teacher. Researchers refer to this structure as nesting or clustering of students within classrooms. This clustering needs to be considered in designing the research approaches because students in a class are generally more like each other than students in different classes. Students in a classroom share a common learning environment and a common teacher, which causes their test scores to be somewhat positively correlated.
If these correlations are not taken into account in the statistical models, estimates of teacher effects will seem to be more precise than they really are, leading to false conclusions about statistical significance. Researchers handle this in different ways. Some build statistical models that reflect the nesting of students in classrooms, such as hierarchical or multilevel models. Others apply a statistical correction that adjusts for the overstated precision by estimating “robust standard errors,” which yield correct tests of statistical significance. When the nesting is not addressed, tests of statistical significance are biased such that effects may be found to be statistically significant when in fact they are not.
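The consequence of ignoring clustering can be made concrete with a toy calculation. The sketch below uses hypothetical simulated data, not data from any reviewed study, and compares a naive standard error that treats every student as independent with a conservative cluster-level standard error computed from classroom means; real analyses would use cluster-robust ("sandwich") variance estimators or multilevel models rather than this simplification.

```python
import math
import random

def naive_and_cluster_se(classrooms):
    """classrooms: list of lists of student scores, one inner list per
    class. Returns (naive SE, cluster-level SE) for the grand mean.
    The naive SE treats all students as independent; the cluster-level
    SE treats each classroom mean as a single observation."""
    students = [s for c in classrooms for s in c]
    n = len(students)
    grand = sum(students) / n
    naive_var = sum((s - grand) ** 2 for s in students) / (n - 1)
    naive_se = math.sqrt(naive_var / n)

    means = [sum(c) / len(c) for c in classrooms]
    m = len(means)
    cl_mean = sum(means) / m
    cl_var = sum((x - cl_mean) ** 2 for x in means) / (m - 1)
    cluster_se = math.sqrt(cl_var / m)
    return naive_se, cluster_se

rng = random.Random(3)
# 20 classrooms of 25 students; a shared per-class effect induces
# positive within-class correlation in scores.
classrooms = []
for _ in range(20):
    class_effect = rng.gauss(0, 5)
    classrooms.append([50 + class_effect + rng.gauss(0, 5) for _ in range(25)])

naive_se, cluster_se = naive_and_cluster_se(classrooms)
# cluster_se comes out several times larger than naive_se here, so a
# model ignoring clustering would overstate the precision of estimated
# teacher effects.
```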
STUDIES OF STUDENT PERFORMANCE ON ACHIEVEMENT TESTS

Ten studies that we reviewed used student achievement test data to evaluate the effects of board-certified teachers on test scores. Some studies found positive effects of board certification, and some found no effects. The findings are sensitive to model specification, how comparison teachers are identified, the timing of the comparison—before certification, after certification, or during the certification process—and the nature of the test score used as the outcome.

In reviewing these studies, we attempted to get an overall sense of the evidence about the relationship between board certification and student learning and the extent to which the findings are consistent across studies. Because our initial review revealed some discrepancies in the findings, we looked closely at the methodologies used by each researcher to consider the extent to which methodological choices contributed to differences in findings. We solicited the assistance of two researchers (Daniel McCaffrey and Steven Rivkin) to help us with this review. We asked our reviewers to help sort out the methodological differences, summarize the findings, and identify unanswered questions. The goal of their work was to identify a set of analyses that would be appropriate for the various data sets and would help to disentangle the methodological issues from the findings. We also had two teams of researchers carry out the analyses: one team (Timothy Sass and Douglas Harris) for the Florida data set, and the other team (Helen Ladd and associates) for the North Carolina data set.

The next section highlights the main findings from the reviewed studies to give readers a sense of whether findings are statistically significant and consistent across analyses and statistical models.
The reader is referred to Appendix A for additional details about the findings, to McCaffrey and Rivkin (2007) for a thorough critique of the studies, and to the original papers for a complete presentation of the approaches and findings.

Review of Existing Studies

Of the 10 studies that we reviewed, three relied on relatively small samples. McColskey et al. (2005) analyzed data for 25 board-certified teachers in North Carolina; Stone’s (2002) study consisted of data for 16 board-certified teachers in Tennessee; and Vandevoort, Amrein-Beardsley, and Berliner (2004) focused on data for 35 self-selected board-certified teachers in Arizona. The committee judged that these small sample sizes, combined with other methodological limitations, made it difficult to draw conclusions from the studies. We focus below on the seven studies that relied on larger samples of teachers and more sophisticated analytic approaches. Table 7-1 provides a brief overview of each.

The studies used a range of approaches to control for differences among students, teachers, and schools and for dealing with the issue of students nested within classrooms. We first discuss the study that used random assignment and then discuss the studies that used statistical controls to compensate for the fact that they did not use random assignment. Most of these studies were very comprehensive, reporting results for numerous comparisons based on a variety of statistical models. In this review, we attempt to give the reader a general sense of the findings by characterizing the effects in terms of statistical significance. For ease of presentation, we do not specify the exact level of statistical significance (i.e., the p-values reported by the authors), but in all cases, effects we refer to as “statistically significant” met a criterion of p < .05. Additional details about the studies and summaries of effect sizes are provided in Appendix A.

Random Assignment of Teachers to Classrooms

Cantrell et al. (2007) is the only study that used a form of random assignment. Using data for students and teachers in the Los Angeles Unified School District, the researchers assigned teachers randomly to classrooms of students. To conduct the assignment, the researchers worked with the NBPTS to identify applicants for board certification, some of whom had earned board certification and others who had not. Each applicant was matched with a nonapplicant comparison teacher in the same school and grade. Classrooms were then randomly assigned to teachers. Two additional samples of board-certified and nonboard-certified teachers were identified to allow the researchers to study the effects of random and nonrandom assignment on the results.
The researchers classified applicants as passed, failed, or withdrawn and compared achievement test results for the students taught by each of these groups and students taught by the nonapplicants. The findings for the three groups indicate that applicants who received board certification were more effective than those who applied but failed, and the differences were statistically significant. There were small differences in effectiveness between board-certified teachers and nonapplicants, but these differences were not statistically significant. The results for the nonexperimental sample showed the same patterns, but the effect sizes were much smaller.

Studies Using Statistical Adjustments to Account for Nonrandom Assignment

Six of the seven studies used fixed effects models and/or covariates to adjust for differences in school, teacher, and student characteristics
TABLE 7-1 Studies Examining the Relationship Between Board Certification and Student Achievement

Study                               | Grades/Content Area(s) | Years     | State/District
Cantrell et al. (2007)              | 3rd-5th; reading, math | 2003-2005 | Los Angeles
Cavaluzzo (2004)                    | 9th-10th; math         | 2000-2003 | Miami–Dade County
Clotfelter, Ladd, and Vigdor (2006) | 3rd-5th; reading, math | 1994-2004 | NC
Clotfelter, Ladd, and Vigdor (2007) | 5th; reading, math     | 1999-2000 | NC
count for the clustering of students within classrooms. Two other models accounted for this clustering by including random effects for teachers. Random effects capture systematic differences in performance that are shared by students taught by a particular teacher but that are unrelated to any measured teacher or student characteristic already in the model. This effect allowed the researchers to compare the variability of teachers within a specific classification (e.g., variability among board-certified teachers) with the variability of teachers across classifications (e.g., variability between board-certified teachers and nonapplicants).

The findings varied considerably depending on the model. The models that replicated those used by Cavaluzzo and by Goldhaber et al. tended to find statistically significant effects, particularly in reading, whereas the models that accounted for clustering by including random effects uncovered few statistically significant effects. The implication is that accounting for classroom clustering may be an important feature of the models for estimating variances correctly. However, the size of the effects revealed by Sanders, Ashton, and Wright was not dissimilar to those reported by other researchers. Sanders, Ashton, and Wright’s disaggregation by grade level reduced sample sizes, reducing power and making it more difficult to identify significant effects, irrespective of the clustering issue.

Synthesis of the Research Findings

The literature review above was intended to describe findings from research on the relationship between national board certification and student test scores. Studies that compared test score gains for students of teachers who were and were not successful in earning board certification consistently found statistically significant differences between the two groups.
Results from comparisons of test score gains for students of board-certified teachers and nonapplicants were less consistent. The studies differed along many dimensions that affect attempts to draw conclusions from them. They used different samples of teachers and classified teachers differently into NBPTS participation groups. Consequently, in some studies the comparison group was teachers who had failed to achieve certification, and in some studies it was all nonboard-certified teachers, which included nonparticipating teachers as well as teachers who had failed to obtain board certification. The studies used different characteristics of students, teachers, classes, and schools as explanatory variables. The studies also differed in the way they measured test score differences, some using gains and some using the current test score with the previous score(s) as a covariate. Given the extent of differences among studies, it is impossible to assess which findings are robust and which are consequences of methodological choices.
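The two ways of measuring test score differences just described, a gain score versus the current score with the previous score as a covariate, can be contrasted with a small sketch. The data below are hypothetical and the real analyses include many additional controls; the point is only that the gain score model implicitly fixes the pretest coefficient at 1, while the covariate model estimates it from the data.

```python
def ols_slope_intercept(x, y):
    """Simple least-squares fit of y on x (used for the covariate model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def compare_strategies(pre, post, is_nbc):
    """Return the NBC-minus-other group difference under (a) the gain
    score model (outcome = post - pre) and (b) a covariate model
    (outcome = residual after regressing post on pre). Hypothetical
    data; real analyses add student, school, and teacher controls."""
    gains = [p2 - p1 for p1, p2 in zip(pre, post)]
    slope, intercept = ols_slope_intercept(pre, post)
    residuals = [p2 - (intercept + slope * p1) for p1, p2 in zip(pre, post)]

    def group_diff(values):
        nbc = [v for v, g in zip(values, is_nbc) if g]
        other = [v for v, g in zip(values, is_nbc) if not g]
        return sum(nbc) / len(nbc) - sum(other) / len(other)

    return group_diff(gains), group_diff(residuals)

pre = [40, 50, 60, 40, 50, 60]
post = [46, 56, 66, 43, 53, 63]          # toy scores for six students
is_nbc = [True, True, True, False, False, False]
gain_diff, resid_diff = compare_strategies(pre, post, is_nbc)
# Both strategies attribute a 3-point advantage to the NBC group with
# these toy numbers (the fitted pretest slope happens to be exactly 1);
# with real data the two estimates generally differ.
```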
As described earlier, the committee had two teams of researchers carry out a set of analyses appropriate for the various data sets to help disentangle the methodological issues from the findings. The findings from these supplemental analyses are described below.

Results of Supplemental Analyses with Florida and North Carolina Data Sets

The two teams of researchers used a common set of specifications to analyze the Florida and North Carolina data. The analyses used data for the 2000-2001 through 2003-2004 school years and examined reading and math performance for fourth and fifth graders, the two grades common to the two data sets. School, teacher, classroom, and student characteristics that were common to both data sets were included in the analyses.

Two general classification schemas were used. The researchers first classified teachers into two groups—ever-board-certified and not-ever-board-certified—to examine the signaling effects associated with board certification. They then classified teachers according to the stage of their participation: (1) current board-certified teacher; (2) current applicant for board certification; (3) future board-certified teacher (that is, not currently board certified but attaining certification in the future); and (4) no participation with the NBPTS. This analysis investigated the extent to which teachers improved test scores as they progressed through the certification process.

The researchers ran six alternative models that reflected the methodological variations observed in the studies we reviewed. Two strategies were used to estimate score increases (a gain score model and a covariate model), and two methods were used to handle preexisting group differences (student fixed effects and school fixed effects). The results are shown in Table 7-2. They indicate that the findings are more sensitive to context (i.e., the state) than to model specification.
Consistent positive and statistically significant effects for board-certified teachers in North Carolina are evident for both reading and mathematics. The magnitude of the effects varied with model specification, but the sign and significance did not. In Florida, the effects on reading achievement were positive and statistically significant for all but one model, although they were smaller than in North Carolina. For mathematics, the Florida estimates were small and not statistically significant.

Results for the model that we judged to be strongest, which used the gain score as the outcome measure and estimated both student and school fixed effects, appear in Table 7-2. These results indicate that, compared with other teachers, board-certified teachers in North Carolina raise test scores by about 7 percent of a standard deviation more in math and 4 percent of a standard deviation more in reading.7 In Florida, board certification is associated with a smaller increase of about 1 percent of a standard deviation in mathematics and about 2 percent of a standard deviation in reading; the Florida coefficients were not statistically significant.

7. Reporting the results in terms of the percentage of a standard deviation is a way of placing the effects on the same metric. This allows researchers to compare effects from different models, analyses, and data sets.

Table 7-3 allows comparison of the results from our analyses with those of prior studies that were based on a variety of different models (as described in the final column of the table). The prior studies generally show effect sizes in the same range as our commissioned analyses (roughly 5 to 7 percent of a standard deviation in mathematics and 4 to 6 percent of a standard deviation in reading), with the exception that our analyses of Florida tended to produce lower effect sizes.

The bottom portion of Table 7-2 shows the relationship between board certification and student achievement at different stages of the certification process (the second schema). Several observations can be made about these results. For North Carolina, future board-certified teachers appear to be more effective even before becoming board certified, raising test scores 5 percent of a standard deviation more in math and 2 percent more in reading than other teachers. The decline in effectiveness reported in other studies is evident during the application year. After teachers earn board certification, they return to levels of effectiveness similar to those before the process, raising test scores by 8 percent of a standard deviation more in math and 4 percent more in reading. In Florida, the coefficients were smaller and most were not statistically significant. The exceptions were in math, in which currently board-certified teachers raised test scores by 2 percent of a standard deviation more than other teachers, and in reading, in which teachers who would later become board certified appeared to raise their students' test scores 4 percent of a standard deviation more than other teachers.

Comparison of Teachers Who Passed and Teachers Who Failed

The data sets we used for these analyses did not contain the information needed to compare the effectiveness of teachers who attempted to earn board certification and succeeded with that of teachers who failed. The data needed for this comparison must be obtained directly from the NBPTS and require careful matching of state-level records with NBPTS records. Four sets of researchers worked with the NBPTS to obtain the needed data,
and their studies report effect sizes for each of these two groups of teachers. Examining the differences in effectiveness between these two groups provides a cleaner comparison because it eliminates any biases introduced by the fact that teachers self-select to pursue board certification. Differences between the two groups thus speak to the ability of the assessment to identify more effective teachers, an issue often addressed in the context of criterion-related validity evidence, as discussed in Chapter 5.

Cantrell et al. (2007), Cavaluzzo (2004), Goldhaber and Anthony (2007), and Sanders, Ashton, and Wright (2005) permit this comparison. The results from all four of these studies show that teachers who successfully earned board certification were more effective than those who were unsuccessful; furthermore, teachers who were unsuccessful were less effective than teachers who did not attempt board certification. The magnitude of the effects differs from study to study, however, with the effect sizes reported by Cantrell et al. slightly larger than those reported in the other studies.

Table 7-4 presents the effect sizes reported in each study when comparing teachers who passed with teachers who failed. The results from Cantrell et al. indicate that teachers who passed raised their students' achievement about .20 of a standard deviation more in both math and reading than teachers who failed. In the other studies, the differences in effectiveness were about .10 in math (range of .09 to .13) and about .04 in reading (range of .03 to .05).

TABLE 7-2 Estimated Effects of National Board Certification on Mathematics and Reading Scores in Florida and North Carolina

Model specifications:
  (1) gain score outcome, no student fixed effects, no school fixed effects
  (2) gain score outcome, student fixed effects, no school fixed effects
  (3) gain score outcome, no student fixed effects, school fixed effects
  (4) gain score outcome, student fixed effects, school fixed effects
  (5) lagged score, no student fixed effects, no school fixed effects
  (6) lagged score, no student fixed effects, school fixed effects

Mathematics

                           (1)          (2)          (3)          (4)          (5)          (6)
FLORIDA
Ever certified             .00 (.008)   .01 (.006)   .00 (.007)   .01 (.009)   .02 (.006)   .01 (.006)
Certified in the future    .00 (.012)  −.01 (.015)  −.00 (.012)  −.01 (.015)   .01 (.011)  −.00 (.011)
Certified in current year −.01 (.013)  −.01 (.017)  −.02 (.013)   .01 (.018)   .01 (.013)  −.01 (.012)
Certified in prior year    .01 (.009)   .01 (.011)   .01 (.009)   .02 (.011)   .03 (.008)   .02 (.008)
NORTH CAROLINA
Ever certified             .05 (.005)   .07 (.006)   .05 (.005)   .07 (.007)   .06 (.005)   .05 (.005)
Certified in the future    .05 (.010)   .06 (.015)   .04 (.011)   .08 (.016)   .04 (.011)   .04 (.010)
Certified in current year  .03 (.013)   .04 (.014)   .03 (.013)   .05 (.015)   .03 (.012)   .03 (.012)
Certified in prior year    .06 (.006)   .07 (.007)   .05 (.006)   .08 (.008)   .06 (.006)   .05 (.006)

Reading

                           (1)          (2)          (3)          (4)          (5)          (6)
FLORIDA
Ever certified             .01 (.005)   .02 (.007)   .01 (.006)   .02 (.008)   .02 (.005)   .02 (.005)
Certified in the future    .02 (.010)   .05 (.014)   .02 (.010)   .04 (.015)   .02 (.010)   .02 (.010)
Certified in current year  .01 (.011)  −.01 (.016)   .01 (.011)  −.01 (.017)   .01 (.011)   .01 (.010)
Certified in prior year    .01 (.007)   .01 (.009)   .01 (.007)   .01 (.010)   .02 (.006)   .02 (.007)
NORTH CAROLINA
Ever certified             .03 (.004)   .04 (.005)   .02 (.004)   .04 (.006)   .03 (.004)   .02 (.004)
Certified in the future    .02 (.009)   .04 (.012)   .02 (.009)   .05 (.012)   .02 (.009)   .01 (.008)
Certified in current year  .02 (.009)   .02 (.012)   .02 (.009)   .02 (.013)   .03 (.009)   .02 (.009)
Certified in prior year    .03 (.004)   .04 (.006)   .02 (.005)   .04 (.007)   .04 (.004)   .03 (.004)

NOTE: The "Ever certified" rows correspond to the first classification schema; the remaining rows correspond to the second schema. Standard errors appear in parentheses. In the original table, statistically significant values (p < .05) appear in bold.
SOURCE: McCaffrey and Rivkin (2007, Table 4).
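As footnote 7 explains, all of these estimates are expressed as fractions of a standard deviation of student test scores. Converting between that metric and raw score points is simple arithmetic; the 12.5-point test standard deviation below is a hypothetical value chosen only for illustration:

```python
def effect_in_sd_units(raw_effect_points: float, test_sd_points: float) -> float:
    """Express a raw score difference as a fraction of a standard deviation."""
    return raw_effect_points / test_sd_points

def effect_in_points(effect_size_sd: float, test_sd_points: float) -> float:
    """Convert an effect size in SD units back into raw score points."""
    return effect_size_sd * test_sd_points

# Hypothetical test with a standard deviation of 12.5 points:
TEST_SD = 12.5

# A .20 SD pass-versus-fail difference (as in Cantrell et al.) would be:
print(effect_in_points(0.20, TEST_SD))   # 2.5 points

# A 1-point raw difference on the same test would be:
print(effect_in_sd_units(1.0, TEST_SD))  # 0.08, i.e., 8 percent of an SD
```

Because the conversion depends on each test's score scale, the SD metric is what makes effects from different states, tests, and models comparable.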
TABLE 7-3 Summary of Effect Sizes from Different Models: Board Certified Versus Never Applied

Study                               Math    Reading   Model
Cavaluzzo                           .074    —         School fixed effects, gain score
Cantrell et al.                     .046    .060      Random assignment of students to teachers, lagged achievement
Florida committee analyses          .003    .015      Student, school fixed effects, gain score
Goldhaber and Anthony               .050    .040      Covariate, gain score
North Carolina committee analyses   .067    .038      Student, school fixed effects, gain score
Sanders, Ashton, and Wright(a)      .070    .036      HLM,(b) lagged achievement
Sanders, Ashton, and Wright(c)      .054    .058      HLM, teacher random effects, lagged achievement

(a) Data from Sanders, Ashton, and Wright, Table 3A; average effect sizes across grades for board certified versus never applied.
(b) HLM = hierarchical linear modeling.
(c) Data from Sanders, Ashton, and Wright, Table 3B; average effect sizes across grades for board certified versus never applied.

TABLE 7-4 Summary of Effect Sizes: Successful National Board Applicants Versus Unsuccessful Applicants

Study                               Math    Reading   Model
Cantrell et al.                     .219    .194      Random assignment of students to teachers, lagged achievement
Cavaluzzo                           .100    —         School fixed effects, gain score
Goldhaber and Anthony               .090    .050      Student fixed effects, gain score
Sanders, Ashton, and Wright(a)      .134    .038      HLM,(b) lagged achievement
Sanders, Ashton, and Wright(c)      .102    .032      HLM, teacher random effects, lagged achievement

(a) From Sanders, Ashton, and Wright, Table 3A; average effect sizes across grades for board certified versus failed.
(b) HLM = hierarchical linear modeling.
(c) From Sanders, Ashton, and Wright, Table 3B; average effect sizes across grades for board certified versus failed.
Interpreting the Effects

The coefficients reported above, even when statistically significant, are small in an absolute sense. For example, in North Carolina an improvement of 8 percent of a standard deviation in math translates to roughly 1 point on a test with a mean score of 150.

To help evaluate the magnitude of these effects, we investigated the effect size for a hypothetical advanced-level certification process that relies solely on value-added estimates derived from student test scores. We refer to this hypothetical process as "pure value-added certification." To our knowledge, no one has seriously proposed a pure value-added certification process, but there is substantial policy interest in value-added approaches, and many people have proposed approaches to certification or licensure that give some role to value-added measures. A pure value-added certification process might seem appealing because, by design, it would certify the teachers who produce high value-added estimates, but it also has serious drawbacks.

There are four critical caveats to the implementation of a pure value-added certification process. First, there are unresolved technical issues about whether current value-added approaches can reliably estimate the value added that should be attributed to individual teachers. Second, a pure value-added system selects teachers on the basis of the very outcome on which they will later be compared; regression to the mean will therefore lead the selected group to perform lower in the following year. Third, value-added approaches require standardized tests, which are unavailable for many grades and subjects in many states. Finally, in all proposed certification approaches that rely on value-added methods, the value-added estimates are combined with other measures.
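The second caveat, regression to the mean, can be demonstrated with a short simulation. The quantities below (the number of teachers, the spread of true teacher quality, and the year-to-year estimation noise) are invented purely for illustration: teachers are selected on a noisy year-1 value-added estimate, and the selected group's average falls in year 2 even though true quality is unchanged.

```python
import random
import statistics

random.seed(1)

N_TEACHERS = 20000
QUALITY_SD = 0.25   # spread of true teacher quality (illustrative)
NOISE_SD = 0.25     # year-to-year estimation noise (illustrative)

# True quality is fixed; each year's value-added estimate is quality + noise.
quality = [random.gauss(0, QUALITY_SD) for _ in range(N_TEACHERS)]
year1 = [q + random.gauss(0, NOISE_SD) for q in quality]
year2 = [q + random.gauss(0, NOISE_SD) for q in quality]

# "Certify" the top 50 percent of teachers based on their year-1 estimates.
cutoff = statistics.median(year1)
selected = [i for i in range(N_TEACHERS) if year1[i] >= cutoff]

mean_year1 = statistics.mean(year1[i] for i in selected)
mean_year2 = statistics.mean(year2[i] for i in selected)
print(f"selected group, year 1: {mean_year1:.3f}")
print(f"selected group, year 2: {mean_year2:.3f}")  # lower than year 1
```

The year-2 average is still above zero, because the selected teachers really are better than average, but it is well below their year-1 average, because part of the year-1 advantage was noise.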
Although the use of additional measures is likely to increase the reliability and validity of the ultimate certification decision, it would also reduce the measured value-added difference between certified and noncertified teachers relative to the difference we calculate for a pure value-added certification system. Therefore, although the pure value-added certification process discussed here is useful for making comparisons with other approaches to advanced certification, such a system could not be fully implemented in practice without resolving these caveats.

A pure value-added certification process would have to determine where to draw the line between the teachers who are awarded certification and those who are not. Table 7-5 gives results for three possible ways of doing this: choosing the top 25 percent, the top 50 percent, or the top 75 percent of all teachers. Using two different estimates of the size of quality differences across teachers, the table provides the size of the effect on student value added (compared with an average teacher) that would be
associated with teachers who pass and teachers who fail in the pure value-added certification process.

Table 7-5 shows that, in a pure value-added certification system, the size of the effects for teachers who pass and teachers who fail depends crucially on how many teachers are selected. If only the top 25 percent of teachers are selected, then the teachers who pass are much better than the average teacher, with an effect size of 0.32. In contrast, if the top 75 percent of teachers are selected, the effect size for teachers who pass falls to 0.11.

To compare the pure value-added certification process portrayed in Table 7-5 with national board certification, we need to know whether national board certification attempts to identify teachers who correspond more closely to the top 25 percent, the top 50 percent, or the top 75 percent of all teachers. Although we do not know the true quality of the teachers who apply for national board certification, we do know that roughly 60 to 65 percent of the teachers who apply are ultimately successful. We also know that teachers who fail the assessment are less effective than average teachers, and that the negative effect size for teachers who fail is about as large as the positive effect size for teachers who pass. Considering this information, the selection rule of choosing the top 50 percent for the pure value-added certification system seems like the closest match to the selectivity of national board certification.

Our supplemental analyses of Florida and North Carolina data produced certification effect sizes for board-certified teachers that average roughly 0.04. This certification effect can be compared with the effect of 0.20 for a pure value-added certification system that chooses the top 50 percent of all teachers to receive certification.
TABLE 7-5 Certification Effects for a Pure Value-Added Certification

Selection Rule   Effect Size for Teachers Who Pass   Effect Size for Teachers Who Fail
Top 25%          0.32                                −0.11
Top 50%          0.20                                −0.20
Top 75%          0.11                                −0.32

NOTE: Effect sizes estimated from the teacher quality distribution estimate of 0.25 standard deviation of student value added for a 1 standard deviation difference in teacher quality in Texas (see Hanushek, Kain, O'Brien, and Rivkin, 2005).

Comparison of the two effect sizes indicates that national board certification captures one-fifth of the value-added effect that would be produced by a pure value-added certification process.

Taken together, the results from our additional analyses lead us to articulate three findings and a conclusion:
Finding 7-1: Fourth and fifth graders taught by board-certified teachers in North Carolina show higher gains on the state's accountability test than those taught by other teachers. The effects are in the range of 4 to 5 percent of a standard deviation in reading and 7 to 8 percent of a standard deviation in mathematics, and they are statistically significant.

Finding 7-2: Fourth and fifth graders taught by board-certified teachers in Florida show slightly higher gains on the state's accountability test in reading than those taught by other teachers. The effects are in the range of 2 to 4 percent of a standard deviation. In mathematics, the effects are indistinguishable from zero.

Finding 7-3: In Los Angeles, third through fifth graders taught by board-certified teachers and those taught by teachers who had not applied for board certification made similar achievement test gains in mathematics and reading. Achievement gains were statistically significantly lower for students taught by teachers who attempted board certification but were unsuccessful than for students taught by board-certified teachers.

Conclusion 7-1: Students taught by board-certified teachers make slightly higher achievement test score gains than those taught by teachers who have not applied for board certification. The magnitude of the effects varies for reading and math and by state or jurisdiction. Students taught by teachers who attempted board certification but were unsuccessful make smaller gains than those taught by board-certified teachers or by teachers who have not applied for board certification. The evidence is clear that national board certification distinguishes more effective teachers from less effective teachers with respect to student achievement. The differences are small (and not entirely consistent) in absolute terms, but when considered in terms of teacher value-added contributions to achievement, they are substantively meaningful.

STUDIES OF OTHER STUDENT OUTCOMES

Our search of the literature identified one study that measured the effects of board certification on students using outcomes other than achievement test scores. Helding and Fraser (2005) compared board-certified and non-board-certified teachers at 13 high schools in Miami-Dade County in terms of classroom environment and student attitudes, as well as achievement. The researchers used questionnaires to measure students' perceptions of their science classes and their attitudes about science. The
results from these questionnaires, along with each student's science score on Florida's state accountability test, were compared for students taught by board-certified and non-board-certified teachers. The results generally favored the board-certified teachers, whose students had more positive attitudes and higher science scores.

There are important limitations to these findings, however. The way in which teachers were recruited to participate in the study may have introduced biases, and no efforts were made to control for differences between the students assigned to board-certified teachers and those assigned to non-board-certified teachers. In addition, the authors did not adjust for classroom clustering; the significance tests they performed therefore overstate the differences between board-certified and non-board-certified teachers. This study is described in more detail in Appendix A.

CONCLUSIONS AND RECOMMENDATIONS

Our search of the literature base revealed 11 studies on the relationship between board certification and outcomes for students, far more than we found for any other question in our evaluation framework. For the most part, however, the studies were based on data from only three states: most studied teachers and students in North Carolina, three drew their samples from Florida, and one used data from California (Los Angeles). The only exceptions were two relatively small-scale studies conducted in Arizona and Tennessee that had serious methodological limitations. Furthermore, nearly all of the studies focused on achievement test results in mathematics and reading, and most restricted their samples to students and teachers in the elementary grades. We are therefore hesitant to generalize these findings to students and teachers in other states, subjects, and grades.

The committee noted two paths that future research might take. One path would be replication of the Florida and North Carolina studies in more states, content areas, and grades.
The committee recognizes, however, that beyond the elementary grades each student is taught by many teachers, which complicates the attribution of student outcomes to individual teachers that this kind of research requires. Furthermore, many states may not have the extensive administrative data sets on teachers and students that are maintained in Florida and North Carolina.

The second path is to examine other student outcomes. Test scores are a narrow conception of student learning, and the standardized test data currently available are primarily scores on tests designed to measure mastery of state content standards, not teaching skills. It may be that the skills board-certified teachers must demonstrate have impacts on other outcomes that are not detected by accountability tests. We therefore encourage research approaches that focus on different outcomes.
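The classroom clustering problem noted above for the Helding and Fraser (2005) study can be quantified with the standard design-effect adjustment. The class size and intraclass correlation below are hypothetical values chosen only for illustration:

```python
import math

def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from sampling intact clusters (Kish design effect)."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical values: classes of 25 students, intraclass correlation 0.20.
deff = design_effect(25, 0.20)
se_inflation = math.sqrt(deff)

print(f"design effect: {deff:.1f}")                   # 5.8
print(f"standard errors understated by: {se_inflation:.1f}x")  # 2.4x
```

Treating 25 clustered students as if they were independent makes standard errors look about 2.4 times smaller than they are under these assumed values, which is why unadjusted significance tests overstate group differences.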
Regardless of the path, longitudinal analyses of test scores for large samples of students need to be balanced with smaller scale studies that use different methods or outcomes. Cantrell et al. (2007) provides one example of a smaller scale study that used random assignment, which allowed the researchers to draw more valid conclusions. Helding and Fraser (2005) demonstrate a study that expanded the kinds of outcomes examined to include student attitudes and motivation. In addition, some of the validation studies discussed in Chapter 5 were based on classroom observations and reviews of student work. In those studies, the researchers evaluated the complexity of teachers' assignments, the quality of student work samples, and the depth of students' questions during classroom discussions. These are examples of other measures that might be considered. We therefore make the following recommendations:

Recommendation 7-1: To the extent that existing data sets allow, we encourage replication of studies that investigate the effects of board-certified teachers on student achievement in states besides North Carolina and Florida, in content areas beyond mathematics and reading, and in grades beyond the elementary levels. Researchers pursuing such studies should work with the national board to obtain the information needed to study the effects of teachers who successfully obtained board certification as well as those who were unsuccessful.

Recommendation 7-2: We encourage studies of the effects of board-certified teachers on outcomes beyond scores on standardized tests, such as student motivation, breadth of achievement, attendance rates, and promotion rates. The choice of outcome measures should reflect the skills that board-certified teachers are expected to demonstrate. Such research should be conducted using sound methodologies, adequate samples, carefully controlled conditions, and appropriate statistical analyses.