2
Uses and Consequences of Value-Added Models

This chapter provides an overview of how value-added models are currently being used for research, school and teacher improvement, program evaluation, and school and teacher accountability. These purposes can overlap to some extent, and often an evaluation system will be used for more than one purpose. The use of these models for educational purposes is growing fast. For example, the Teacher Incentive Fund program of the U.S. Department of Education, created in 2006, has distributed funds to over 30 jurisdictions to experiment with alternate compensation systems for teachers and principals—particularly systems that reward educators (at least in part) for increases in student achievement as measured by state tests.1 Some districts, such as the Dallas Independent School District (Texas), Guilford County Schools (North Carolina), and Memphis City Schools (Tennessee) are using value-added models to evaluate teacher performance (Center for Educator Compensation Reform, no date; Isenberg, 2008).

If the use of value-added modeling becomes widespread, what are the likely consequences? These models, particularly when used in a high-stakes accountability setting, may create strong incentives for teachers and administrators to change their behavior. The avowed intention is for educators to respond by working harder or by incorporating different teaching strategies to improve student achievement. However, perverse incentives may also be

1. The amount of the bonus linked to student achievement is small; much of the money goes to professional development. Additional funds for the Teacher Incentive Fund are supposed to come from the American Recovery and Reinvestment Act.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




created, resulting in unintended negative consequences. For example, because a value-added system compares the performance of teachers relative to one another, it could reduce teacher cooperation within schools, depending on how the incentives are structured. Conversely, if school-level value-added is rewarded, it can create a “free rider” problem, whereby some shirkers benefit from the good work of their colleagues without putting forth more effort themselves.

Because the implementation of value-added models in education has so far been limited, there is not much evidence about their consequences. At the workshop, some clues as to how educators might respond were provided by the case of a program instituted in New York that used an adjusted status model to monitor the effectiveness of heart surgeons in the state’s hospitals. We provide below some examples of how value-added models have recently been used in education for various purposes.

SOME RECENT USES

Research

Value-added models can be useful for conducting exploratory research on educational interventions because they aim to identify the contributions of certain programs, teachers, or schools when a true experimental design is not feasible.

Workshop presenter John Easton has been studying school reform in Chicago for about 20 years. He and his colleagues used surveys of educators to identify essential supports for school success (inclusive leadership, parents’ community ties, professional capacity, student-centered learning climate, and ambitious instruction). The team then used a value-added analysis to provide empirical evidence that these fundamentals were indeed strongly associated with school effectiveness. As a result of this research, the Chicago Public School system has adopted these essential supports as its “five fundamentals for school success” (Easton, 2008).
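Regression studies like these typically predict each student’s current score from a prior score plus indicator variables for the unit being evaluated, reading the centered unit coefficients as value-added estimates. The following is a minimal sketch of that setup, with simulated data and hypothetical school names (not drawn from any study described here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 600 students spread across 3 hypothetical schools.
n = 600
school = rng.integers(0, 3, size=n)          # school assignment per student
prior = rng.normal(50, 10, size=n)           # last year's test score
true_effect = np.array([0.0, 2.0, -2.0])     # the schools' true value-added
score = 0.8 * prior + true_effect[school] + rng.normal(0, 5, size=n)

# Design matrix: prior score plus one indicator column per school
# (no separate intercept, so each school coefficient is its own baseline).
X = np.column_stack([prior, school == 0, school == 1, school == 2]).astype(float)
coef, *_ = np.linalg.lstsq(X, score, rcond=None)

# Center the school coefficients so each school is compared with the
# average school; value-added estimates are inherently relative.
value_added = coef[1:] - coef[1:].mean()
print(dict(zip(["school A", "school B", "school C"], value_added.round(2))))
```

Adding columns for student background characteristics to the design matrix yields the adjusted comparisons discussed throughout this chapter with the same least-squares machinery.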
Value-added models have also been used by researchers to gauge the relationship of various teacher qualifications (such as licensure, certification, years of experience, and advanced degrees) to student progress. Workshop discussant Helen Ladd described her research, which applied a value-added model to data from North Carolina to explore the relationship between teacher credentials and students’ performance on end-of-course exams at the high school level (Clotfelter, Ladd, and Vigdor, 2007). The researchers found that teacher credentials are positively correlated with student achievement. One problem Ladd’s studies identified is that teachers with weaker credentials were concentrated in higher poverty schools, and the apparent effects of having low-credentialed teachers in

high school were great, particularly for African American students: “We conclude that if the teachers assigned to black students had the same credentials on average as those assigned to white students, the achievement difference between black and white students would be reduced by about one third” (Clotfelter, Ladd, and Vigdor, 2007, p. 38).

Easton argued that more research studies are needed using value-added models, as an essential first step in exploring their possible uses for accountability or other high-stakes purposes. “The more widely circulated research using value-added metrics as outcomes there is, the more understanding there will be about [how] they can be used most successfully and what their limits are” (Easton, 2008, p. 9).

School or Teacher Improvement

Value-added models are intended to help identify schools or teachers as more effective or less effective, as well as the areas in which they are differentially effective. Ideally, that can lead to further investigation and, ultimately, the adoption of improved instructional strategies. Value-added results might be used by teachers for self-improvement or target setting. At the school level, they might be used along with other measures to help identify the subjects, grades, and groups of students for which the school is adding the most value and where improvement is needed. Value-added analyses of the relationships between school inputs and school performance could suggest which strategies are most productive, leading to ongoing policy adjustments and reallocation of resources. The models might also be used to create projections of school performance that can assist in planning, resource allocation, and decision making. In these ways, value-added results could serve teachers and schools as an early warning signal.
Perhaps the best-known value-added model used for teacher evaluation and improvement is the Education Value Added Assessment System (EVAAS), which has been used in Tennessee since 1993. “The primary purpose . . . is to provide information about how effective a school, system, or teacher has been in leading students to achieve normal academic gain over a three year period” (Sanders and Horn, 1998, p. 250). The system was created by William Sanders and his colleagues, and this model (or variations on it) has been tried in a number of different school districts. EVAAS-derived reports on teacher effectiveness are made available to teachers and administrators but are not made public. State legislation requires that EVAAS results be part of the evaluation of those teachers for whom such data are available (those who teach courses tested by the statewide assessment program). How large a role the estimates of effectiveness are to play in teacher evaluation is left up to the district,

although EVAAS reports cannot be the sole source of information in a teacher’s evaluation. They are used to create individualized professional development plans for teachers, and subsequent EVAAS reports can be used to judge the extent to which improved teacher performance has resulted from these plans (Sanders and Horn, 1998).

Program Evaluation

When used for program evaluation, value-added models can provide information about which types of local or national school programs or policy initiatives are adding the most value, in terms of student achievement, and which are not. These might include initiatives as diverse as a new curriculum, decreased class size, and alternative approaches to teacher certification.

The Teach For America (TFA) program recruits graduates of four-year colleges and universities to teach in public schools (K-12) in high-poverty districts. It receives funding from both private sources and the federal government. In recent years, the program has placed between 2,000 and 4,000 teachers annually. Recruits agree to teach for two years at pay comparable to that of other newly hired teachers. After an intensive summer-long training session, they are placed in the classroom, with mentoring and evaluation provided throughout the year. The program has been criticized because many believe that this alternate route to teaching is associated with lower quality teaching. There is also the concern that, because the majority of participants leave their positions upon completing their two-year commitment, students in participating districts are being taught by less experienced (and therefore less effective) teachers. Xu, Hannaway, and Taylor (2007) used an adjusted status model (similar to a value-added model, but one that does not use prior test scores) to investigate these criticisms.
Using data on secondary school students and teachers from North Carolina,2 the researchers found that TFA teachers were more effective in raising exam scores than other teachers, even those with more experience: “TFA teachers are more effective than the teachers who would otherwise be in the classroom in their stead” (p. 23). This finding may depend on the poor quality of the experienced teachers in the types of high-poverty urban districts served by the program.

2. It is important to note that the researchers used a “cross-subject fixed-effects model” that employed student performance across a variety of subjects rather than student performance on tests taken in past years. This strategy was required because the study examined secondary school performance, and prior scores in courses such as biology were not available.
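The cross-subject strategy in note 2 can be illustrated in miniature: subtracting each student’s mean score across subjects removes the student’s overall ability level, and differences in the residuals between teacher groups estimate the teacher-group effect. The sketch below is a loose illustration under invented assumptions (TFA teachers confined to one subject, a fixed effect size, simulated data), not the study’s actual specification:

```python
import numpy as np

rng = np.random.default_rng(4)

n_students, n_subjects = 2000, 4
ability = rng.normal(50, 10, size=(n_students, 1))  # student fixed effect

# Invented setup: TFA teachers teach subject 0 and add a small boost there.
tfa_effect = 1.0
scores = ability + rng.normal(0, 5, size=(n_students, n_subjects))
scores[:, 0] += tfa_effect

# Demeaning within student wipes out ability; residual gaps between the
# TFA-taught subject and the rest recover the teacher-group effect.
residuals = scores - scores.mean(axis=1, keepdims=True)
est_tfa = residuals[:, 0].mean()
est_other = residuals[:, 1:].mean()
print(round(est_tfa - est_other, 2))  # close to the built-in tfa_effect
```

The point of the demeaning step is that no prior-year scores are needed: each student serves as his or her own comparison across subjects.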

School or Teacher Accountability

In an accountability context, consequences are attached to value-added results in order to give teachers and school administrators incentives to improve student performance. The results might be used for such decisions as whether the students in a school are making appropriate progress for the school to avoid sanctions or receive rewards, or whether a teacher should get a salary increase. School accountability systems that use value-added models would provide this information to the public: taxpayers might be informed as to whether tax money is being used efficiently, and users might be able to choose schools on a more informed basis. At this time, many policy makers are seriously considering using value-added results for accountability, and there is much discussion about these possible uses. But the design of a model might differ depending on whether the goal is to create incentives to improve the performance of certain students, to weed out weak teachers, or to inform parents about the most effective schools for their children.

In August 2008, Ohio began implementing a program that incorporates a value-added model. The program chosen by the state is based on the EVAAS model William Sanders developed for Tennessee. Ohio’s accountability system employs multiple measures, whereby schools are assigned ratings on the basis of a set of indicators. Until recently, the measures were (1) the percentage of students reaching the proficient level on state tests, as well as graduation and attendance rates; (2) whether the school made adequate yearly progress under No Child Left Behind; (3) a performance index that combines state test results; and (4) a measure of improvement in the performance index. Ohio replaced the last component with a value-added indicator.
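In broad strokes, an indicator of this kind compares each student’s actual score with a prediction built from the student’s own testing history and averages the differences over the school. The following is a deliberately toy sketch: the prediction is just the mean of two simulated prior-year scores, whereas the real EVAAS-based model is far more elaborate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical records for one school: two prior-year scores per student.
n = 400
prior = rng.normal(50, 10, size=(n, 2))              # columns: two past years
actual = prior.mean(axis=1) + rng.normal(1.5, 5, n)  # this year's scores

# Stand-in for the model's customized prediction of each student's progress.
predicted = prior.mean(axis=1)

# Value-added gain per student; the school indicator is the average gain.
gain = actual - predicted
school_value_added = gain.mean()
print(round(school_value_added, 2))  # positive: students beat their predictions
```

A positive average means the school’s students progressed more than their own records predicted; the rating system then folds this number in as one component among the other indicators.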
Instead of simply comparing a student’s gain with the average gain, the model develops a customized prediction of each student’s progress on the basis of his or her own academic record, as well as that of other students over multiple years, with statewide test performance serving as an anchor. The value-added gain is thus the difference between a student’s score in a given subject and the score predicted by the model. The school-level indicator is based on the averages of the value-added gains of its students. Consequently, Ohio will now be rating schools using estimated value-added as one component among others. The model will be used only at the school level, not the teacher level, and only in the elementary and middle grades. Because tests are given only once in high school, in tenth grade, growth in student test scores cannot be determined directly (Public Impact, 2008).

There are examples of using value-added modeling to determine teacher performance pay at the district level. The national Teacher Advancement Program (TAP) is a merit pay program for teachers that uses a value-added model of student test score growth as a factor in determining teacher pay. About 6,000 teachers in 50 school districts nationwide participate in this program, which was established by the Milken Family Foundation in 1999. Participating districts essentially create an alternate pay and training system for teachers, based on multiple career paths, ongoing professional development, accountability for student performance, and performance pay. TAP uses a value-added model to determine contributions to student achievement gains at both the classroom and school levels. Teachers are awarded bonuses based on their scores in a weighted performance evaluation that measures mastery of effective classroom practices (50 percent), student achievement gains for their classrooms (30 percent), and school-wide achievement gains (20 percent) (http://www.talentedteachers.org/index.taf).

It should be noted that a number of other states, including Alaska, Arizona, Florida,3 and Minnesota, have had performance pay programs for teachers in which growth in test scores is a factor, usually a rather small one, in determining teacher pay. However, these systems are based on growth models, not value-added models. Unlike value-added models, the growth models used do not control for background factors other than students’ achievement in the previous year.

Low Stakes Versus High Stakes

A frequent theme throughout the workshop was that when test-based indicators are used to make important decisions, especially ones that affect individual teachers, administrators, or students, the results must be held to higher standards of reliability and validity than when the stakes are lower. However, drawing the line between high and low stakes is not always straightforward.
As Henry Braun noted, what is “high stakes for somebody may be low stakes for someone else.” For example, simply reporting school test results through the media or sharing teacher-level results among staff—even in the absence of more concrete rewards or sanctions—can be experienced as high stakes by some schools or teachers. Furthermore, in a particular evaluation, stakes are often different for various stakeholders, such as students, teachers, and principals.

Participants generally referred to exploratory research as a low-stakes use and school or teacher accountability as a high-stakes use. Using value-added results for school or teacher improvement, or for program evaluation, fell somewhere in between, depending on the particular circumstances. For example, as Derek Briggs pointed out, using a value-added model for program evaluation could be high stakes if the studies were part of the What Works Clearinghouse, sponsored by the U.S. Department of Education.

In any case, it is important for designers of an evaluation system to first set out standards for the properties they desire of the evaluation model and then ask whether value-added approaches satisfy them. For example, if one wants transparency to enable personnel actions to be fully defensible, a very complex value-added model may well fail to meet the requirement. If one wants all schools in a state to be assessed using the same tests and with adjustments for background factors, value-added approaches do meet the requirement.

3. Interestingly, the Florida merit pay program proved very unpopular after it was discovered that teachers in the most affluent schools were the ones benefiting the most. Most of the participating districts turned down the additional money after its first year of implementation.

POSSIBLE INCENTIVES AND CONSEQUENCES

To date, there is little relevant research in education on the incentives created by value-added evaluation systems and their effects on school culture, teacher practice, and student outcomes. The workshop therefore addressed the possible consequences of using value-added models for high-stakes purposes by looking at high-quality studies of their use in other contexts. Ashish Jha presented a paper on the use of an adjusted status model (see footnote 4, Chapter 1) in New York State for the purpose of improving health care. The Cardiac Surgery Reporting System (CSRS) was introduced in 1990 to monitor the performance of surgeons performing coronary bypass surgeries. The New York Department of Health began to publicly report the performance of both hospitals and individual surgeons. Assessment of the performance of about 31 hospitals and 100 surgeons, as measured by risk-adjusted mortality rates, was freely available to New York citizens. In this application, the statistical model adjusted for patient risk, in a manner similar to the way models in education adjust for student characteristics.
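The adjustment can be sketched schematically: a fitted model assigns each patient an expected probability of in-hospital death, and a provider’s observed mortality is scaled by the ratio of the statewide rate to the provider’s expected rate (indirect standardization). The risk probabilities below are hypothetical stand-ins for a fitted model’s output:

```python
import numpy as np

rng = np.random.default_rng(2)

statewide_rate = 0.021                     # statewide in-hospital mortality

# One hypothetical provider: 400 cases with model-assigned death risks.
risk = rng.uniform(0.005, 0.08, size=400)  # sicker patients carry higher risk
died = rng.random(400) < risk              # simulated observed outcomes

observed_rate = died.mean()
expected_rate = risk.mean()                # the model's prediction for this case mix

# Risk-adjusted mortality rate via indirect standardization.
ramr = (observed_rate / expected_rate) * statewide_rate
print(round(ramr, 4))  # near statewide_rate: performing about as predicted
```

A provider that draws sicker-than-average patients is thus not penalized for its case mix, which is the same logic by which the education models adjust for student characteristics.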
The model tried to address the question: How successful was the treatment provided by a certain doctor or hospital, given the severity of a patient’s symptoms? The risk-adjustment model drew on the patients’ clinical data (adequacy of heart function prior to surgery, condition of the kidneys, other factors associated with recovery, etc.). In 1989, prior to the introduction of CSRS, the risk-adjusted in-hospital mortality rate for patients undergoing heart surgery was 4.2 percent; eight years after the introduction of CSRS, this rate had been cut in half, to 2.1 percent, the lowest in the nation.

Empirical evaluations of CSRS, as well as anecdotal evidence, indicate that a number of surgeons with high adjusted mortality rates stopped practicing in New York after public reporting began. Poor-performing surgeons were four times more likely

to stop practicing in New York within two years of the release of a negative report. (However, many simply moved to neighboring states.) Several of the hospitals with the worst mortality rates revamped their cardiac surgery programs. This was precisely what the state had hoped for and, from this point of view, the CSRS program was a success.

However, there were reports of unintended consequences of this intervention. Some studies indicated that surgeons became less likely to operate on sicker patients, although others contradicted this claim. There was also some evidence that documentation of patients’ previous conditions changed in such a way as to make them appear sicker, thereby reducing a provider’s risk-adjusted mortality rate. Finally, one study conducted by Jha and colleagues (2008) found that the introduction of CSRS had a significant deleterious effect on access to surgery for African American patients. The proportion of African American patients dropped, presumably because surgeons perceived them as high risk and therefore were less willing to operate on them. It took almost a decade before the racial composition of patients reverted to pre-CSRS proportions.

This health care example illustrates that, if value-added models are used in an education accountability context with the intention of changing the behavior of teachers and administrators, one can expect both intended and unintended consequences. The adjustment process should be clearly explained, and an incentive structure should be put into place that minimizes perverse incentives. Discussant Helen Ladd emphasized transparency: “Teachers need to understand what goes into the outcome measures, what they can do to change the outcome, and to have confidence that the measure is consistently and fairly calculated. . . .
The system is likely to be most effective if teachers believe the measure treats them fairly in the sense of holding them accountable for things that are under their control.”

Workshop participants noted a few ways in which test-based accountability systems have had unintended consequences in the education context. For example, Ladd (2008) cited South Carolina, which experimented in the 1980s with a growth model (not a value-added model). It was hoped that the growth model would be more appropriate and useful than the status model that had been used previously. The status model was regarded as faulty because its results largely reflected socioeconomic status (SES). It was found, however, that the growth model results still favored schools serving more advantaged students, which were then more likely to be eligible for rewards than schools serving low-income and minority students. State and school officials were concerned. In response, they created a school classification system based mainly on the average SES of the students in the schools. Schools were then compared only with other schools in the same category, with rewards equitably distributed across categories. This was widely regarded as fair. However, one result was that schools at the boundaries had an incentive to try to get into a lower SES classification in order to increase their chances of receiving a reward.

Sean Reardon pointed out a similar situation based on the use of a value-added model in San Diego (Koedel and Betts, 2009). Test scores from fourth grade students (along with their matched test scores from third and second grade) indicated that teachers were showing the greatest gains among low-performing students. Possible explanations were that the best teachers were concentrated in the classes with students with the lowest initial skills (which was unlikely), or that there was a ceiling effect or some other consequence of test scaling, such that low-performing students were able to show much greater gains than higher-performing students. It was difficult to determine the exact cause, but had the model been implemented for teacher pay or accountability purposes, teachers would have had an incentive to move to those schools serving students with low SES, where they could achieve the greatest score gains. Reardon observed, “That could be a good thing. If I think I am a really good teacher with this population of students, then the league [tables] make me want to move to a school where I teach that population of students, so that I rank relatively high in that league.” The disadvantage of using indicators based on students’ status is that one can no longer reasonably compare the effectiveness of a teacher who teaches low-skilled students with that of a teacher who teaches high-skilled students, or compare schools with very different populations.
Adam Gamoran suggested that the jury is still out on whether a performance-based incentive system intended to motivate teachers to improve would be better than the current system, which rewards teachers on the basis of experience and professional qualifications. However, he noted that the current system also has problematic incentives: it encourages all teachers, regardless of their effectiveness, to stay in teaching, because the longer they stay, the more their salary increases. After several years of teaching, teachers reach the point at which there are huge benefits to persisting and substantial costs to leaving. An alternative is a system that rewards more effective teachers and encourages less effective ones to leave. A value-added model that evaluates teachers has the potential to become part of such a system. At the moment, such a system is problematic, in part because of the imprecision of value-added teacher estimates. Gamoran speculated that a pay-for-performance system for teachers based on current value-added models would probably result in short-term improvement, because teachers would work harder for a bonus. He judged that the long-term effects are less clear, however, because of the imprecision of the models under some conditions.
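That imprecision is easy to quantify: if a single year’s estimate for a teacher carries noise with standard deviation sigma, the average of n independent yearly estimates has standard error sigma divided by the square root of n. A quick simulation of the stabilizing effect of pooling three years (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

true_effect = 0.0   # a teacher of exactly average effectiveness
sigma = 4.0         # noise in a single year's value-added estimate
trials = 20000      # many simulated draws to trace out the spread

one_year = true_effect + rng.normal(0, sigma, size=trials)
three_year = true_effect + rng.normal(0, sigma, size=(trials, 3)).mean(axis=1)

# Averaging 3 years shrinks the spread by about sqrt(3), so fewer
# average teachers land far from their true effect purely by luck.
print(round(one_year.std(), 2), round(three_year.std(), 2))
```

The narrower spread is exactly why pooling several years of results makes luck a smaller component of any bonus decision.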

Given this imprecision, a teacher’s bonus might be largely a matter of luck rather than a matter of doing something better. “Teachers will figure that out pretty quickly. The system will lose its incentive power. Why bother to try hard? Why bother to seek out new strategies? Just trust to luck to get the bonus one year if not another.” These potential problems might be reduced by combining a teacher’s results across several (e.g., three) years, thus improving the precision of the teacher’s value-added estimates.

Several workshop participants made the point that, even without strong, tangible rewards or sanctions for teachers or administrators, an accountability system will still create incentives. Ben Jensen commented that when value-added scores are made publicly available, they create both career and prestige incentives: “If I am a school principal, particularly at a school serving a poor community, [and] I have a high value-added score, I am going to put that on my CV and therefore, there is a real incentive effect.” Brian Stecher also noted that for school principals in Dallas, which has a performance pay system, it is not always necessary to give a principal a monetary reward to change his or her behavior. There is the effect of competition: if a principal saw other principals receiving rewards and he or she did not get one, that tended to be enough to change behavior. The incentives created a dramatic shift in internal norms and cultures in the workplace and achieved the desired result.

NOT FOR ALL POLICY PURPOSES

Value-added models are not necessarily the best choice for all policy purposes; indeed, no single evaluation model is. For example, there is concern that adjusting for students’ family characteristics and school contextual variables might reinforce existing disadvantages in schools with a high proportion of lower-SES students, by effectively setting lower expectations for those students.
Another issue is that value-added results are usually normative: schools or teachers are characterized as performing either above or below average compared with other units in the analysis, such as teachers in the same school, district, or perhaps state. In other words, estimates of value-added have meaning only in comparison with average estimated effectiveness. This is different from current state accountability systems, which are criterion-referenced: performance is described in relation to a standard set by the state (such as the proficient level). Dan McCaffrey explained that if the policy goal is for all students to reach a certain acceptable level of achievement, then it may not be appropriate to reward schools that are adding great value but

still are not making enough progress.4 From the perspective of students and their families, school value-added measures might be important, but families may also want to know the extent to which schools and students have met state standards.

4. Of course, there can be disagreement as to whether this is a reasonable or appropriate goal.

CONCLUSION

Value-added models clearly have many potential uses in education. At the workshop, there was little concern about using them for exploratory research or to identify teachers who might benefit most from professional development. In fact, one participant argued that these types of low-stakes uses are needed to increase understanding of the strengths and limitations of different value-added approaches and to set the stage for their possible use for higher stakes purposes in the future. There was a great deal of concern expressed, however, about using these models alone for high-stakes decisions—such as whether a school is in need of improvement or whether a teacher deserves a bonus, tenure, or promotion—given the current state of knowledge about the accuracy of value-added estimates. Most participants acknowledged that they would be uncomfortable basing almost any high-stakes decision on a single measure or indicator, such as a status determination. The rationales for participants’ concerns are explained in the next two chapters.
