Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.
OCR for page 63
5 Student Achievement Under PERAA: First Impressions To ask how well the schools are doing under the Public Education Reform Amendment Act (PERAA) is to ask countless specific—and often complicated—questions, which is why a thorough, 5-year evaluation is called for in the law. That evaluation may be even more important than originally envisioned because—even in the short time that this committee has been developing the evaluation plan—there has been complete turnover in the primary leadership positions for education in the District. There is a new mayor, deputy mayor, chancellor, and interim state superintendent of schools. Yet, 3 years after PERAA was enacted and after many significant changes have been implemented, it is not unreasonable to consider what has happened, what we call first impressions. In this and the next chapter, we present our first impressions on several goals of the legislation: this chapter considers student achievement data. Chapter 6 looks at the other aspects of the system that also must be mea- sured: the quality of district staff, the quality of classroom teaching and learning, service to vulnerable children and youth, family and community engagement, and operations. For the purposes of this first phase of the evaluation effort, the com- mittee was able to collect only preliminary information about student achievement and the five primary areas of district responsibility we discuss in Chapter 6. We stress that these first impressions are useful only as a basis for further inquiry and not as reliable evidence about the effectiveness of the changes under PERAA or how best to fine-tune programs and strategies in the future. Those tasks require an ongoing program of evaluation and research, which we offer in Chapter 7. 63
OCR for page 64
64 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS STUDENT ACHIEVEMENT AND TEST DATA The most readily available first impressions of student achievement are provided by test scores. There is a long history of relying on student test data as a measure of the effectiveness of public education, and it is tempting to simply rely on those readily available data for judgments about student achievement and about causes and effects. However, student test scores alone provide useful but limited information about the causes of improve- ments or variability in student performance. The results of achievement tests provide only estimates about students’ skills and knowledge in selected areas—usually, what they know and can do in mathematics and reading and sometimes other subjects. Aggregate year-to-year comparisons of test scores in the District’s schools are con- founded by changes in student populations that result from student moves in and out of the city and between DC Public Schools (DCPS) and charter schools, dropout and reentry, and also from variations in testing practices that may exclude or include particular groups of students.1 For these and other reasons, therefore, it is important to remember that the consensus of measurement and testing experts has long been to use test scores cautiously. For this discussion, it is perhaps most important to underscore that most tests are not designed to support inferences about related questions, such as how well students were taught, what effects their teachers had on their learning, why students in some schools or classrooms succeed while those in similar schools and classrooms do not, whether conditions in the schools have improved as a result of a policy change, or what policy makers should do to solidify gains or reverse declines. Answering those sorts of questions requires other kinds of evidence besides test scores. Looking at test scores should be only a first step—not an end point—in considering questions about student achievement, or even more broadly, about student learning. Nevertheless, changes in student test scores since 2007 provide one set of impressions regarding progress in DC schools. We offer here an overview of publicly available data from both the District of Columbia Comprehen- sive Assessment System (DC CAS) and the U.S. Department of Education’s National Assessment of Educational Progress (NAEP). We first discuss these data sources, then look at the trend data, and end the chapter with a discus- sion of how to interpret the data. But we note again that a systematic and comprehensive analysis of achievement data for DC was beyond the scope of this report; the readily available information provides only a useful first 1 Test scores also come with measurement issues that have to be considered if they are to provide an accurate picture of even those areas they do measure (Koretz, 2008; National Research Council, 1999; Office of Technology Assessment, 1992).
OCR for page 65
65 STUDENT ACHIEVEMENT UNDER PERAA look and hints about issues related to student achievement that will need to be addressed in the long-term evaluation. THE DATA SOURCES The District’s assessment system, the DC CAS, assesses students in grades 3 through 8 in reading and mathematics and in selected grades in science and composition.2 The assessment system, which has been in place since 2006, is designed to measure individual students’ progress toward meeting the District of Columbia’s standards3 and is used to meet federal requirements under the Elementary and Secondary Education Act, as amended by the No Child Left Behind (NCLB) Act (District of Columbia Public Schools, 2010a).4 DC CAS scores are used to determine if a given school is making sufficient progress under NCLB, and the media and the public look to them for an indication of how well district schools are doing. DC CAS results are reported using four performance levels: advanced, proficient, basic, and below basic. Box 5-1 provides an example of the performance descriptions used in DC CAS. The NAEP, known popularly as the Nation’s Report Card, is an assess- ment administered by the U.S. Department of Education and overseen by the autonomous National Assessment Governing Board that provides independent data about what students know and can do in mathematics, reading, and other subjects. NAEP is valuable in part because it is not a high-stakes test—scores for individual students or schools are not reported, and there are no consequences to students, teachers, or schools associated with NAEP scores. Results are reported for states, selected urban districts, and the nation, and all students are measured against common performance expectations; consequently, the results can be used to make comparisons among jurisdictions.5 Changes to the assessment are infrequent and come with careful studies of comparability, so NAEP is also used to track student 2 Information about DC-CAS can be found at: http://www.dc.gov/DCPS/In+the+Classroom/ How+Students+Are+Assessed/Assessments/DC+Comprehensive+Assessment+System-Alternate +Assessment+Portfolio+(DC+CAS-Alt) [accessed October 2010]. 3 Information about the academic standards can be found at: http://osse.dc.gov/seo/cwp/ view,A,1274,Q,561249,seoNav,%7C31193%7C.asp. According to the 2011 DC CAS guide (Office of the State Superintendent of Education, 2010b), the assessments in reading and mathematics are aligned to both the District of Columbia standards and to the Common Core Standards, a set of standards that the majority of states have recently adopted to ensure greater consistency in public education from state to state (http://www.corestandards.org/ [accessed December 2010]). 4 States must meet the requirements of the No Child Left Behind Act in order to receive federal financial assistance to support the education of poor children. 5 The comparisons are subject to some caveats related to such issues as inclusion rates for students with disabilities and English language learners.
OCR for page 66
66 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS BOX 5-1 DC CAS Performance Descriptions for 3rd Grade Reading The DC CAS is a standards-based assessment. Based on performance, each student is classified as performing at one of four performance levels: below basic, basic, proficient, and advanced. The descriptions below are examples of perfor- mance descriptions for each level. Below Basic Students are able to use vocabulary skills, such as identifying literal or com- mon meanings of words and phrases, sometimes using context clues. Students are able to read some 3rd grade informational and literary texts and can identify a main idea, make some meaning of text features and graphics, form questions, locate text details, and identify simple relationships (e.g., cause/effect) in texts. Basic Students are able to use vocabulary skills, such as identifying words with prefixes and suffixes, and distinguishing between literal and nonliteral meanings of some common words and phrases. Students are able to read some 3rd grade informational and literary texts and can identify main points and some supporting facts, locate stated facts and specific information in graphics, form questions, iden- tify lessons in a text, make simple connections within and between texts, describe and compare characters, and make simple interpretations. Proficient Students are able to use vocabulary skills, such as identifying affixes and root words and using context clues to interpret nonliteral words and meanings of nknown words. Students are able to read 3rd grade informational and liter- u ary texts and can distinguish between stated and implied facts and cause/effect relationships, determine and synthesize steps in a process, connect procedures to real-life situations, explain key ideas in stories, explain relationships among char- acters, identify subtle personality traits of characters, and connect story details to prior knowledge. Advanced Students are able to use vocabulary skills, such as identifying the figurative meanings or nonliteral meanings of some words and phrases in a moderately complex text. Students are able to read 3rd grade informational and literary texts and summarize the information or story with supporting details, apply text infor- mation to graphics, identify and explain relationships of facts and cause/ ffect e relationships, use text features to make predictions, distinguish between fact and fiction, identify a speaker in a poem or narrator in a story, explain key ideas with supporting details, use context to interpret simple figurative language, and deter- mine simple patterns in poetry. SOURCE: Office of the State Superintendent of Education (2010a, p. 1).
OCR for page 67
67 STUDENT ACHIEVEMENT UNDER PERAA performance over time. For example, the NAEP mathematics assessment scores go back to 1992, and those for reading to 1990. DC has participated in NAEP as a “state” since the early 1990s, and the scores for grades 4 and 8 reflect the performance of all District public schools, including all public charter schools. When NAEP began the Trial Urban District Assessment (TUDA) in 2002, the District was included in this assessment as well, one of only five such districts. In 2009, 18 districts participated. Until 2009, the scores for the District were the same for both the state and district assessments. Beginning in 2009 most charter schools were excluded from the District’s TUDA results, but they remained in the state score calculation. The charter schools that are excluded from DCPS’s adequate yearly progress (AYP) report under NCLB were also excluded from the NAEP TUDA. This look at the data presents both state and TUDA scores together. The state scores include all DC schools and therefore serve as the basis for comparison. All NAEP data in this section refer to the state DC scores unless otherwise noted. The TUDA scores are presented in graphics for completeness, but are not discussed and should be evaluated cautiously, particularly in the context of comparisons between 2007 and 2009 because of the change in charter school exclusion in 2009. These two assessments provide different ways of measuring student progress. DC CAS provides evidence of the progress of individual students and groups (such as 3rd graders in a school or in the district) toward mastering specific objectives in the DC standards. NAEP provides a picture of what students at each grade in the District as a whole know and can do in terms of nationwide definitions of achievement in each subject. TEST SCORE TRENDS The percentage of tested students who performed at or above the pro- ficient level (proficiency rate) in all grades in the District on the DC CAS increased from 2006 to 2010. Figure 5-1 shows the upward trend prior to PERAA’s passage in 2007. After 2007, the trend in both reading and mathematics increased more steeply for 2 years, then flattened out, and then declined slightly in the 2009-2010 school year. Figure 5-2 shows the percentages of students (by grade and subject) performing at each of the four proficiency levels on DC CAS and state NAEP for 2007 and 2009. These data show that, in general, the percent- ages of students in both the below basic and basic categories decreased for both assessments, while the percentages of students performing at both the proficient and advanced levels increased. That is, the distribution of students shifted to higher performance levels from 2007 to 2009. Thus, the NAEP scores appear to confirm the improvement shown on the DC CAS scores. However, the percentages of students performing at the proficient
OCR for page 68
68 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS 50 Percentage Above Proficient 40 Reading 30 Mathematics 20 10 0 2006 2007 2008 2009 2010 Year FIGURE 5-1 Percentage of District students at or above the proficient level on the DC CAS in reading and mathematics, 2006-2010. SOURCE: Adapted from http://www.nclb.osse.dc.gov/index.asp [accessed December Fig 5-1.eps 2010]. level or above on NAEP is significantly smaller than the percentage who perform at those levels on DC CAS—a finding that suggests that DC CAS is a less challenging assessment than NAEP (we discuss this issue further below). The average (scaled) score on NAEP shows a similar positive trend. The 2009 NAEP scores for DC as a state (see Figure 5-3) in grade 4 for both reading and mathematics were statistically significantly higher than they had been in all previous years (2003, 2005, and 2007). This was also true for grade 8 mathematics. In grade 8 reading, the DC state scores in 2009 were also significantly higher than those for 2003 and 2005, but not than those for 2007. That is, grade 8 reading was the only assessment in which the District did not show a significant gain from 2007 to 2009. In comparison with states, the District’s scores were notable. In grade 4 reading, only two other states improved since 2007; in math- ematics, only four other states showed significant improvement at both grades 4 and 8. Only three states, Kentucky, Rhode Island, and Vermont, showed improvement in three of the four assessments, and no state im- proved in all four. However, in comparison with other urban districts, the District’s scores were similar: many others also showed consistently significant gains.
OCR for page 69
69 STUDENT ACHIEVEMENT UNDER PERAA FIGURE 5-2 Proficiency levels of District students from DC CAS and NAEP for Fig 5-2.eps 2007 and 2009 in reading and mathematics. SOURCES: National Center for Education Statistics, NAEP Data Explorer, see 2 bitmaps http://nces.ed.gov/nationsreportcard/naepdata/ [accessed September 2010]; DC CAS, see http://www.nclb.osse.dc.gov/index.asp [accessed March 2011].
OCR for page 70
70 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS Grade 4 Mathematics 250 Charlotte Austin 240 New York Houston 230 Boston Average Scores Average Scores Atlanta San Diego Los Angeles 220 DC - TUDA DC - State Chicago Cleveland 210 200 190 180 2003 2005 2007 2009 Year Grade 4 Reading 250 240 230 Average Scores Average Scores Charlotte Austin 220 New York San Diego 210 Boston DC - TUDA Houston Atlanta DC - State 200 Chicago Cleveland Los Angeles 190 180 2003 2005 2007 2009 2002 Year FIGURE 5-3 NAEP TUDA average scores for 10 urban districts and DC, as well as DC state NAEP, for mathematics and reading, grades 4 and 8, 2002-2009. NOTES: State and TUDA scores are presented together for completeness. State scores include all DC schools and thus are the focus of this analysis. TUDA scores should be evaluated cautiously particularly when comparing 2007 to 2009 because most charter schools were excluded in 2009 (but included in 2007—see text). Fig 5-3.ep SOURCE: National Center for Education Statistics, NAEP Data Explorer, see http://nces.ed.gov/nationsreportcard/naepdata/ [accessed September 2010]. landscap (lower right--New York doesn’t seem
OCR for page 71
71 STUDENT ACHIEVEMENT UNDER PERAA Grade 8 Mathematics 300 290 Austin Charlotte 280 San Diego Houston Boston Average Scores 270 New York 260 Atlanta Chicago DC - State Cleveland 250 DC - TUDA Los Angeles 240 230 220 2003 2005 2007 2009 Year Grade 8 Reading 300 290 280 Average Scores 270 Charlotte New York 260 Austin San Diego Boston Chicago 250 Atlanta e Houston DC - State 240 DC - TUDA Cleveland Los Angeles 230 220 2002 2003 2005 2007 2009 Year FIGURE 5-3 Continued. Fig 5-3.eps landscape sn’t seem to have any plotted points)
OCR for page 72
72 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS Figure 5-3 also shows trends for other school districts assessed by NAEP in mathematics and reading.6 Two points are worth noting: the District’s average scores are low compared with those of most of the other 10 school districts in both the 2007 and 2009 TUDA (including Boston, Chicago, and New York), but DC and its peer districts are improving at similar rates. Most districts showed gains from 2007 to 2009: see Figure 5-4.7,8 It is important to note, however, that scores that are averaged across large numbers of students can obscure which students are improving and by how much. It may be that only a small group of students is making gains while others are not improving or may even be doing worse than previ- ously. For example, the highest achievers may be showing gains while the lowest achievers are not, or vice versa. The committee was limited by time and resources in the number of disaggregations we could carry out for this report, but a few examples demonstrate the importance of looking beyond average scores. It appears that students at every level in the District are gaining ground. As Figure 5-5 shows, for example, in DC state NAEP grade 4 reading, students in the lowest, middle, and highest groups all made gains, with the lowest scoring students gaining at a faster rate than the others. We note, too, that black, Hispanic and white 4th graders on average scored higher on the DC CAS mathematics in 2010 than in 2007, while English language learners and students with disabilities also showed some improvements relative to their peers: see Figure 5-6. The NAEP data show different results. For grade 4 reading, there was no significant change in the performance of white 4th grade students in the District from 2005 to 2009, while scores for both black and Hispanic 4th graders showed a significant gain for 2009: see Figure 5-7. For grade 8 reading, the NAEP data show large achievement gaps when scores are bro- ken out by the educational attainment of students’ mothers: see Figure 5-8. These data show improvements for those students whose mothers did not finish high school. 6 Although 18 urban districts participated in the 2009 mathematics and reading assessments, only 10 districts other than DC also participated in assessments in previous years, so it is only possible to examine changes over time for those 10. 7 These data findings come from the test of differences in gains performed through NAEP Data Explorer, see http://nces.ed.gov/nationsreportcard/naepdata/ [accessed September 2010]. 8 Although the DC state scores are the focus of this analysis because they reflect all public schools, the TUDA scores are also presented in Figure 5-3. It should be noted that if the non- DCPS charter schools had been excluded in 2007 as they were in 2009 (i.e., if NAEP had used comparable samples in both years), the District would have also shown a statistically signifi- cant increase from 244 in 2007 to 251 in 2009 in grade 8 mathematics, rather than the non- significant change from 248 to 251: see “comparability of samples” at http://nationsreportcard. gov/math_2009/about_math.asp [accessed December 2010].
OCR for page 73
Change in NAEP Scores 2007-2009 Grade 4 Reading Grade 8 Reading Grade 4 Mathema cs Grade 8 Mathema cs 8 6 6 6 5 5 5 5 5 4 4 4 4 4 4 4 3 333 3 3 3 3 3 2 2 2 2 2 1 1 1 1 1 11 1 0 0 -1 -1 -1 -1 -1 ATLANTA AUSTIN BOSTON CHARLOTTE CHICAGO CLEVELAND DC TUDA DC STATE HOUSTON LOS NEW YORK SAN DIEGO -2 ANGELES -4 -4 FIGURE 5-4 Changes in NAEP scores for selected urban districts, 2007-2009. Numbers indicate the amount of the increase or decrease in the average scaled score. NOTES: State and TUDA scores are presented together for completeness. State scores include all DC schools and thus are the focus of this analysis. TUDA scores should be evaluated cautiously particularly when comparing 2007 to 2009 because most charter schools were excluded in 2009 (but included in 2007—see text). SOURCE: National Center for Education Statistics, NAEP Data Explorer, see http://nces.ed.gov/nationsreportcard/naepdata/ redrawn new Figure 5-4 73 [accessed September 2010].
OCR for page 78
78 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS (see, e.g., Linn, 2000). One hypothesis as to the reason for this pattern is that as teachers and students gradually become accustomed to the new test format and new expectations, student performance improves, but that once the test is familiar, performance stays flat (Koretz et al., 1991). Additional evidence would be needed to show whether this phenomenon might explain the observed changes in DC. In short, the DC CAS scores did rise, but there is insufficient evidence to establish the reason for the improvement. In the case of the District, the fact that NAEP shows increases similar to those seen on the DC CAS suggests that the new-test phenomenon may not be the primary explanation; however, other changes that occurred in the same period could be responsible. Demographic shifts—changes in the composition of the student population that occur when students leave or enter the system (which will also change the groups of students being compared from year to year) are another potential source of change in test scores. Since the tests compared cohorts of students, scores will be affected if the populations are not similar. For example, if more higher-scoring 4th graders move into (or opt not to leave) the district’s public schools from one year to the next, average scores would likely rise—but that rise would not reflect improved learning. Such changes could occur because of in- or out- migration from the city or transfers between public and charter schools. If there are only small differences in the composition of students being tested across years, the effect would be slight. However, if, substantially more or fewer students in one year came from families of low socioeconomic status than in the next year, test results might show substantial changes that have nothing to do with the quality of instruction in schools or improved student learning. This is a serious issue in DC, which has a highly mobile stu- dent population, where many students move into and out of the charter school system, and which has a history in which the most disadvantaged residents have sometimes been forced by changing political and economic forces to move within the city or into neighboring jurisdictions. This issue is not just theoretical. The composition of students in tested grades in the District of Columbia’s public schools, has changed markedly since 2007 (see Table 5-1).11 The number of students in all tested grades in DCPS has dropped by almost 21 percent, while the number of tested stu- dents in the charters has increased.12 However this decrease within the DCPS has not been consistent across demographic groups; in contrast the subgroup composition of students attending public charter schools in the district has remained relatively stable over this same time period—see Table 5-1. 11 Table5-1 was revised after the prepublication report was released; data are now presented separately for DCPS and charter schools (previously the combined data were presented). 12 Discussion in this paragraph relies on data about students enrolled in the tested grades of 3-8 and 10 only and not to all students. See Table 5-1.
OCR for page 79
79 STUDENT ACHIEVEMENT UNDER PERAA For example, the enrollment of students who were not economically dis- advantaged fell considerably in DCPS between 2007 and 2010 while the enrollment of economically disadvantaged students also declined but not at the same rate. This means that economically disadvantaged students now make up a larger proportion of the total population of DCPS students— an increase of 8.2 percentage points; economically disadvantaged students were 62.2 percent of the DCPS tested population in 2007 and 70.4 percent in 2010. A similar pattern can be found for black students whose overall numbers fell in DCPS while those of whites and Hispanics increased slightly resulting in a shift in the overall demographic composition of the DCPS student body. The effects of families leaving the district or returning to the district are not generally factored in to summary proficiency statistics, yet these patterns could significantly bias the summary statistics (including co- hort averages) either up or down. As we discussed in Chapter 3, the District has witnessed changes in movement between DCPS and charter schools and in the composition of particular neighborhoods (as well as tensions regard- ing school closures and school improvements) that are likely to affect local school student populations; consequently, this issue should be carefully considered when interpreting changes in student achievement data. Dropout rates raise similar concerns. As students drop out of schools, their test scores are no longer included in their schools’ data. Thus, those schools’ average test scores may improve if significant numbers of low- achieving students leave, even if the remaining students’ scores have not gone up and the school has not actually improved. This is an important consideration in assessing DC’s test scores because a recent report from the National Center for Education Statistics found that the rate of students who enter 9th grade and later graduate from a DC school has steadily declined, from 68 percent in the 2001-2002 school year to only 56 percent in the 2007-2008 school year. The validity of data on dropout rates is, in itself, an issue of serious concern in interpreting achievement data (see, e.g., National Research Council and National Academy of Education, 2011). For all of these reasons, reports of test score gains are complete and valid only when they include analysis of the demography of the student population—including examinations of the distribution of students by geo- graphic area (e.g., ward) and movement into and out of charter schools, pri- vate schools, and suburban school districts. One means of factoring out the effects of population changes is to track individual students in the system over time to determine whether their performance is on an upward trajectory, that is, to follow actual cohorts of students across time. Doing so makes it pos- sible to see the performance of the students who remain in the system without any distortion that could come from changes in demographic composition. Thus, it is important to complement the average scaled scores and demo- graphic analyses with assessments of individual student growth over time.
OCR for page 80
80 TABLE 5-1 Changes in Demographic Subgroups Enrolled in Tested Grades for DCPS and Charter Schools, 2007-2010 DCPS Non Non- Non- Year Econ Dis Econ Dis Black Hispanic Whitea LEP LEP SPED SPED Total 2007 subgroup% 62.2 37.8 83.5 9.5 5.3 8.2 91.8 21.8 78.2 # 16,283 9,916 21,881 2,487 1,376 2,185 24,422b 5,707 20,492 26,199 2008 % 64.2 35.8 83.7 10.1 5.7 8.4 91.6 19.7 80.3 # 15,125 8,440b 19,463 2,353 1,331 1,993 21,872b 4,638 18,927b 23,259 2009 % 68.5 31.5 80.0 11.2 6.9 11.6 88.4 20.8 79.2 # 14,631 6,738 17,091 2,389 1,473 2,470 18,899 4,446 16,923 21,369 2010 % 70.4 29.6 78.1 12.1 7.8 8.0 92.0 21.0 79.0 # 14,587 6,140 16,181 2,518 1,610 1,663 19,064 4,352 16,375 20,727 Percentage Point Change in Subgroup 8.2 –8.2 –5.5 2.6 2.5 –0.2 0.2 –0.8 0.8 Composition (2010%-2007%) Percentage Change –10.4 –38.1 –26.0 1.2 17.0 –23.9 –21.9 –23.7 –20.1 –20.9 in Number Enrolled (2010#-2007#/2007#)
OCR for page 81
Public Charter Schools in DC Non Non- Non- Year Econ Dis Econ Dis Black Hispanic Whitea LEP LEP SPED SPED Total 2007 subgroup% 66.3 33.7 90.4 6.6 2.2 4.3 95.7 12.8 87.2 # 5,971 3,037 8,140 599 194 390 8,690b 1,155 7,853 9,008 2008 % 70.0 30.0 86.9 6.9 2.4 5.2 94.8 13.7 86.3 # 6,718 2,876 8,608 687 241 501 9,184 1,310 8,284 9,900 2009 % 72.0 28.0 89.6 7.6 2.2 6.0 94.0 13.4 86.6 # 8,174 3,183 10,173 859 251 677 10,680 1,519 9,838 11,357 2010 % 68.3 31.7 88.7 8.2 2.5 4.7 95.3 12.5 87.5 # 7,962 3,698 10,339 952 297 550 11,110 1,452 10,208 11,660 Percentage Point Change in Subgroup 2.0 –2.0 –1.7 1.5 0.4 0.4 –0.4 –0.4 0.4 Composition (2010%-2007%) Percentage Change 33.3 21.8 27.0 58.9 53.1 41.0 27.8 25.7 30.0 29.4 in Number Enrolled (2010#-2007#/2007#) NOTES: Tested grades are 3-8 and 10. Econ Dis = economically disadvantaged, LEP = limited English proficient, SPED = special education. aPercentages (across black, Hispanic, white) do not sum to 100 because two subgroups (with very low numbers) are not shown. bThe total across these two subgroups is greater than the total number of students reported for that year (see “Total” column at far right). Data are presented as they appear on the OSSE website; we were unable to determine the reason for the discrepancy. For these cases, percentages were calculated based on the sum across subgroups (not the total number from the far right column). SOURCE: Compiled from http://www.nclb.osse.dc.gov/index.asp [accessed April 2011]. 81
OCR for page 82
82 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS Looking Beyond Proficiency Rates The primary data point reported for DC CAS (as for many assessment programs) is the proficiency rate, the percentage of students who perform at or above the proficient level. However, using proficiency rates has more significant limitations than using measures that more accurately reflect the spread of scores, such as averages. One limitation is that states have widely varying definitions of proficiency in core subjects. For example, a study for the U.S. Department of Education (Bandeira de Mello et al., 2009) found that the difference between the most and least challenging state standards for proficient performance in reading and mathematics was as large as the difference between the basic and proficient performance levels on NAEP. This study did not include the District because data were not available, but it is possible to compare the percentage of students at or above proficient on NAEP to that of DC CAS during the same year: see Figure 5-2, above. The reasons for the differences in the tests may be that the DC CAS is more closely aligned to the District’s—not NAEP’s—standards and therefore mea- sures different things. It is also possible that the District, like many other states, has a lower bar for proficiency than does NAEP. Another limitation to consider about data on the percentage of stu- dents performing at or above the proficient level is that this figure provides no information about students who are performing significantly above or below that level. Thus, this measure cannot reveal change that occurs at all other points on the scale—such as students who move from below basic to basic or from proficient to advanced. If a school or the district as a whole has focused on helping the students who are performing just below the proficiency cutoff point to cross that cutoff (sometimes called bubble kids), other students might receive less attention (Booher-Jennings, 2005; Neal and Schanzenbach, 2007). Another and perhaps most important limitation is that the percent proficient statistic does not account for the weight (relative numbers of students) around the proficiency cut scores, and the fact that a slightly different choice in cut score may even reverse trends (Ho, 2008). Using proficiency scores to assess gains and gaps leads to “unrepresentative depic- tions of large-scale test score trends, gaps, and gap trends” and “incorrect or incomplete inferences about distributional change” (Ho, 2008, p. 1). Because of this limitation, analysts recommend statistics or summaries that accurately reflect the performance of all students, such as the average scaled scores and the distribution of these scores (Ho, 2008).
OCR for page 83
83 STUDENT ACHIEVEMENT UNDER PERAA Disaggregating Test Results Although average scores provide a measure of whole group perfor- mance, the average may mask important subgroup differences. For example, it is possible for the overall average to be increasing while some subgroup scores are decreasing. Alternatively, the average may not show a change, even though some subgroups’ scores are significantly increasing. Thus, dis- aggregating results is essential to understanding of score trends. A thorough evaluation of test scores in the District would examine how achievement has been changing across a number of student groups, considering: • grade level, • subject (and, in some cases, strands), • types of schools (e.g., charter or traditional), • student achievement levels (e.g., 10th, 25th, 50th, 75th, 90th percentiles), • geography (e.g., in the District, ward), • ethnicity, • income level, and • special populations, such as students with disabilities and English language learners. We also note that policies that change the standards for classifying English language learners have potentially significant effects on the charac- teristics of the whole population, and, therefore, on average performance. Students who move into the proficient category, for example, are often automatically reclassified as non-English language learners (even though they may not have attained complete fluency) and, thus, are no longer counted in the subgroup. In this situation, overall scores would appear to decrease simply because the composition of the tested group changed. Disaggregating data is complicated for DC because the city’s black population is large in comparison with that of many other school districts. Significant demographic differences within the city, including differences in levels of income and education, may therefore be obscured in analyses of achievement by racial group. DC’s unique population demographics make the black-white achievement gap less informative than comparisons within the demographic groups in the District and surrounding areas. Although there is little argument about the importance of striving to eliminate long-standing achievement gaps, it would be misleading to focus on such aggregate gaps within the District population as was done, for example, in the 2008-2009 progress report of the DC Public Schools (Dis- trict of Columbia Public Schools, 2009). The District’s black population
OCR for page 84
84 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS is very diverse, and includes both a concentration of very highly educated and successful black residents and many who are poorly educated and economically insecure. Socioeconomic differences are especially large be- tween the northwest and southeast areas of the city, whose populations are dominated, respectively, by well-off whites and poor blacks. For example, recently released data from the American Community Survey—aggregated from 2005 to 2009—show that in northwest Washington more than 80 percent of adults have at least a bachelor’s degree and more than 50 per- cent have at least a master’s degree, while in southeast Washington fewer than 10 percent have a bachelor’s degree. And in most areas of northwest Washington, the median household income is well over $100,000 per year, while in southeast Washington, the median household income is well under $50,000 (U.S. Census Bureau, 2010). It is highly misleading to compare academic achievement between populations of such different social and economic standing. Even in the absence of improved measures of individual students’ socioeconomic status (discussed below), when the new common core standards and common assessments become available, it should at least be possible to compare academic performance levels of white, black, and Hispanic students in the District with those in other, comparable student populations. In the mean- time, naïve aggregate comparison of test scores among race-ethnic groups in the District should be interpreted critically and cautiously. Thus, analysts need to carefully consider student backgrounds when comparing average scores, for example, by disaggregating by socioeconomic background. One way that is sometimes proposed to capture socioeconomic differ- ences is to use eligibility for the National School Lunch Program (which provides free or reduced-price lunch for income-eligible students), but re- search suggests that this is not in fact a valid proxy (Harwell and Lebeau, 2010). Students are eligible for the lunch program if their family incomes fall below 125 percent of the official federal poverty guideline (for free lunch) or between 125 percent and 175 percent of the poverty line (for reduced- price lunch). However, the program serves only those students who apply, and not all who are eligible apply. The percentages of students identified as low-income using the NAEP lunch program are lower than the percentages identified by Census Bureau data (Booher-Jennings, 2005). Another diffi- culty with using the lunch program data as a measure comes from changes in policies regarding eligibility. During the past decade, the program has been offered to the entire populations of schools that meet certain criteria, as well as to individual students in any school. Thus, in some cases individual students who do not meet the criteria actually participate in the program. Moreover, the federal definition of the poverty threshold has risen signifi- cantly less than the standard of living since the 1960s, so the official poverty designation has come to refer to a relatively more deprived segment of the
OCR for page 85
85 STUDENT ACHIEVEMENT UNDER PERAA population over time (see National Research Council, 1995). Because of these variations, eligibility for free or reduced-price lunch has limited value as a measure of socioeconomic status. Further research is needed to establish an improved measure of socioeconomic status that will capture differences in the District. We reiterate that DC NAEP results should be disaggregated by socio- economic status, as well as by race and ethnicity, to support meaningful inferences about student learning. Multiple methods should be used to track income level, such as parental education and home ownership status, as reported by parents or other responsible adults. The percentage of students tested (of all students enrolled) for DC CAS and the inclusion rates of English language learners and students with dis- abilities for NAEP are also factors that can affect population scores while masking subgroup scores. For example, if there were a significant decrease in the percentage of students tested, it could significantly affect test scores because the students most likely to be excluded are low-performing ones. For NAEP, state or district policies may differ on the inclusion or exclusion of students with disabilities or English language learners. If larger numbers of these students are excluded in one district or state in comparison with another, the test’s results for that state or district may be inflated. For the District, the percentage of students with disabilities or who are English language learners and were excluded from the NAEP assessments dropped from 2007 to 2009: in mathematics, the exclusion rate declined from 6 to 4 percent in grade 4 and from 10 to 6 percent in grade 8; in reading, the exclusion rate dropped from 14 to 11 percent in grade 4 and from 13 to 12 percent in grade 8. This decrease in the percentage of excluded students provides additional evidence that the assessment gains for District students are real in every NAEP assessment. Comparing Test Results Even individual student-level data will have significant limitations. Tracking students who leave the city is a challenge for the District, which has high rates of mobility to and from neighboring jurisdictions. It is also not generally possible to compare student performance across districts unless they use the same assessments (or ones that have significant overlap; see National Research Council, 2010b, for a discussion of cross-state compari- sons). Since the DC CAS is only administered to students in public schools in the District, it is not possible to assess whether students in DC are “catching up” over time with students outside of the system: one can only track the relative movement of DC students in comparison with one another. The DC State Board of Education voted in 2010 to adopt the common core standards, a set of standards in English language arts and mathematics
OCR for page 86
86 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS that have been developed cooperatively by the states and have been adopted by 40 other states.13 Since these standards are different from the current standards used for the DC CAS, a new set of assessments will be needed to replace the DC CAS. The District currently plans to adopt a new common assessment system that will align with the common core standards; such an assessment system is being developed by a multistate consortium.14 Once the new assessment system is operational, it will be possible to compare the progress of DC students with those in other jurisdictions, and thus to acquire additional evidence regarding changes in student performance since the pas- sage of PERAA. However, switching assessments also has disadvantages. If the DC CAS is not retained in some form for trend purposes, the District will no longer be able to compare current performance with that of the years prior to the implementation of a new assessment. It is possible to do a braided study (in which questions from the old test are nested within the new test) or to use the old test in a sample of schools for a few years to provide some in- formation on trends. Since, as we noted above, performance typically falls in the first year after a new test is introduced and then rapidly improves as teachers and students become familiar with the new format and new stan- dards, it will be important to take that into account in drawing conclusions about the results from a new test (see Koretz et al., 1991). A second issue we note is that assessment scores are part of DCPS’s teacher performance management system. There is considerable debate over pay-for-performance and the reliability of value-added measures; we note here only that attaching direct consequences to student test scores may provide an added incentive for teachers to focus on tested content, at the expense of other important educational goals, or even to cheat by offering students help or information they are not intended to have (see Jacob and Levitt, 2003; Lazear, 2006; National Research Council, 2010a). Comparing overall and disaggregated student performance on DC CAS and NAEP can help to provide a check on the integrity of results. REFERENCES Allensworth, E.M., and Easton, J.Q. (2007). What Matters for Staying OnTrack and Gradu ating in Chicago Public High Schools: A Close Look at Course Grades, Failures, and Attendance in the Freshman Year, Research Report. Chicago: Consortium on Chicago School Research at the University of Chicago. 13 Fordetails, see http://www.corestandards.org/in-the-states [accessed January 2010]. 14 Forthe DC government press release announcing the State Board’s adoption, see http:// newsroom.dc.gov/show.aspx/agency/seo/section/2/release/20261 [accessed March 2011].
OCR for page 87
87 STUDENT ACHIEVEMENT UNDER PERAA Bandeira de Mello, V., Blankenship, C., and McLaughlin, D. (2009). Mapping State Profi ciency Standards onto NAEP Scales: 20052007 (Research and Development Report, NCES 2010-456). Washington, DC: National Center for Education Statistics. Booher-Jennings, J. (2005). Below the bubble: “Educational triage” and the Texas account- ability system. American Educational Research Journal, 42(2), 231-268. District of Columbia Public Schools. (2009). Progress: Second Year of Reform. Washington, DC: Author. Available: http://www.dc.gov/DCPS/Files/downloads/ABOUT%20DCPS/ Strategic%20Documents/Progress%20Report%20-%202008-2009/DCPS-Annual- Report-7-21-2010-Full.pdf [accessed March 2011]. District of Columbia Public Schools. (2010). Learning Standards for Grades PreK8. Available: http://dcps.dc.gov/DCPS/In+the+Classroom/What+Students+Are+Learning/Learning+ Standards+for+Grades+Pre-K-8 [accessed October 2010]. Harwell, M., and LeBeau, B. (2010). Student eligibility for a free lunch as an SES measure in education research. Educational Researcher, 39(2), 120-131. Ho, A.D. (2008). The problem with “proficiency”: Limitations of statistics and policy under No Child Left Behind. Educational Researcher, 37(6), 351-360. Jacob, B.A., and Levitt, S.D. (2003). Catching cheating teachers: The results of an unusual experiment in implementing theory. In W.G. Gale and J. Rothenberg Pack (Eds.), BrookingsWharton Papers on Urban Affairs 2003 (pp. 185-209). Washington, DC: Brookings Institution Press. Koretz, D.M. (2008). Measuring Up: What Educational Testing Really Tells Us. Cambridge, MA: Harvard University Press. Koretz, D.M., Linn, R.L., Dunbar, S.B., and Shepard, L.A. (1991). The Effects of High Stakes Testing on Achievement: Preliminary Findings About Generalization Across Tests. Paper presented at the Annual Meetings of the American Educational Research Associa- tion (April 3-7) and the National Council on Measurement in Education (April 4-6), Chicago, IL. Lazear, E.P. (2006). Speeding, terrorism, and teaching to the test. The Quarterly Journal of Economics, 121(3), 1029-1061. Available: http://www.mitpressjournals.org/doi/ abs/10.1162/qjec.121.3.1029?journalCode=qjec [accessed March 2011]. Linn, R.L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4-16. Miller, R.T., Murnane, R.J., and Willett, J.B. (2007). Do Teacher Absences Impact Student Achievement? Longitudinal Evidence from One Urban School District. (NBER Working Paper No. 13356). Cambridge, MA: National Bureau of Economic Research. Available: http://www.nber.org/papers/w13356 [accessed March 2011]. National Research Council. (1995). Measuring Poverty: A New Approach. C.F. Citro and R.T. Michael (Eds.). Panel on Poverty and Family Assistance: Concepts, Information Needs, and Measurement Methods. Committee on National Statistics. Commission on Behav- ioral and Social Sciences and Education. Washington, DC: National Academy Press. National Research Council. (1999). High Stakes: Testing for Tracking, Promotion, and Gradu ation. J.P. Heubert and R.M. Hauser (Eds.). Committee on Appropriate Test Use. Board on Testing and Assessment. Commission on Behavioral and Social Sciences and Educa- tion. Washington, DC: National Academy Press. National Research Council. (2010a). Getting Value Out of ValueAdded: Report of a Work shop. H. Braun, N. Chudowsky, and J. Koenig (Eds.). Committee on Value-Added Methodology for Instructional Improvement, Program Evaluation, and Accountability. Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
OCR for page 88
88 EVALUATING THE DISTRICT OF COLUMBIA’S PUBLIC SCHOOLS National Research Council. (2010b). State Assessment Systems: Exploring Best Practices and Innovations, Summary of Two Workshops. A. Beatty, Rapporteur. Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Stan- dards. Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press. National Research Council and National Academy of Education. (2011). High School Drop out, Graduation, and Completion Rates: Better Data, Better Measures, Better Decisions. R.M. Hauser and J.A. Koenig (Eds.). Committee for Improved Measurement of High School Dropout and Completion Rates: Expert Guidance on Next Steps for Research and Policy Workshop. Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press. Neal, D., and Schanzenbach, D.W. (2007). Left Behind by Design: Proficiency Counts and TestBased Accountability. (NBER Working Paper No. 13293). Cambridge, MA: National Bureau of Economic Research. Available: http://www.nber.org/papers/w13293.pdf [accessed March 2011]. Office of Technology Assessment. (1992). Testing in American Schools: Asking the Right Questions. Summary. Washington, DC: U.S. Government Printing Office. Available: http://govinfo.library.unt.edu/ota/Ota_1/DATA/1992/9236.PDF [accessed March 2011]. Office of the State Superintendent of Education. (2010a). DCCAS Grade 3 Performance Level Descriptors. Washington, DC: Author. Available: http://osse.dc.gov/seo/frames.asp?doc=/ seo/lib/seo/Grade_3_Performance_Level_Description.pdf [accessed November 2010]. Office of the State Superintendent of Education. (2010b). District of Columbia Comprehensive Assessment System: Resource Guide 2011. Washington, DC: Author. Available: http:// osse.dc.gov/seo/frames.asp?doc=/seo/lib/seo/assessment_and_accountability/2011_dc_ cas_resource_guide.pdf [accessed November 2010]. U.S. Census Bureau. (2010). 20052009 American Community Survey FiveYear Estimates. Available: http://factfinder.census.gov/servlet/DatasetMainPageServlet?_program=ACS&_ submenuId=&_lang=en&_ds_name=ACS_2009_5YR_G00_&ts= [accessed March 2011].