4

Comparisons with National Benchmarks: Pros and Cons

When Congress removed the language prohibiting the use of NAEP results below the state level (P.L. 103-382), the National Assessment Governing Board (NAGB) was called on to develop guidelines for the conduct of below-state reporting. Their document (National Assessment Governing Board, 1995a:1) states that “below state NAEP results could provide an important source of data for informing a variety of education reform efforts at the local level.” While “reform efforts” are not defined in the NAGB document, presumably such efforts would involve making comparisons of local performance with national, state, and other local results. State NAEP answered the persistent question asked by policy makers, “I know how we’re doing on our state test, but how are we doing in comparison to other states?” District-level NAEP results could serve a similar purpose for districts so long as item security is maintained and standardized administration practices are utilized.

Large urban districts often face educational challenges that suburban districts do not. Urban districts tend to serve larger populations of children who typically score lower on standardized tests. They have larger populations of poor, immigrant, and unemployed families and larger populations of racial/ethnic minorities—all groups that typically score low (Donahue et al., 1999; Shaughnessy et al., 1997). When state assessment results are released, urban districts are often among the lowest performing (Education Week, 1998). Faced with an ever-critical press, district officials may respond by enumerating the many challenges they face



in educating their students. Many may believe that they are doing the best they can, given their student populations, but without appropriate comparisons, they cannot validate their arguments.

For states that have multiple urban areas with common characteristics, results might be compared across similar districts using state assessments. However, many states do not have multiple urban areas. The most appropriate comparisons might be with other districts like themselves in other states.

Workshop participants reported that one of the most powerful uses of NAEP results is for making comparisons against a high-quality, national benchmark. They identified two broad categories of questions that might be answered by such comparisons:

- How does our district compare with others like us? Which districts like ours are doing better than we are? What are districts like ours doing that works well?
- How do our NAEP results compare to our local or state assessment results?

Speakers also identified a number of disadvantages and limitations associated with such comparisons. The discussion below attempts to summarize the major points made by the speakers.

COMPARISONS AMONG LIKE DISTRICTS COULD SERVE IMPORTANT PURPOSES

The most common argument made in favor of district-level results was the importance of being able to make comparisons among “like districts.” Sharon Lewis, director of research for the Council of Great City Schools, reported that the council recently took an “unprecedented stand” by actively recruiting urban school districts to volunteer to take the proposed voluntary national tests. This action was prompted by council members’ desire to know how school districts are doing when measured against high standards and in comparison to other districts with similar characteristics.
Lewis noted that urban school districts administer a number of commercially developed tests that allow them to answer questions about how well the district is doing. But these test results do not allow them to compare across districts, particularly to large urban districts in other states. Other workshop participants echoed the desire for appropriate comparison groups. Thomas McIntosh, representing Nevada’s Department of

Education, remarked that comparisons would be useful if the relevant factors that influence performance could be controlled. He highlighted social and economic factors as important ones to control and called for measures based on environment, cultural differences, number of books in the home, and parental expectations, in addition to the more common measures based on the percentage of students receiving free and reduced-price lunches in districts. According to McIntosh, comparisons made after controlling for these social and economic factors would be useful in identifying who is doing well and what they are doing that works. He added that there is a need for comparisons that cannot be explained away by factors such as differences in growth rates, size, or income (e.g., “you can’t compare us with them because we’re bigger” or “... because we’re growing faster” or “... because they have more money”). He noted that it is very easy to undermine comparisons and to offer justifications and rationales for poor achievement. The largest district in his state, Clark County, is quite different from the other districts in Nevada.

Gerald DeMauro, New York’s coordinator of assessment, agreed, saying that comparisons with like districts are important, but demographic information is needed in order to verify that the comparison is appropriate. The smaller the pool, the more important the characteristics of the pool. For DeMauro, the demographic characteristics of a city and those of a state can be strikingly different. Thus, comparisons of cities or districts that share common characteristics might be more meaningful than comparisons with the state as a whole.

Nancy Amuleru-Marshall, Atlanta’s executive director for research and assessment, presented her district’s perspective, saying:

NAEP may represent the best effort so far in the development of rich and meaningful assessments. ...
NAEP would provide districts with high-quality performance data that we currently do not have. It would permit districts to make peer comparisons, as well as state and national comparisons. Many of the districts that are members of the Council of Great City Schools have been struggling to find common measures of student achievement that are valid indicators of our students’ performance. NAEP can provide such a measure.

Amuleru-Marshall added that Atlanta was one of the districts that stood behind President Clinton’s call for voluntary national testing and has been disappointed that the testing program has not been available to them yet.

Representatives from several state assessment offices also pointed out that the state is ultimately responsible for ensuring that school systems are

carrying out their charge of educating the state’s youth. An additional measure of the extent to which school systems are doing their jobs would be useful. Moreover, the ability to compare data for their urban districts with those in other states would help them set reasonable expectations for these jurisdictions.

EXTERNAL VALIDATION IS DESIRED

Workshop participants observed that another appealing feature of district-level reporting for NAEP would be the ability to compare district assessment results with stable external measures of achievement. According to Paul Cieslak, research specialist for the Milwaukee Public Schools, NAEP is a “good, well-constructed external validation measure that provides a solid base for longitudinal and out-of-district comparisons.” Others pointed out that there had been, and continue to be, revisions in their state assessment programs. NAEP remains consistent from one testing interval to the next, which makes it useful for providing trend data that are not possible with a changing state assessment system.

COMPARISONS CAN HAVE NEGATIVE CONSEQUENCES

A number of district representatives disagreed with the view that comparisons of like districts would provide useful information. They countered that large urban districts already know from currently administered tests that their students perform poorly on standardized assessments. They do not need to see themselves compared with another district to know this.
“We already know we’re not doing well,” commented one district representative, “and another test on which we would score low would only fuel the fire for those more than ready to criticize us.”

Others added that districts have limited resources, asking, “Would district-level reporting be a good use of limited district resources?” They questioned whether the benefits would justify the costs, noting that additional testing would consume instructional time and district funds.

CONTEXT FOR TESTING VARIES ACROSS STATES

Behuniak (Connecticut) pointed out another drawback with comparisons across state boundaries: while districts may seem comparable based

on their demographics, they may in fact be very different because of the states in which they are located. The state sets the context and the environment within which testing occurs. States differ in the emphases they place on test results, the uses of the scores, and the amounts and kinds of attention results receive from the press. These factors play a heavy role in setting the stage for the testing. Attempts to make comparisons across like districts need to consider the context for testing along with similarities in student populations.

COMPARISONS CAN CREATE A DOUBLE BIND

Speakers noted that attempts to obtain external validation can create a double bind. When the findings from external measures corroborate state assessment results, no questions are asked. However, when state or local assessment results and external measures (such as state NAEP) differ, assessment directors find themselves being asked, “Which set of results is correct?” Explaining and accounting for these differences can be challenging. One state assessment representative indicated that when state results are higher than NAEP results, he emphasizes the alignment of the state assessment with the curriculum. When state results are lower than NAEP, he points out that the state standards are higher.

Some state assessment programs have adopted the NAEP descriptors (advanced, proficient, and basic) for their achievement levels. However, their descriptions of performance differ in important ways from the NAEP descriptions. NAEP’s definition of “proficient,” for instance, may encompass different skills than the state’s definition of proficient. This difference creates problems for those who must explain and interpret the two sets of test results. In addition, confusion arises when NAEP results are released at the same time as state or local assessment results.
State and local results are timely, generally reporting data for a cohort while it is still in the particular grade. For instance, when reports are published on the achievement of a school system’s fourth graders, they represent the cohort currently in fourth grade. When NAEP results are published, they are for some previous year’s fourth graders. Users of the data (policy makers, the press, etc.) may attempt to compare cohorts across assessments, but when they realize that the results are for different cohorts, attention focuses on the more recent results; NAEP results may be dismissed. This time lag in reporting affects the extent to which NAEP can be a catalyst for change at the local level.