REPORTING DISTRICT-LEVEL NAEP DATA: SUMMARY OF A WORKSHOP
2 Background
This chapter provides background information on experiences with state NAEP and the reporting of district-level NAEP results. The first section describes some of the concerns expressed during the early implementation stages of state NAEP, discusses findings from initial evaluations of the program, and highlights their relationship to district-level reporting. The second section describes prior experiences NCES has had with reporting district-level results through the Trial District Assessment in 1996 and the reporting of results for naturally occurring districts in 1998.
THE STATE NAEP EXPERIENCE
The Trial State Assessment (TSA) was designed with several purposes in mind: (1) to provide states with information about their students’ achievement and (2) to allow states to compare their students’ performance with that of students in other states (National Academy of Education, 1993). Implementation was on a trial basis to allow for congressionally mandated evaluations of the program’s feasibility and utility before committing resources to an ongoing state-by-state assessment. Prior to its implementation, a number of concerns were expressed about its possible impact. The text below describes some of these concerns, cites some of the benefits reported in reviews of the TSA, and notes how these concerns relate to district-level NAEP.
Early Concerns About Implementation of State NAEP
Concerns about state NAEP centered on the anticipated uses of state-level data and the consequent effects on test preparatory behaviors. Reporting of national-level results had been regarded as having low stakes, since decisions at the state, district, school, or classroom level could not be based on NAEP reports. National-level data were not being used for accountability purposes, and participants were relatively unaffected by the results. But the provision of state-level data prompted concerns about the effects of increasing the stakes associated with NAEP. As enumerated by Stancavage et al. (1992:261) in discussing the TSA in mathematics, NAEP’s stakeholders asked:
Would the reporting of the NAEP TSA cause local districts and states to change the curriculum or instruction that is provided to students? Would local or state testing programs change to accommodate NAEP-tested skills, would they remain as they are, or would they simply be pushed aside? Would any such changes in curriculum or assessment, should they occur, be judged as positive by mathematics educators, and others, or would the changes be viewed as regressive and counter-productive? Finally, would it be found that the entire NAEP TSA effort had no impact at all and was, therefore, a wasteful expenditure of time and money?
These questions stemmed from concerns about the emphases attached to and the inferences drawn from NAEP results. Increasing the stakes associated with NAEP was seen as a move toward using NAEP results for accountability purposes. It was feared that such uses would degrade the value of the assessment.
Koretz (1991:21) warned that higher stakes would bring inappropriate teaching to the test and inflated test scores, adding that NAEP results, so far, had been free from “this form of corruption.” While this is an important concern, it should also be noted that when state standards mirror the NAEP frameworks, having schools teach the content and skills assessed by NAEP is a desirable result. Beaton (1992:14) used the term “boosterism” to describe the activities that might be used to motivate students to do their best for the “state’s honor.” He suggested that boosterism combined with teaching to the test and “more or less subtle ways of producing higher scores could effect the comparability of state trend data over time,” particularly if these practices change or become more effective over time.
Others questioned how the results might be interpreted. For instance, Haertel (1991:436) pointed out that the first sort of questions asked would pertain to which states have the best educational systems, but cautioned that attempts to answer would be “fraught with perils.” Haertel continued (p. 437):
[Comparisons] will involve generalizations from TSA exercise pools to a broader range of learning outcomes... [Such comparisons] depend on the match between NAEP content and states’ own curriculum framework... For example, a state pressing to implement the [National Council of Teachers of Mathematics] framework might experience a (possibly temporary) decrease in performance on conventional mathematics problems due to its deliberate decision to allocate decreased instruction time to that type of problem. The 1990 TSA might support the (valid) inference that the state’s performance on that type of problem was lagging, but not the (invalid) inference that their overall mathematics performance was lagging.
It was expected that state-to-state comparisons would prompt the press and others to rank states based on small (even trivial) differences in performance (Haertel, 1991). And, in fact, Stancavage et al. (1992) reported that in spite of cautions by NCES and Secretary Lamar Alexander not to rank states, four of the most influential newspapers in the nation rank-ordered states. In a review of 55 articles published in the top 50 newspapers, they found that state rankings were mentioned in about two-thirds of the articles (Stancavage et al., 1992).
Another set of concerns pertained to the types of inferences that might be based on the background, environmental, and contextual data that NAEP collects. These data provide a wealth of information on factors that relate to student achievement. However, the data collection design does not support attributions of cause, nor does it meet the needs of accountability purposes. The design is cross-sectional in nature, assessing different samples of students on each testing occasion. Such a design does not allow for the before-and-after testing required to hold educators responsible for results. Furthermore, correlations of student achievement on NAEP with data about instructional practices obtained from the background information do not imply causal relationships. For example, the 1994 NAEP reading results showed that fourth-grade students who received more than 90 minutes of reading instruction a day actually performed less well than students receiving less instruction. Clearly, the low-performing students received more hours of instruction to make up for deficiencies; the extra instruction did not cause the deficiencies (Glaser et al., 1997).
Reported Benefits of State NAEP
Despite these concerns about the provision of state-level data, reviews of the TSA have cited numerous benefits and positive impacts of the program. Feedback from state assessment officials indicated that state NAEP has had positive influences on instruction and assessment (Stancavage et al., 1992, 1993; Hartka and Stancavage, 1994; DeVito, 1997). At the time that the TSA was first implemented, many states were in the process of revamping their frameworks and assessments in both reading and mathematics. According to state officials, in states where changes were underway, the TSA served to validate the changes being implemented; in states contemplating changes, the TSA served as an impetus for change.
Respondents to surveys conducted by Hartka and Stancavage (1994) reported that the following changes in reading assessment and instruction were taking place: increased emphasis on higher-order thinking skills; better alignment with current research on reading; development of standards-based curricula; increased emphasis on literature; and better integration or alignment of assessment and instruction. While these changes could not be directly attributed to the implementation of the TSA, they reflected priorities set for the NAEP reading assessment. Additionally, many state assessment measures were expanded to include more open-ended response items, with an increased emphasis on the use of authentic texts and passages, like those found on NAEP (Hartka and Stancavage, 1994).
At the time of the first TSA, the new mathematics standards published by the National Council of Teachers of Mathematics (NCTM) were having profound effects on mathematics curricula, instructional practice, and assessment throughout the country. Survey results indicated that changes similar to those seen for reading were happening in mathematics instruction and assessment: alignment with the NCTM standards, increased emphasis on higher-order thinking skills and problem solving, development of standards-based curricula, and integration or alignment of assessment and instruction (Hartka and Stancavage, 1994). The mathematics TSA was also influential in “tipping the balance in favor of calculators (in the classroom and on assessments) and using sample items [for] teacher in-service training” (Hartka and Stancavage, 1994:431). Again, while these changes could not be attributed to the TSA, the fact that the NAEP mathematics frameworks were highly aligned with the NCTM standards served to reinforce the value of the professional standards.
Results from the first TSA in 1990 garnered much attention from the media and the general public. For states whose performance was unsatisfactory, TSA results were helpful in spurring reform efforts. For states that had performed well on the TSA, state officials could attribute the results to the recent reforms in their instructional practice and assessment measures.
Relation to District-Level NAEP
It appears from the reviews of the TSA that the expected negative consequences of state NAEP did not materialize and that positive impacts were realized. However, the move to reporting data for school districts brings the level of reporting much closer to those responsible for instruction. As the level of reporting moves to smaller units, the assessment stakes become even higher. Concerns similar to those described above for state-level data have been articulated for below-state reporting (Haney and Madaus, 1991; Selden, 1991; Beaton, 1992; Roeber, 1994). Haney and Madaus (1991) also caution that provision of district-level data could result in putting districts or schools into receivership; using results in school choice plans; or allocating resources on the basis of results. Furthermore, Selden (1991) points out that use of NAEP results at the district or school level has the potential to: discourage states’ and districts’ use of innovation in developing their own assessments; interfere with the national program with respect to test security (keeping items secure would be more difficult, and many new items would be needed); and increase costs in order to accomplish its goals.
It will be important to keep these issues in mind as district-level NAEP is being considered.
EXPERIENCES WITH DISTRICT-LEVEL NAEP
The Improving America’s Schools Act of 1994, which reauthorized NAEP, modified the policies that guide NAEP’s reporting practices. This legislation removed the language prohibiting “below-state” reporting of NAEP results. One means of providing below-state results is to summarize performance at the school district level. The initiative for providing below-state reporting was supported by the National Assessment Governing Board (NAGB) in the hope that school districts would choose to use NAEP data to inform a variety of education reform initiatives at the local level (National Assessment Governing Board, 1995a). During the 1996 and 1998 administrations of NAEP, different procedures for offering district-level NAEP data to districts and states were explored. The two plans, the Trial District Assessment offered in 1996 and the Naturally Occurring District plan offered in 1998, are described below.
Trial District Assessment
Under the Trial District Assessment, large school districts were offered three options for participating in district-level reporting of NAEP (National Center for Education Statistics, 1995a). The first option, called “Augmentation of State NAEP Assessment,” offered district-level results in the same subjects and grades as in state NAEP by augmenting the district’s portion of the state NAEP sample. Under this option, districts would add “a few schools and students” to their already selected sample in order to be able to report stable estimates of performance at the district level. According to the National Center for Education Statistics (NCES), the procedures for augmenting the sample would “minimize the cost of the assessment process,” and costs were to be paid by the district.
The second option in 1996, referred to as “Augmentation of National Assessment,” would allow for reporting district results in subjects and grades administered as part of national NAEP, by augmenting the number of schools selected within certain districts as part of the national sample. As few schools are selected in any single district for national NAEP, this second option would require most school districts to select “full samples of schools” (National Center for Education Statistics, 1995b:2) in order to meet the sampling requirements and to report meaningful results.
The costs of augmenting the national sample would be more substantial than those associated with augmenting the state sample. If a district selected either of these options, the procedures for sample selection, administration, scoring, analysis, and reporting would follow those established for national or state NAEP, depending on the option selected. The results would be “NAEP comparable or equivalent.”
The third option in 1996, the “Research and Development” option, was offered to districts that might not desire NAEP-comparable or equivalent results but that had alternative ideas for using NAEP items. Alternative uses might include assessing a subject or subjects not being administered by NAEP at the national or state level; administering only a portion of the NAEP instrument; or including a deviation from standard NAEP procedures. NCES would regard such uses as research and development activities and would not certify the results obtained under this option as NAEP comparable or equivalent.
Prior to the 1996 administrations, NCES (with the assistance of the sampling contractor, Westat) determined that the minimum sampling requirements for analysis and reporting at the district level were 25 schools and 500 assessed students per grade and subject. NCES and the Educational Testing Service (ETS) sponsored a meeting during the annual meeting of the American Educational Research Association, inviting representatives from several of the larger districts in the country. On the basis of conversations at this meeting and further interaction with district representatives, NCES identified approximately 10 school systems that were interested in obtaining NAEP results for their districts. NCES and its contractors held discussions with representatives of these districts. The costs turned out to be much higher than school systems could easily absorb. Due mainly to fiscal concerns, only Milwaukee participated in 1996, with financial assistance from the National Science Foundation. Additional sampling of schools and students was required for Milwaukee to reach the minimum numbers necessary for participation, and it received results only for grade 8.
Naturally Occurring Districts
Prior to the 1998 administrations, NCES and Westat determined that there were six naturally occurring districts. Naturally occurring districts are those that comprise at least 20 percent of their state’s sample and thus meet the minimum sampling requirements described above (25 schools and 500 students) as a matter of course. These districts can be thought of as “self-representing in state NAEP samples” (Rust, 1999).
The districts that met these guidelines in 1998 were: Albuquerque, New Mexico; Anchorage, Alaska; Chicago, Illinois; Christiana County, Delaware; Clark County, Nevada; and New York City, New York.
In July 1998, NCES contacted representatives from these naturally occurring districts to assess their interest in district-level reports, informing them that such results could be generated at no additional cost to the state or the district. Alaska did not participate in 1998, and Christiana County decided it was not interested. In the cases of New York City and Chicago, the districts did not want the data although the respective states did, thereby creating a conflict. The NAEP State Network, which consists of state assessment directors or their appointed representatives, also voiced concerns about the fairness of making the data available for some districts but not others. NCES did not query Clark County or Albuquerque, or their respective states, as to their interest, since by then the whole idea of district-level reporting was coming into question (Arnold Goldstein, National Center for Education Statistics, personal communication, 1999).