
3

Reporting District-Level NAEP Results

The Improving America's Schools Act of 1994, which reauthorized NAEP in that year, eliminated the prohibition against reporting NAEP results below the state level. Although the law removed the prohibition, it neither called for district- or school-level reporting, nor did it outline details about how such practices would operate. NAGB and NCES have explored reporting district-level results as a mechanism for providing more useful and meaningful NAEP data to local policy makers and educators. They have twice experimented with trial district-level reporting programs. For a variety of reasons, neither attempt revealed much interest on the part of school districts. The lack of interest was attributable, in part, to financial considerations and to unclear policy about whether the state or the district had the ultimate authority to make participation decisions. Despite the apparent lack of interest during the attempted trial programs, there is some evidence that provision of district-level results could be a key incentive to increasing schools' and districts' motivation to participate in NAEP (Ambach, 2000).

The focus of the committee's work on district-level reporting was to evaluate the desirability, feasibility, potential uses, and likely impacts of providing district-level NAEP results. In this chapter, we address the following questions: (1) What are the proposed characteristics of a district-level NAEP? (2) If implemented, what information needs might it serve? (3) What is the degree of interest in participating in district-level NAEP? (4) What factors would influence interest?



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



STUDY APPROACH

To gather information relevant to these questions, the committee reviewed the literature on below-state reporting, including NCES and NAGB policy guidelines for district-level reporting (National Assessment Governing Board, 1995a, 1995b; National Center for Education Statistics, 1995); listened to presentations by representatives from NAGB, NCES, and their contractors (ETS and Westat) regarding district-level reporting; and held a workshop on district-level reporting. During the workshop, representatives of NAGB and NCES discussed policy guidelines, prior experiences, and future plans for providing district-level data. Representatives from ETS and Westat spoke about the technical issues associated with reporting district-level data. Individuals representing state and district assessment offices participated and commented on their interest in and potential uses for district-level results. Representatives from national organizations (the Council of Chief State School Officers and the Council of the Great City Schools) and authors of papers on providing below-state NAEP results served as discussants. Approximately 40 individuals participated in the workshop, and the proceedings were summarized and published (National Research Council, 1999c).

This chapter begins with a review of the concerns expressed when state NAEP was first implemented, as they could all relate to below-state reporting. This section contains a description of the evaluations of the Trial State Assessment, the findings of those evaluations, and the reported benefits of state NAEP. The chapter continues with a summary of the chief issues raised by authors who have explored the advantages and disadvantages of providing below-state results. The next portion of the chapter describes the two experiences with district-level reporting: the first associated with the 1996 assessment, the second with the 1998 assessment. The final portion summarizes the information obtained during the committee's workshop on district-level reporting.

INITIAL CONCERNS FOR STATE-LEVEL REPORTING

Prior to implementation of the Trial State Assessment (TSA) and reporting of state-level results, researchers and others familiar with NAEP expressed concerns about the expansion of the assessment to include state-level data. These concerns centered on the anticipated uses of state-level data and the likely effects on curriculum and instruction. National NAEP had been a low-stakes examination, since data could not be used for decisions at the state, district, school, or classroom level. National-level data were not being used for accountability purposes, and participants were relatively unaffected by the results. With the provision of state-level results, some observers expressed concern that the stakes associated with NAEP could rise. Specifically, they questioned whether reporting TSA results would cause local districts and states to change the curriculum or instruction provided to students, and whether local or state testing programs would change to accommodate NAEP-tested skills or would simply be pushed aside. Observers also debated whether any changes in curriculum or assessment would be positive or counterproductive (Stancavage, Roeber, & Bohrnstedt, 1992:261).

These questions stemmed from concerns about the emphases given NAEP results. As long as NAEP was a low-stakes test and decisions did not rest on the results, it was unlikely that states and districts would adjust their curriculum or assessments based on them. But reporting results at the state level could increase pressure on states to change their instructional practices, which could threaten the validity of NAEP scores (Koretz, 1991:21). Furthermore, Koretz warned that changes in instructional practices could harm student learning. To the degree that NAEP frameworks represent the full domain of material students should know, planning instruction around the frameworks may be appropriate. However, if schools "teach to the test," meaning that they teach only the narrow domain covered by the assessment, then they have inappropriately narrowed the curriculum.

Beaton (1992:14) used the term "boosterism" to describe activities that might be used to motivate students to do their best for the "state's honor." He suggested that boosterism, combined with teaching to the test and "more or less subtle ways of producing higher scores," could affect the comparability of state trend data if these practices change or become more effective over time. Others questioned how the results might be interpreted. For instance, Haertel (1991:436) pointed out that the first questions asked would pertain to which states have the best educational systems but cautioned that attempts to answer them would be "fraught with perils." Haertel continued (p. 437):

[Comparisons] will involve generalizations from TSA exercise pools to a broader range of learning outcomes . . . [Such comparisons] depend on the match between NAEP content and states' own curriculum framework . . . For example, a state pressing to implement the [National Council of Teachers of Mathematics] framework might experience a (possibly temporary) decrease in performance on conventional mathematics problems due to its deliberate decision to allocate decreased instruction time to that type of problem. The 1990 TSA might support the (valid) inference that the state's performance on that type of problem was lagging, but not the (invalid) inference that their overall mathematics performance was lagging.

Haertel (1991) also expected that state-to-state comparisons would prompt the press and others to rank states based on small (even trivial) differences in performance. In fact, Stancavage et al. (1992) reported that in spite of cautions by NCES and Secretary of Education Lamar Alexander not to rank states, four of the most influential newspapers in the nation did so. In a review of 55 articles published in the top 50 newspapers, they found that state rankings were mentioned in about two-thirds of the articles (Stancavage et al., 1992).

Other concerns pertained to the types of inferences that NAEP's various audiences might draw from the background, environmental, and contextual data that are reported. These data provide a wealth of information on factors that relate to student achievement. However, the data collection design supports neither inferences that these factors caused the level of achievement students attained nor the needs of accountability purposes. The design is cross-sectional, assessing different samples of students on each testing occasion, and so does not provide the before-and-after data required to hold educators responsible for results. Furthermore, correlations of student achievement on NAEP with data about instructional practices obtained from the background information do not imply causal relationships. For example, the 1994 NAEP reading results showed that fourth-grade students who received more than 90 minutes of reading instruction a day actually performed worse than students receiving less instruction. Clearly, the low-performing students received more hours of instruction as a result of their deficiencies; the extra instruction did not cause the deficiencies (Glaser, Linn, & Bohrnstedt, 1997).

Benefits Associated with State NAEP

Despite these concerns about the provision of state-level data, reviews of the TSA have cited numerous benefits and positive impacts of the program. Feedback from state assessment officials indicated that state NAEP has had positive influences on instruction and assessment (Stancavage et al., 1992; Stancavage, Roeber, & Bohrnstedt, 1993; Hartka & Stancavage, 1994; DeVito, 1997). When the TSA was first implemented, many states were in the process of revamping their frameworks and assessments in both reading and mathematics. According to state officials, in states where changes were under way, the TSA served to validate the changes being implemented; in states contemplating changes, it served as an impetus for change.

Respondents to surveys conducted by Stancavage and colleagues (Hartka & Stancavage, 1994) reported that the following changes in reading assessment and instruction were taking place: increased emphasis on higher-order thinking skills; better alignment with current research on reading; development of standards-based curricula; increased emphasis on literature; and better integration or alignment of assessment and instruction. Although these changes could not be directly attributed to the implementation of the TSA, they reflected priorities also set for the NAEP reading assessment. In addition, many state assessment measures were expanded to include more open-ended response items, with an increased emphasis on the use of authentic texts and passages, like those found on NAEP (Hartka & Stancavage, 1994).

At the time of the first TSA, the new mathematics standards published by the National Council of Teachers of Mathematics (NCTM) were having profound effects on mathematics curricula, instructional practice, and assessment throughout the country (Hartka & Stancavage, 1994). Survey results indicated that changes similar to those seen for reading were occurring in mathematics instruction and assessment: alignment with the NCTM standards, increased emphasis on higher-order thinking skills and problem solving, development of standards-based curricula, and integration or alignment of assessment and instruction (Hartka & Stancavage, 1994). The mathematics TSA was also influential in "tipping the balance in favor of calculators (in the classroom and on assessments) and using sample items [for] teacher in-service training" (Hartka & Stancavage, 1994:431). Again, although these changes could not be attributed directly to the TSA, the NAEP mathematics frameworks' alignment with the NCTM standards served to reinforce the value of the professional standards.

Results from the first TSA in 1990 garnered much attention from the media and the general public. For states with unsatisfactory performance, TSA results were helpful in spurring reform efforts. For states with satisfactory performance, state officials could attribute the results to recent reforms in their instructional practice and assessment measures.

LITERATURE ON BELOW-STATE REPORTING

In "The Case for District- and School-Level Results from NAEP," Selden (1991) made the seemingly self-evident argument that having information is better than not having it, saying (p. 348), "most of the time, information is useful, and the more of it we have, the better, as long as the information is organized and presented in a way that [makes] it useful." Selden claimed that because NAEP is conducted and administered similarly across sites (schools), it offers comparable information from site to site, thus allowing state-to-state or district-to-district comparisons. He found that NAEP's ability to collect high-quality data comparably over time and across sites lends it to powerful uses for tracking both student achievement and background information. According to Selden, questions that might be addressed by trend data include: Are instructional practices changing in the desired directions? Are the characteristics of the teacher workforce getting better? Are home reading practices improving? He explained that schools and districts could use trend information to examine their students' achievement in relation to instructional methods.

While Selden presented arguments in favor of providing below-state results, he and others (Haney & Madaus, 1991; Beaton, 1992; Roeber, 1994) also cautioned that reporting results below the state level could lead to a host of problems and misuses. Their arguments emphasized that, although having more information could be viewed as better than having less, it is naive to ignore the uses that might be made of the data. Indeed, Selden (1991:348) pointed out that one fear is that new information will be "misinterpreted, misused, or that unfortunate, unforeseen behavior will result from it." Reports of below-state NAEP results could easily become subject to inappropriate high-stakes uses. For example, results could be used for putting districts or schools into receivership, making interdistrict and interschool comparisons, implementing school choice plans, holding teachers accountable, and allocating resources (Haney & Madaus, 1991). In addition, some authors worried that NAEP's use as a high-stakes accountability device at the local level could lead to teaching to the test and distortion of the curriculum (Selden, 1991; Beaton, 1992). Selden (1991) further argued that the use of NAEP results at the district or school level has the potential to discourage states and districts from being innovative in developing their own assessments.

Potential high-stakes uses of NAEP would heighten the need for security. Item development would need to be stepped up, which would raise costs (Selden, 1991). NAGB, NCES, the NAEP contractors, and participating school district staff would also have to coordinate efforts to ensure that the NAEP assessments are administered in an appropriate manner. According to Roeber (1994:42), such overt action would be needed "to assure that reporting does not distort instruction nor negatively impact the validity of the NAEP results now reported at the state and national levels."

EXPERIENCES WITH DISTRICT-LEVEL REPORTING

NAGB and NCES supported the initiative to provide district-level results, hoping that school districts would choose to use NAEP data to inform a variety of education reform initiatives at the local level (National Assessment Governing Board, 1995a, 1995b). With the lifting of the prohibition against below-state reporting, NAGB and NCES explored two different procedures for offering district-level NAEP data to districts and states: the Trial District Assessment, offered in 1996, and the Naturally Occurring District Plan, offered in 1998.

The 1996 Experience: Trial District Assessment

Under the Trial District Assessment, large school districts were offered three options for participating in district-level reporting of NAEP (National Center for Education Statistics, 1995). The first option, "Augmentation of State NAEP Assessment," offered district-level results in the same subjects and grades as in state NAEP by augmenting the district's portion of the state NAEP sample. Under this option, districts would add "a few schools and students" to their already selected sample in order to report stable estimates of performance at the district level. According to NCES, the procedures for augmenting the sample would "minimize the cost of the assessment process," and costs were to be paid by the district.

The second option in 1996, "Augmentation of National Assessment," would allow for reporting district results in subjects and grades administered as part of national NAEP by augmenting the number of schools selected within certain districts as part of the national sample. Because few schools are selected in any single district for national NAEP, this second option would require most school districts to select "full samples of schools" (National Center for Education Statistics, 1995:2) to meet the sampling requirements and to report meaningful results. The costs of augmenting the national sample would be more substantial than those associated with augmenting the state sample. If a district selected either of these options, the procedures for sample selection, administration, scoring, analysis, and reporting would follow those established for national or state NAEP, depending on the option selected, and the results would be "NAEP comparable or equivalent."

The third option in 1996, "Research and Development," was offered to districts that might not desire NAEP-comparable or equivalent results but that had alternative ideas for using NAEP items. For example, districts might assess a subject or subjects not assessed by NAEP at the national or state level; they might want to administer only a portion of the NAEP instrument; or they might choose to deviate from standard NAEP procedures. NCES would regard such uses as research and development activities and would not certify the results obtained under this option as NAEP comparable or equivalent.

Prior to the 1996 administrations, NCES (with the assistance of the sampling contractor, Westat) determined that the minimum sampling requirements for analysis and reporting at the district level were 25 schools and 500 assessed students per grade and subject. To gauge interest in the plan, NCES and ETS sponsored a meeting during the 1995 annual meeting of the American Educational Research Association, inviting representatives from several of the larger districts in the country. Based on this meeting and further interaction with district representatives, NCES identified approximately 10 school systems interested in obtaining their NAEP results. NCES and their contractors held discussions with representatives of these districts. The costs turned out to be much higher than school systems could easily absorb (National Research Council, 1999c). Consequently, only Milwaukee participated in 1996, with financial assistance from the National Science Foundation. Additional sampling of schools and students was required for Milwaukee to reach the minimum numbers necessary for participation, and the district received results only for grade eight.

Milwaukee's Experience under the Trial District Assessment

In the spring of 1996, NAEP was administered to a sample of Milwaukee's school population, and results were received in September 1997. NCES prepared a special report for the school district summarizing performance overall and by demographic, environmental, background, and academic characteristics. Explanatory text accompanied the tabular reports.

Paul Cieslak, former research specialist with the Milwaukee school district, attended the committee's workshop and described the uses made of the reported data. According to Cieslak, the report was used primarily as part of a day-long training session with 45 math/science resource teachers, under the district's NSF Urban Systemic Mathematics/Science Initiative, to help the teachers work with project schools (Cieslak, 2000). The teachers found the overall performance and demographic information moderately helpful; the reports summarizing performance by teaching practices, background variables, and institutional practices were more useful and interesting. Milwaukee officials found that the NAEP results generally supported the types of instructional practices they had been encouraging. According to Cieslak (2000), the School Environmental data "increased the value of the NAEP reports tenfold," since districts do not have the time or the resources to collect these data. This information helped school officials look at relationships among classroom variables and performance. Cieslak believed that availability of the School Environmental data could be one of the strongest motivating factors behind districts' interest in participation.

While no specific decisions were based on the data, Cieslak believed that was primarily because so much attention is focused on state and local assessments, especially those included in the district's accountability plan. In Milwaukee, the various assessment programs compete for attention, and the statewide assessments usually win out. Cieslak believed that state assessments will continue to receive most of the attention unless strategies are implemented to demonstrate specifically how NAEP data relate to national standards, specific math/science concepts, or district goals.

The 1998 Experience: Naturally Occurring Districts

Prior to the 1998 NAEP administration, NCES and Westat determined that there were six "naturally occurring districts" in state samples.

They defined naturally occurring districts as those that comprise at least 20 percent of the state's sample and that meet the minimum sampling requirements for analysis and reporting at the district level (25 schools and 500 assessed students per grade and subject). These districts can be thought of as "self-representing in state NAEP samples" (Rust, 1999). The districts that met these guidelines in 1998 were Albuquerque, New Mexico; Anchorage, Alaska; Chicago, Illinois; Christiana County, Delaware; Clark County, Nevada; and New York City, New York.

In July 1998, NCES contacted district representatives to assess their interest in receiving district-level NAEP results at no additional cost. They found no takers. Alaska did not participate in 1998, and Christiana County expressed no interest. District representatives in New York City and Chicago did not want the data. Gradually, the idea of providing district-level reports grew increasingly controversial. The NAEP State Network, which consists of state assessment directors or their appointed representatives, voiced concerns about the fairness of making the data available for some districts but not others. NCES did not query Clark County or Albuquerque, or their respective states, as to their interest, since by then the idea of district-level reporting was being questioned (Arnold Goldstein, National Center for Education Statistics, personal communication, October 1999).

Controversy arose concerning who would make participation and release decisions for a district-level NAEP. Although New York and Chicago did not want the data, their respective states did, thereby creating a conflict. NAGB discussed the issue at its August 1999 meeting and decided that no further offers of district results should be made until it was clear who should be the deciding entity (National Assessment Governing Board, 1999d).
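The screening rule for naturally occurring districts reduces to a simple arithmetic check. The sketch below is only an illustration: the district names and counts are invented (not the actual 1998 sample figures), and the function and class names are our own.

```python
from dataclasses import dataclass

@dataclass
class DistrictSample:
    """Hypothetical summary of one district's share of a state NAEP sample."""
    name: str
    schools: int            # schools drawn in this district
    assessed_students: int  # assessed students per grade and subject
    state_sample_size: int  # total assessed students in the state sample

def is_naturally_occurring(d: DistrictSample,
                           min_share: float = 0.20,
                           min_schools: int = 25,
                           min_students: int = 500) -> bool:
    """Apply the 1998 screening rule: the district must make up at least
    20 percent of the state's sample and meet the minimum reporting
    requirements of 25 schools and 500 assessed students."""
    share = d.assessed_students / d.state_sample_size
    return (share >= min_share
            and d.schools >= min_schools
            and d.assessed_students >= min_students)

# Illustrative, invented numbers.
big_city = DistrictSample("Big City", schools=40, assessed_students=900,
                          state_sample_size=3000)
small_town = DistrictSample("Small Town", schools=8, assessed_students=240,
                            state_sample_size=3000)

print(is_naturally_occurring(big_city))    # True: 30% share, 40 schools, 900 students
print(is_naturally_occurring(small_town))  # False: only 8% of the state sample
```

The default thresholds mirror the published guidelines, so a district must clear all three bars at once to be self-representing.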
TECHNICAL AND POLICY CONSIDERATIONS FOR DISTRICT-LEVEL REPORTING

As part of the workshop on district-level reporting, the committee asked representatives from NAGB, NCES, ETS, and Westat to discuss the technical issues related to sampling and scoring methodologies and the policy issues related to participation and reporting decisions. The text below summarizes the information provided by NAEP's sponsors and contractors.

Proposed Sampling Design for Districts

In preparation for the workshop, NCES and Westat provided two documents that outlined the proposed sampling plans for district-level reporting (Rust, 1999; National Center for Education Statistics, 1995). For state NAEP, the sample design involves two-stage stratified samples: schools are selected at the first stage, and students are selected at the second stage. The typical state sample size is 3,000 students per grade and subject, with 30 students per school. The sample sizes desired for district results would be roughly one-quarter of those required for states (750 sampled students at 25 schools, to yield 500 participants at 25 schools). Because standard errors shrink roughly with the square root of the sample size, a sample one-quarter as large would be expected to produce standard errors for districts that are about twice the size of standard errors for the state. Districts that desired to report mean proficiencies by background characteristics, such as race, ethnicity, type of courses taken, home-related variables, instructional variables, and teacher variables, would need sample sizes approximately one-half of their corresponding state sample sizes, or approximately 1,500 students from a minimum of 50 schools. For reporting, the "rule of 62" would apply, meaning that disaggregated results would be provided only for groups with at least 62 students (National Assessment Governing Board, 1995b: Guideline 3).

At the workshop, Richard Valliant, associate director of Westat's Statistical Group, further outlined the sampling requirements for districts. Valliant described the "sparse state" option, which would require fewer schools but would sample more students at the selected schools, and the "small state" option, which would reduce the number of students tested per school. Both options would still require 500 participating students. These sample sizes would allow for the reporting of scaled scores, achievement levels, and percentages of students at or above a given level for the entire district, but would probably not allow for stable estimates of performance for subgroups of the sample.

Peggy Carr, associate commissioner in the Assessment Division at NCES, described two additional alternatives under consideration, the "enhanced district sampling plan" and the "analytic approach." The enhanced district sampling plan would reconfigure the state sampling design so that sufficient numbers of schools were sampled for interested districts. This plan might require oversampling at the district level and applying appropriate weights to schools, and perhaps districts, during analysis. The analytic approach would allow districts to access existing data in order to identify districts like themselves and compare analytic results. Carr noted that development of the details of this option was still under way.

Scoring Methodology

During the workshop, Nancy Allen, director of NAEP analysis and research at ETS, described the scoring methodology currently used for NAEP and explained how procedures would be adapted to generate district-level results. Allen reminded participants that ability estimates are not computed for individuals because the number of items to which any given student responds is insufficient to produce a reliable performance estimate. She described the procedures used to generate likely ability distributions for individuals, based on their background characteristics and responses to NAEP items (the conditioning procedures), and to randomly draw five ability estimates (plausible values) from these distributions. She noted that for state NAEP, the conditioning procedures use information on the characteristics of all test takers in the state.

Participants and committee members raised questions about the information that would be included in the conditioning models for districts. For example, would the models be based on the characteristics of the state or the characteristics of the district? If models were based on the characteristics of the state, and the characteristics of the state differed from those of the district, would that affect the estimates of performance? Allen responded that the conditioning models rely on information about the relationships (covariation) between performance on test items and background characteristics. According to Allen, the compositional characteristics of a state and a district will sometimes differ with respect to background variables, but the relationships between cognitive performance and background characteristics may not differ.
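The plausible-values idea can be illustrated in miniature. The sketch below is a toy, not ETS's operational machinery: real NAEP conditioning rests on item response theory models and many background variables, whereas the posterior means and standard deviations here are simply invented numbers. It shows only the mechanics of drawing several values per student from an approximate ability distribution and forming a group estimate from the draws.

```python
import random

random.seed(12345)

# Hypothetical posterior summaries (mean, standard deviation) of the ability
# distribution for a handful of students. Operationally these distributions
# come from conditioning on item responses and background characteristics.
posteriors = [
    (210.0, 30.0),
    (245.0, 28.0),
    (198.0, 32.0),
    (260.0, 25.0),
    (225.0, 30.0),
]

M = 5  # NAEP draws five plausible values per student

# Draw M plausible values for each student from the approximate posterior.
plausible_values = [
    [random.gauss(mu, sd) for _ in range(M)]
    for mu, sd in posteriors
]

# Group-level estimate: compute the group mean within each set of draws,
# then average the M resulting means, as in multiple imputation.
means_per_draw = [
    sum(pv[m] for pv in plausible_values) / len(plausible_values)
    for m in range(M)
]
estimate = sum(means_per_draw) / M
print(round(estimate, 1))
```

Operationally, the spread across the five sets of draws also feeds into the standard error, capturing measurement uncertainty that a single point estimate per student would hide.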
Nevertheless, Allen stressed that ETS was still exploring various models for calculating estimates at the district level, including some that condition on district characteristics. Given the potential bias in proficiency estimates that could result from an erroneous conditioning model, the committee offers the following recommendation regarding conditioning procedures.

RECOMMENDATION 3-1: If the decision is made to move forward with providing district-level results, NAEP's sponsors should collect empirical evidence on the most appropriate procedures for improving the accuracy of estimates of achievement using demographic and background variables (conditioning and plausible values technology). Conditioning is most defensible when based on district-level background variables. Empirical evidence supporting the acceptability of any alternate procedure should be gathered before that procedure is selected.

Participation Decisions

Roy Truby, executive director of NAGB, told participants that when Congress lifted the ban on below-state reporting, it did not include language in the law clarifying the roles of states and districts in making participation decisions. In 1998, when NCES offered results to the naturally occurring districts, the agency sent letters to both the districts and their respective states. Based on legal advice from the Department of Education's Office of General Counsel, the agency determined that state officials, not district officials, would make decisions about the release of results. In at least one case, there appeared to be a conflict in which the state wanted the data released but the district did not. NAGB members were concerned that the districts had not been told, when they agreed to participate in 1998 NAEP, that results for their districts might be released. Because of this ambiguity about decision-making procedures, NAGB passed the following resolution (National Assessment Governing Board, 1999d):

Since the policy on release of district-level results did not envision a disagreement between state and district officials, the Governing Board hereby suspends implementation of this policy, pending legislation which would provide that the release of district-level NAEP results must be approved by both the district and state involved.

The committee asked workshop participants to discuss their opinions about which entity (states or districts) should have decision-making authority over participation and release of data.
In general, district representatives believed that the participating entity should make participation decisions, while state representatives believed that the decision should lie with the state. Others thought that the entity that paid for participation should have decision-making authority. However, speakers stressed that the most pertinent issue was not about participation but about public release of results. Under the Freedom of Information Act, district results would be subject to public release once they were compiled.

REACTIONS FROM WORKSHOP PARTICIPANTS

Workshop participants discussed technical and policy issues for district-level NAEP and made a number of observations, which are discussed next.

Comparisons Among Similar Districts

Like Selden (1991), some workshop participants believed that district-level reporting would enable useful and important comparisons. Several state and district officials liked the idea of being able to make comparisons among similar districts. District officials reported that others in the state often do not understand the challenges they face, and that comparisons with similar districts across state boundaries would enable them to evaluate their performance in light of their particular circumstances. For instance, some districts are confronting significant population growth that strains their available resources. Others, such as large urban districts, have larger populations of groups that tend to perform less well on achievement tests. District officials believed that if performance could be compared among districts with similar characteristics, state officials might be more likely to set reasonable and achievable expectations. Further, they noted that this practice might allow them to identify districts performing better than expected, given their demographics, so that attention could focus on determining the instructional practices that work well.

A number of workshop participants were worried about the uses that might be made of district-level results. Some expressed concern that results would be used for accountability purposes and to chastise or reward school districts for their students' performance. Using district-level results as part of accountability programs would be especially problematic if the content and skills covered by NAEP were not aligned with local and state curricula. Officials from some of the larger urban areas also argued that they already know that their children do not perform as well as students in more affluent suburban districts.
Having another set of assessment results would provide yet another opportunity for the press and others to criticize them. Other state and district officials commented that states' varied uses of assessments may confound comparisons. While districts may seem comparable based on their demographics, they may in fact be very different, because of the context associated with state assessment programs. States differ in the emphases they place on test results, the uses of the scores, and the
amounts and kinds of attention results receive from the press. These factors play a significant role in setting the stage for testing and can make comparisons misleading, even when districts appear similar because of their student populations.

External Validation

Some state and district officials were attracted to the prospect of having a means for external validation. They view NAEP as a stable external measure of achievement against which they could compare their state and local assessment results. However, some also noted that attempts to obtain external validation for state assessments can create a double bind. When the findings from external measures corroborate state assessment results, no questions are asked. But when state or local assessment results and external measures (such as state NAEP) differ, assessment directors are often asked, “Which set of results is correct?” Explaining and accounting for these differences can be challenging, and having multiple indicators that suggest different findings can lead to public confusion about students' achievement.

These challenges are particularly acute when a state or local assessment is similar, but not identical, to NAEP. For example, some state assessment programs have adopted the NAEP descriptors (advanced, proficient, and basic) for their achievement levels. However, their descriptions of performance differ in important ways from the NAEP descriptions. NAEP's definition of “proficient,” for instance, may encompass different skills than the state's definition, creating problems for those who must explain and interpret the two sets of test results.

Some district and state officials expressed concern about the alignment between their curricula and the material tested by NAEP. Their state and local assessments are part of an accountability system that includes instruction, assessment, and evaluation.
NAEP results would be less meaningful if they were based on content and skills not covered by districts' instructional programs. Attempts to use NAEP as a means of external validation for a state assessment are problematic when the state assessment is aligned with instruction and NAEP is not, particularly if results from the different assessments suggest different findings about students' achievement.

In addition, confusion arises when NAEP results are released at the same time as state or local assessment results. State and local results are timely, generally reporting data for a cohort while it is still in the particular
grade. For instance, when reports are published on the achievement of a school system's fourth graders, they represent the cohort currently in fourth grade. When NAEP results are published, they are for some previous year's fourth graders. This again can lead to public confusion over students' academic accomplishments.

Supplemental Assessments

An appealing feature to the state and district officials participating in the workshop was the possibility of having assessment results in subject areas and grades not tested by their state or local programs. Although state and local programs generally test students in reading and mathematics, not all assess all of the subject areas NAEP does, such as writing, science, civics, and foreign languages. Some participants liked the idea of receiving results for twelfth graders, a grade not usually tested by state assessments. NAEP also collects background data that many states do not have the resources to collect. Some workshop participants have found these background data to be exceedingly useful and would look forward to receiving reports that associate district-level performance with background and environmental data.

Lack of Program Details

Workshop participants were bothered by the lack of specifications for district-level reporting. Even though the committee asked NAEP's sponsors to describe the plans and features of district-level reporting, many of the details have not yet been determined. In responding to questions about district-level reporting, many participating state and district officials formulated their own assumptions and reacted to the program they thought might be enacted. For instance, as mentioned above, they assumed that assessments would be offered in the subject areas and grades available for national NAEP; however, district NAEP has so far been associated only with state NAEP.
Hence, only reading, mathematics, writing, and science would be available, and only in grades 4 and 8 (not 12). Those who looked forward to receiving data summarized by background characteristics would likely be disappointed, given the sample sizes required to obtain such information. Other state and district officials commented that their reactions to the propositions set forth by NAEP's sponsors would depend on the details.
Some of their questions included: How much would it cost to participate in district-level NAEP? Who would pay the costs? How would it be administered—centrally, as with national NAEP, or locally, as with state NAEP? What type of information would be included in the reports? How long would it take to receive results? Would district-level results require the same time lag for reporting as national and state NAEP? The answers to these questions would determine whether or not districts would be interested in participating.

Of concern to a number of participants, particularly representatives of the Council of Chief State School Officers, was the issue of small districts. The sampling specifications described at the workshop indicated that districts would need at least 25 schools in a given grade level to receive reports. Technical experts present at the workshop wondered whether sufficient thought had been given to the sample size specifications. If a district met the sample size requirement for students (i.e., at least 750 students), the number of schools should not matter. In state and national NAEP, there is considerable variation in average achievement across schools, and only a small percentage of schools are sampled and tested. A target of 100 schools was set to ensure that the between-school variation was adequately captured. In district NAEP, there would be fewer schools and less variability between schools. In smaller districts, all schools might be included in the sample, thereby completely eliminating the portion of sampling error associated with between-school differences. Technical experts and others at the workshop encouraged NCES and Westat to pursue sampling specifications that focus on the estimated overall accuracy of results rather than on an arbitrary minimum number of schools based on current procedures for state or national NAEP.
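The sampling argument above can be checked with a small simulation. The district below is entirely hypothetical (25 schools of 60 students each, with invented variance components); it merely illustrates the technical experts' point that, for a fixed total number of students tested, including every school in the sample removes the between-school component of sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical district: 25 schools x 60 students in the tested grade.
# School means vary (between-school variance) and students vary within schools.
n_schools, per_school = 25, 60
school_means = rng.normal(loc=250, scale=10, size=n_schools)   # between-school spread
scores = school_means[:, None] + rng.normal(scale=30, size=(n_schools, per_school))

def two_stage_mean(n_sample_schools, students_per_school):
    """One two-stage sample: draw schools, then students within each school."""
    schools = rng.choice(n_schools, size=n_sample_schools, replace=False)
    picks = [rng.choice(scores[s], size=students_per_school, replace=False)
             for s in schools]
    return np.mean(picks)

# Two designs testing the same 300 students:
# (a) 10 of 25 schools x 30 students; (b) all 25 schools x 12 students.
sd_a = np.std([two_stage_mean(10, 30) for _ in range(2000)])
sd_b = np.std([two_stage_mean(25, 12) for _ in range(2000)])
print(sd_a, sd_b)  # the all-schools design has the smaller sampling error
```

Because design (b) is a census of schools, only within-school sampling error remains, which is why the experts argued for judging designs by estimated overall accuracy rather than by a fixed minimum school count.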
Others questioned how “district” might be defined and whether district consortia would be allowed. Some participants were familiar with the First in the World consortium, formed by a group of districts in Illinois to participate in and receive results from the Third International Mathematics and Science Study. They wondered if such district consortia would be permitted for NAEP.

SUGGESTIONS FOR NAEP'S SPONSORS

The reporting system that is the subject of this chapter would create a new program with new NAEP products. One of the objectives for convening the committee's workshop on district-level reporting was to learn about
Page 47 the factors that would affect states' and districts' interest in this new product. After listening to workshop participants' comments and reviewing the available materials, the committee finds that many of the details regarding district-level reporting have not been thoroughly considered or laid out. District officials, state officials, and other NAEP users—the potential users of the new product—had a difficult time responding to questions about the product's desirability because a clear conception of its characteristics was not available. The most important issues requiring resolution are described below. Clarify the Goals and Objectives The goals and objectives of district-level reporting were not apparent from written materials or from information provided during the workshop. Some workshop participants spoke of using tests for accountability purposes, questioning whether NAEP could be used in this way or not. They discussed the amount of testing in their schools and stressed that new testing would need to be accompanied by new (and better) information. However, some had difficulty identifying what new and better information might result from district-level NAEP data. Their comments might have been different, and perhaps more informative, if they had a clear idea of the purposes and objectives for district-level reporting. An explicit statement is needed that specifies the goals and objectives for district-level reporting and presents a logical argument for how the program is expected to achieve the desired outcomes. Evaluate Costs and Benefits What would districts and states receive? When would they receive the information? How much would it cost? What benefits would be realized from the information? Workshop participants responded to questions about their interests in the program without having answers to these questions, though many said that their interest would depend on the answers. 
They need information on the types of reports to be prepared, along with the associated costs. They need to know about the time lag for reporting: would reports arrive in time to inform their decision and policy making, or would delays render the information useless? Costs and benefits must also be considered in terms of teachers' and students' time and effort. State systems already extensively test fourth and
eighth graders. If time is to be taken away from instruction for the purpose of additional testing, the benefits of that testing need to be laid out. Will additional testing amplify the information already provided? Or will the information be redundant with that provided by current tests? Would such redundancy make it useful for external validation? This information needs to be provided in order for NAEP's sponsors to assess actual levels of interest in the program.

Evaluate Participation Levels

During the workshop, many spoke of the value of being able to make inter-district comparisons among districts with like characteristics. However, this use of the results assumes that sufficient numbers of districts will participate. Previous experiences with district-level reporting revealed relatively little interest: between 10 and 12 interested districts in 1996 and virtually none in 1998. Meaningful comparisons, as defined by demographic, political, and other contextual variables of importance to districts, require a variety of other districts with district-level reports. Having only a handful of districts that meet the sampling criteria may undermine one of the most fundamental appeals of district-level reporting—that is, carefully selecting others with which to compare results. Thus, if making comparisons is the primary objective for receiving district-level reports, the targeted districts must feel secure in knowing that there are sister districts also completing the procedures necessary to receive district-level results. The extent of participation will limit the ability to make the desired comparisons.

Consider the Impact of Raising the Stakes

A concern expressed when state NAEP was first implemented related to the potential for higher stakes to be associated with reporting data for smaller units.
The message from several workshop speakers (particularly district representatives) was that district-level reports would raise the stakes associated with NAEP and change the way NAEP results are used. An evaluation should be conducted on the effects of higher stakes, particularly as they relate to the types of inferences that may be made.

CONCLUSIONS AND RECOMMENDATIONS

It was impossible for the committee to gauge actual interest in district-level reporting because too little information—such as program objectives, specifications, and costs—was available to potential users. When developing a new product, it is common to seek reactions from potential users to identify design features that will make it more attractive. The reactions of potential users and the responses from product designers tend to produce a series of interactions like “Tell me what the new product is and I will tell you if I like it,” versus “Tell me what you would like the product to be and I will make sure it has those characteristics.” During the committee's workshop, state and district representatives were put in the position of responding to the latter question. Here, the developer is asking the user to do some of the design work. Often, the user is not knowledgeable enough to give sound design recommendations. Instead, the product designer needs to present concrete prototypes to get credible evaluative reactions, and the developer should expect several iterations of prototype design and evaluation before the design stabilizes at a compromise between users' needs and what is practically possible. This is the type of process required before the ideas and products associated with district-level reporting can progress.

RECOMMENDATION 3-2: Market research emphasizing both needs analysis and product analysis is necessary to evaluate the level of interest in district-level reporting. The decision to move ahead with district-level reporting should be based on the results of market research conducted by an independent market-research organization. If market research suggests that there is little or no interest in district-level reporting, NAEP's sponsors should not continue to invest NAEP's limited resources in pursuing district-level reporting.