REPORTING DISTRICT-LEVEL NAEP DATA: SUMMARY OF A WORKSHOP

Factors That Influence Interest in District-Level NAEP

Recent federal initiatives reflect the desire of national policy makers to compare student achievement levels with national benchmarks and to verify the rigor of state and local standards. President Clinton’s call for voluntary national tests in reading and mathematics is one example; the tests’ design would link individual measures to NAEP to the maximum extent possible, thereby enabling comparisons of individual performance with national benchmarks. Other examples are recent congressional requests for studies on the feasibility of developing equivalency scales to “link” scores from commercially available standardized tests and state assessments to each other and to NAEP (National Research Council, 1999c) and on the feasibility of embedding common sets of test questions in state and local assessments to obtain common measures of individual achievement (National Research Council, 1999b).

Thus, the desire for a means of comparing achievement across jurisdictions, as well as with national indicators, originates at the highest policy-making levels in this country. And while federal policy makers make the decisions regarding such programs, they are not the ones immediately affected. Those most closely affected are students and their families, educators, and administrators at the local and state levels. The workshop therefore sought to hear from representatives of state and local assessment offices, the individuals who would be expected to handle such programs.
Workshop Panel 3 was intended to address the issues that bear on states’ and districts’ interest in district-level reporting. Panelists responded to questions posed to them in advance (see Appendix A), and their responses are incorporated into the discussion that follows. As the committee listened to participants interact with each other over the two days, it became clear that the questions served as a springboard for further discussion. While some were answered quickly, others stimulated lengthy discussion and were addressed by more than one panel. The text below attempts to capture these discussions and to highlight the issues that seemed most important to panelists.

WHAT ARE THE GOALS AND OBJECTIVES OF DISTRICT-LEVEL NAEP?

A Hammer in Search of a Nail

Several participants felt that the proposal for district-level NAEP is like a hammer searching for a nail. They commented that national NAEP and state NAEP are designed with specific goals in mind and serve their purposes well. But, as one participant put it, “one size does not fit all,” and the goals and objectives set for national and state NAEP are not necessarily suitable for district-level reporting. Panelists also noted that it was hard to respond to the questions put to them in preparation for the workshop without knowing the sponsors’ and others’ objectives for district-level assessment.

Workshop participants maintained that school systems typically use test results to modify and improve instruction. According to Sharon Lewis, representing the Council of the Great City Schools: “When schools use assessments to improve the quality of the education offered in their schools, they analyze and use test ... results to change behaviors.
They follow a cycle of teaching, testing, modifying instructional practices, developing/purchasing appropriate materials, and then repeat the cycle—teach, test, modify, etc.—hoping to see results.” Several speakers questioned whether NAEP results would fit these purposes, commenting that decisions based on assessment data are made at the individual, classroom, or school level, not at the district level. Speakers further noted that, in their localities, tests are typically used for accountability purposes and are often associated with high stakes. NAEP, they argued, is not designed as an accountability tool or to yield causal
inferences regarding achievement, relationships with curricula, or other factors. The frameworks are not necessarily aligned with local curricula, and using NAEP scores to evaluate schools and teaching practices would be neither appropriate nor informative.

Using NAEP for high-stakes decisions might also degrade its ability to provide the independent monitoring information it was designed to provide. When high stakes are attached to test results, motivation to do well increases. That motivation can result in improved teaching practices that lead to actual gains in skill, or it can prompt the use of unacceptable test preparation methods that raise test scores without commensurate improvements in the tested knowledge and skills.

A clear message from the participants was that their interest in district-level results would depend on the details of the program. They encouraged NAEP’s stewards to develop explicit statements of the goals and objectives to be accomplished by district-level results.

Providing Information Not Currently Available

As noted above, most states currently administer state-developed assessments as well as commercially available tests (Olson et al., in press). Workshop participants told the committee they might welcome additional assessments that serve new and useful purposes, such as allowing comparisons among like districts in other states, as noted earlier. However, they emphasized that a substantial amount of time is already devoted to testing. Several speakers began their talks by listing the tests currently administered to their students; their remarks, presented below, exemplify the extent of testing in the jurisdictions represented at the workshop.
According to Judy Costa, testing director for Nevada’s Clark County School District:

In the fall, we administer the CTBS/5 or TerraNova to our fourth grade students as well as the TCS/2, which is a test of “school ability,” in addition to a state-mandated direct writing assessment. In the spring, we administer a series of district-developed curriculum-based criterion-referenced tests in reading, mathematics, and language arts. At the middle school level, the eighth grade schedule is similar to that for fourth grade, although the curriculum-based criterion-referenced tests are still in development and will be piloted this spring and administered in earnest next year.
At grade 11, we administer state-developed criterion-referenced tests in reading and mathematics, with science and social studies to be added shortly, as well as a direct writing assessment. These tests are taken as part of the certification for graduation process. Eleventh-grade students who do not pass these graduation tests must take them again in twelfth grade, until they pass; unsuccessful students have up to eight opportunities in eleventh and twelfth grade to pass these tests. In addition to the graduation tests, we administer the CTBS/5 and the TCS/2 to all students in grade 12 and on an optional basis at grade 11. Please note that additional testing is conducted at other grades; I have highlighted only the NAEP grade levels.

This amount of testing is not unique to Clark County. Students in Chicago take: the Iowa Tests of Basic Skills (optional in grades 1 and 2 but required in grades 3 through 8); the Iowa Tests of Basic Skills achievement tests in grades 9 and 10; performance assessments in K-2, currently optional at the school level but close to being required in some areas; the Tests of Achievement and Proficiency in high school; the PLAN published by ACT, Inc.; semester exams in grade 11 in English, mathematics, science, and social studies; the Illinois state assessments in reading, mathematics, and writing in third, fifth, and eighth grades, and in science and social studies in grades 4 and 7; and the Prairie State Achievement Test in grade 11. In fact, the Illinois teachers union became sufficiently concerned about the amount of time devoted to testing that it moved to have limits set. Students in Illinois are now limited to a maximum of 25 hours of state-initiated testing during the K-12 years. Local assessment is not subject to the 25-hour limit and is regarded as the most important tool for improving curriculum and instruction.
The state assessment program in Georgia is also quite comprehensive. According to Amuleru-Marshall, Atlanta’s program includes a structured assessment in kindergarten; norm-referenced tests in grades 3, 5, and 8; newly developed criterion-referenced tests in grades 4, 6, and 8; and a series of high school graduation tests in language arts, writing, mathematics, science, and social studies for eleventh graders.

The School District of Philadelphia is developing an assessment system that includes a national norm-referenced exam (the Stanford Achievement Test, Ninth Edition); citywide end-of-course exams in English, mathematics, science, and social studies for grades 7-12; and a K-4 system of curriculum-embedded and on-demand assessments of literacy and mathematics. In addition, the state annually administers reading, mathematics, and writing assessments.

Given the extensive amount of testing already occurring in their school
systems, workshop participants contended that any new testing would have to provide useful, unique, and timely information. District NAEP would need to be designed to meet needs not served by tests already in place.

Several speakers suggested that the introduction of district NAEP might increase participation in state and national NAEP. They maintained that school districts currently have little motivation to participate in the national and state programs, since they receive no feedback on their performance. Yet the integrity of NAEP results depends on sufficient and accurate participation at the school level. Providing feedback to school districts may increase interest and raise participation rates in the state and national programs. Remarks by Paula Mosley, coordinator of student testing and evaluation for the Los Angeles Office of Instruction, elucidate this position:

District scores would provide an incentive for the students, teachers, and administrative staff involved in the NAEP testing. Currently, it is difficult to get schools to participate because they know there are no [below-state] reports provided. A greater “buy-in” by the stakeholders affected may [occur] if they knew they were representing the district. Schools, administrators, teachers, and students [sacrifice instructional and planning time] to administer NAEP. They should receive feedback for their efforts.

Some participants agreed with Mosley, stressing that if they were to advocate for participation in NAEP, their schools and teachers would need to receive something in exchange for their efforts—preferably something not available from current programs. Others were hesitant to agree that simply providing new and unique information would be enough to elicit higher participation rates in state or national NAEP.
They claimed that increased participation in a program comes with increased involvement in the program. When state and local officials seek to “win over” teachers and administrators, they look for ways to include educators in activities such as test development and scoring. They find that this type of involvement deepens educators’ understanding and strengthens their motivation to accomplish objectives: “when teachers are involved in creating the test, they understand what they have created, and they feel ownership of results.” Workshop participants questioned whether NAEP’s stewards would be able to motivate teachers and administrators to buy into NAEP, since they would feel little ownership of the program. They felt that additional reporting feedback would be unlikely to increase motivation to participate. Furthermore, some participants would not consider district NAEP useful without school and student scores that could be clearly linked to curriculum and instruction.

Assessments in Additional Subject Areas and Grades

Workshop participants were intrigued by the possibility of having assessments in areas they would not normally test. For instance, the speakers from Illinois found the NAEP assessments in foreign language and fine arts appealing. Amuleru-Marshall agreed, stating that in Atlanta, content and performance standards are being developed for grades 3, 5, 8, and 10 in language arts, mathematics, science, social studies, fine arts, foreign language, and health and physical education. However, development has slowed because of cost issues, and only the language arts and mathematics assessments have moved forward. NAEP assessments could be used in place of locally developed assessments, or until such tests are ready. Amuleru-Marshall also remarked that if NAEP results were available, Atlanta could justify eliminating some of its existing assessments and would also have new data in multiple content areas.

Harry Selig, a research manager with the Houston Independent School District, observed that making NAEP assessments available could allow districts to “refrain from conducting current norm-referenced testing.” Selig added that using NAEP assessments could reduce testing costs and lessen the fatigue effects on students of extensive testing.

Speakers noted that the subject areas tested by state NAEP (e.g., reading, writing, mathematics, and science) are, for the most part, already covered by state assessments. Their desire would be for quality assessments in other areas, such as those tested in national administrations of NAEP. They wondered which assessments would be made available.

Speakers had varying opinions about the grade levels covered by the national assessments.
National NAEP currently provides assessments in three grades: fourth, eighth, and twelfth; state NAEP offers assessments in fourth and eighth grades. Both assess students biennially. Several speakers mentioned that additional information on twelfth graders would be an appealing feature of district-level NAEP scores. The sparsity of grade levels represented was cited by others as a shortcoming. As noted above, school systems use assessment findings for accountability purposes and to improve teaching practices. Indicators of performance at only three grades would not allow for tracking achievement
across grades, since off-grade-level assessments would be missing. Nor would testing in one elementary grade, one middle school grade, and one high school grade every two years prove useful. While this cycle of testing serves the purpose of providing national indicators of performance, it would not meet the needs of districts and school systems, according to the workshop participants.

Comparisons Over Time

Participants expressed interest in the prospect of making comparisons over time based on district-level NAEP data. However, they also recognized that a number of factors might affect the stability of results, making comparisons over time less meaningful. Whereas state boundaries are fixed, school district boundaries change. Schools may be moved from one district to another; new housing developments may alter the characteristics of the student population. With small sample sizes, slight alterations in the composition of a district could have large effects on results. Factors unrelated to student achievement levels, such as changes in inclusion rules for students with disabilities or students with limited English proficiency, or changes in motivation to do well on standardized tests, could also produce differences in performance.

Comparisons Over Groups

Many participants commented on the usefulness of the NAEP background, contextual, and environmental data. They were interested in obtaining this information about their students and alluded to examining score data by population subsets. However, it was not clear whether any districts would have sufficient numbers of test takers to allow this level of reporting.

WHO WOULD BE ELIGIBLE TO PARTICIPATE?
Proposed Sampling Design for Districts

In preparation for the workshop, the National Center for Education Statistics (NCES) and Westat provided two background documents outlining the proposed sampling plans for district-level reporting (Rust, 1999; National Center for Education Statistics, 1995b). For state NAEP, the sample design involves two-stage stratified samples: schools are selected at the first stage, and students at the second. The typical state sample size is 3,000 sampled students per grade and subject, with 30 students per school. The sample sizes proposed for district results would be roughly one-quarter of those required for states (750 sampled students from 25 schools, yielding about 500 participants). This sample size would be expected to produce standard errors for districts about twice the size of the standard errors for the state. According to the information provided, a district that wishes to report subgroup mean proficiencies for a large number of subgroups—such as race, ethnicity, type of courses taken, home-related variables, instructional variables, and teacher variables—would need a sample approximately one-half the size of its corresponding state sample, or approximately 1,500 students from a minimum of 50 schools. For reporting, the “rule of 62” would apply, meaning that disaggregated results would be reported only for cells with at least 62 students (National Assessment Governing Board, 1995b: Guideline 3).

At the workshop, Richard Valliant, associate director of Westat’s Statistical Group, provided additional details on sampling requirements for districts. Valliant described the “sparse state” option, which would require fewer schools but would sample more students at each selected school, and the “small state” option, which would reduce the number of students per school. Both options still require 500 tested (participating) students. These sample sizes would allow for reporting proficiencies (or scaled scores), achievement levels, and percentages of students at or above a given level for the entire district, but would probably not allow stable estimates of performance for subsets of the sample.
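The relationship between the proposed district sample sizes and the expected standard errors can be checked with a short sketch. This assumes simple random sampling, whereas NAEP actually uses a stratified two-stage design with design effects, so the numbers are only illustrative; the score spread used below is hypothetical, not NAEP data.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling."""
    return sigma / math.sqrt(n)

sigma = 35.0        # hypothetical student-level score standard deviation
state_n = 3000      # typical state NAEP sample per grade and subject
district_n = 750    # proposed district sample: one-quarter of the state sample

se_state = standard_error(sigma, state_n)
se_district = standard_error(sigma, district_n)

# Because the standard error scales as 1/sqrt(n), a sample one-quarter
# the size yields a standard error sqrt(4) = 2 times as large.
print(round(se_district / se_state, 2))  # → 2.0
```

This is exactly the workshop's arithmetic: quartering the sample doubles the standard error, regardless of the value chosen for sigma.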
Peggy Carr, associate commissioner in the Assessment Division at NCES, described two additional alternatives being considered for future assessments, the “enhanced district sampling plan” and the “analytic approach.” The enhanced district sampling plan would reconfigure the state sampling design so that sufficient numbers of schools were sampled for interested districts. This plan might require oversampling at the district level and the application of appropriate weights to schools, and perhaps districts, during analysis. The analytic approach, according to Carr, would allow districts to access existing data in order to identify districts like themselves and compare results analytically. Carr noted that development of details about this option is still under way.
Serving Small Districts

Workshop participants expressed concern about the sampling requirements, which mean that a district needs at least 25 schools with a given grade level in order to receive reports (e.g., to receive results for eighth graders, a district must have at least 25 middle schools). While NCES and the National Assessment Governing Board (NAGB) did not provide an estimate of the number of districts that might qualify, several speakers offered estimates. Lauress Wise, president of the Human Resources Research Organization, distributed a handout showing that about 400 of the 16,000 districts in the country have at least 25 schools. According to the handout, 170 districts have between 20 and 24 schools with total student populations of 6,000 or more, and 441 districts have 25 or more schools with total student populations of at least 6,000. Wise noted that his data did not provide breakdowns by grade level. Wayne Martin, director of the State Education Assessment Center of the Council of Chief State School Officers, provided a by-grade estimate for fourth grade: approximately 300 school districts would have sufficient numbers of fourth-grade students to meet the criteria.

The proposed sampling criteria prompted comments regarding the intent of district-level reporting. Participants questioned whether the intent was to make district-level results available to all districts or only to large urban districts. Martin recounted his conversations with state representatives at the recent NAEP State Network meeting:

When I asked how they might feel if results were only generated for the large school districts, a number of states suggested that this would create a different set of problems ... [C]harges of favoritism could lead to ...
cooperation problems with smaller districts [in state and national NAEP], whereas being singled out could further exacerbate differences between the state agency and large districts.

Participants wondered how many districts nationally would meet these requirements and asked about the definition of a “district.” Several questioned whether district consortia would be allowed. In connection with the Third International Mathematics and Science Study, a group of districts in Illinois formed a consortium in order to participate and receive results; they asked whether such a consortium would be allowed for NAEP.

Wise asked whether NCES and Westat had thoroughly considered the differences between district-level and state- and national-level sampling as they bear on the accuracy of results. In state and national NAEP, there is
considerable variation in average achievement levels across schools, and only a small percentage of schools are sampled and tested. A target of 100 different schools was set to be sure that the between-school variation was adequately captured. In district NAEP, there would be far fewer schools and also less variability between schools. In smaller districts, all schools might be tested, eliminating entirely the portion of sampling error associated with between-school differences. Wise advised NCES and Westat to pursue this issue further, focusing on the estimated overall accuracy of results rather than specifying an arbitrary minimum number of schools.

Acceptance of Sampling Designs

Although they make use of results from national and state NAEP samples, educators and politicians may lack confidence in survey-based results at the district level; they may instead want information based on a full census. NAEP employs complex sampling designs for both students and questions. Speakers from Colorado and Illinois, for instance, commented that their legislators may question the legitimacy of test results based on samples. Watson noted that an assessment program in Colorado, designed to employ sampling, was within weeks of being implemented when the state withdrew support. The then-current design of the Colorado Student Assessment Program called for assigning schools to one of three content areas being assessed (reading, writing, and geography). All students at the identified grade were to be tested in only that content area. Students and schools were to receive results based on the area in which they were tested. The district was to receive information across all areas, under the assumption that the sampling was sufficient to provide dependable district-level information.
These plans had been communicated and materials were ready for printing for a March/April administration when the legislation was changed to eliminate all sampling. Workshop speakers from Illinois added that their testing programs that use samples have also been changed. Other participants agreed that NAEP’s designs for sampling students and test questions may be difficult to sell at the local level.

HOW MUCH WOULD DISTRICT-LEVEL NAEP COST?

Under the 1996 augmentation options described earlier in this report, districts were given the opportunity to augment their state samples to obtain district-level results. Although a number of districts were initially interested in this plan, nearly all dropped out because of the projected expense. Only Milwaukee participated, and its costs were covered by a National Science Foundation grant.

Workshop participants had questions about who would pay the costs of participating: Would any of the costs be paid by the federal government? Were the districts and states to assume responsibility for the costs? Would districts and states be expected to provide staff to handle the administrations? They commented that in order to obtain funding, they would need to convince legislators and policy makers of the potential benefits. Panelists recommended that NAGB and NCES examine the various components of the costs, identify the features associated with higher costs, and consider modifying procedures in order to reduce costs.

WHAT PRODUCTS WOULD DISTRICT-LEVEL NAEP GENERATE AND WHEN WOULD THEY BE RELEASED?

Characteristics of Reports

Questions arose as to the nature of the information that would be provided to states and districts. Would they receive a formal report, like those prepared as part of the existing NAEP program? Would the report contain explanatory information to help users interpret the results? Participants commented that the types of reports currently provided as part of NAEP are considered both attractive and useful. In contrast, the sample report included in the materials supplied by NCES was simply a computer printout (National Center for Education Statistics, 1995b). Some held that it would be difficult to sell participation to policy makers if, in exchange for their efforts (and money), they would receive only computer printouts. Others wondered if they would receive electronic data files to use in producing their own reports.
They realized that NAEP makes use of complex procedures in order to produce performance estimates (i.e., the conditioning process and plausible values technology). They wondered if they would be expected to implement this technology and produce their own reports. Overall, they felt that a prototype report was needed to exemplify the type of information that would be provided about districts. A prototype report would enable policy makers to make participation decisions based on the type and usefulness of information they would receive.
Length of Time for Reporting Results

Workshop participants also posed questions about how long it would take to receive reports. The current delay for the release of NAEP results is between 12 and 18 months. For assessments useful in instructional planning and monitoring, school districts are accustomed to receiving test results within six weeks of administration; in some cases (Chicago), results are ready within one week of testing. The current time lag would seriously degrade the usefulness of NAEP data for districts, and speakers questioned what they could legitimately do with the data. By the time results were received, the students in the grades tested would have moved on to the next grade. What inferences could be made from the results, and how would they be applied?

Conditioning and Plausible Values Technology

Nancy Allen, director of NAEP analysis and research at the Educational Testing Service, presented an overview of the procedures used to generate group-level results. Allen reminded participants that ability estimates are not computed for individuals, because any one student responds to too few items to produce reliable estimates of performance. She described the procedures used to generate likely ability distributions for individuals, based on their background characteristics and responses to cognitive items (the conditioning procedures), and to randomly draw five ability estimates (plausible values) from these distributions. She noted that for state NAEP, the conditioning procedures use information on the characteristics of all test takers in the state. Questions arose as to what information would be included in the conditioning models for districts. Would the models be based on the characteristics of the state or of the district?
To what extent would model misspecification lead to bias in the estimates? Allen responded that the conditioning models rely on information about the relationships between performance on test items and background characteristics. The compositional characteristics of the state and a district will sometimes differ, but the relationships between cognitive performance and background characteristics may not. Nevertheless, Allen stressed that they were still exploring various models for calculating estimates at the district level.
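The plausible-values idea Allen described can be illustrated with a deliberately simplified sketch. This is not the actual NAEP/ETS conditioning model: real NAEP posteriors come from an IRT model conditioned on background variables, and the posterior means and standard deviations below are invented. The sketch shows only the core mechanic, that each student contributes random draws from an ability distribution rather than a point score, and that only group-level quantities are estimated.

```python
import random
import statistics

random.seed(42)  # fixed seed so the toy example is reproducible

def draw_plausible_values(post_mean: float, post_sd: float, k: int = 5):
    """Draw k plausible values from a student's (assumed normal) posterior."""
    return [random.gauss(post_mean, post_sd) for _ in range(k)]

# Hypothetical posterior (mean, sd) pairs for four students; in NAEP these
# would come from conditioning on item responses and background data.
students = [(210.0, 12.0), (245.0, 10.0), (230.0, 15.0), (260.0, 11.0)]

pv_sets = [draw_plausible_values(m, s) for m, s in students]

# Group estimate: form a group mean from each of the five sets of
# plausible values, then average those five group means.
group_means = [statistics.mean(pvs[i] for pvs in pv_sets) for i in range(5)]
estimate = statistics.mean(group_means)
print(round(estimate, 1))
```

In the real program, the variation among the five group means also feeds into the reported standard errors, which is why the district-level conditioning model matters: a misspecified model shifts every draw.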
Participants remarked that it was important to resolve these issues because of the associated expenses and time delays in finalizing results. Wise questioned the extent of the conditioning needed, commenting that if district-level reports did not include disaggregated data (because of the rule of 62 for reported cells), the conditioning might not need to include all background variables.

WHO WOULD MAKE PARTICIPATION DECISIONS? WHO WOULD OWN THE DATA?

Roy Truby, executive director of the National Assessment Governing Board, told participants that when Congress lifted the ban on below-state reporting, it neglected to include language in the law clarifying the roles of states and districts in making participation decisions. In 1998, when NCES offered results to the naturally occurring districts, letters were sent to the districts and their respective states. The original policy provided that district-level results would be made available only with the district’s approval. Based on legal advice from the Department of Education’s Office of General Counsel, the policy was changed to provide that states must give permission for the release of district data from state NAEP samples, but that states should be encouraged to consult with the districts involved before deciding. In one case, there appeared to be a conflict in which the state wanted the data released but the district did not. NAGB members were also concerned that districts had not been told, when they agreed to participate in 1998 NAEP, that scores for their districts might be produced.
Because of this ambiguity about decision-making procedures, NAGB passed the following resolution (National Assessment Governing Board, 1999):

Since the policy on release of district-level results did not envision a disagreement between state and district officials, the Governing Board hereby suspends implementation of this policy, pending legislation which would provide that the release of district-level NAEP results must be approved by both the district and state involved.

In preparation for the workshop, participants had been asked their opinions about which entity, the state or the district, should have the ultimate decision-making authority regarding participation and release of data.
In general, district representatives believed that the participating entity should make participation decisions, while state representatives believed that the decision should lie with the state. Some added that whichever entity paid for participation should have the ultimate decision-making authority. The overarching issue, however, related to release of the results: under the Freedom of Information Act, once results for districts are produced, they are subject to public release. Speakers stressed that the issue was not so much participation itself as the fact that, once a district had participated, its results would have to be released to the public upon request.