6 Reporting

The National Assessment Governing Board (NAGB) has made very clear its intention that Voluntary National Tests (VNT) results should be reported using National Assessment of Educational Progress (NAEP) achievement levels. Presumably, this means that each student and his or her parents and teachers would be told whether performance on the test reflects below basic, basic, proficient, or advanced mastery of the reading or mathematics skills outlined in the test frameworks.
More specific discussion of reporting has been largely postponed. NAGB reviewed a “Revised Test Result Reporting Work Plan” (American Institutes for Research [AIR], April 23, 1998) at its May 1998 meeting. This plan outlined a number of research steps, from literature review through focus groups, that might be undertaken to identify and resolve reporting issues and problems. The plan did not propose any specific policies or even attempt to enunciate key reporting issues. The schedule called for approval of field test reporting plans by NAGB in August 1999, with decisions on reporting for the operational test in August 2000. In this section, we discuss four key issues in reporting VNT results and describe the implications of decisions about these issues for other test development activities.
Key Issues

The charge for phase 1 of our evaluation does not emphasize evaluation of reporting plans and, as indicated above, final decisions on many reporting issues are not yet available for review. Nonetheless, we discuss several reporting issues here that we hope will be addressed in the final reporting plans. These include:
- the validity of the achievement-level descriptions;
- communicating uncertainty in VNT results;
- computing aggregate results for schools, districts, and states; and
- providing more complete information on student achievement.
The Validity of the Achievement-Level Descriptions

NAEP's procedures for setting achievement levels and their results have been the focus of considerable review (see Linn et al., 1991; Stufflebeam et al., 1991; U.S. General Accounting Office, 1993; National Academy of Education, 1992, 1993a, 1993b, 1996; National Research Council, 1999a). Collectively, these reviews agree that achievement-level results do not appear reasonable relative to numerous external comparisons, such as course-taking patterns and data from other assessments, on which larger proportions of students perform at high levels. Furthermore, neither the descriptions of expected student competencies nor the exemplar items appear appropriate for describing actual student performance at the achievement levels defined by the cutscores. Evaluators have repeatedly concluded that the knowledge and skills assessed by exemplar items do not match up well with the knowledge and skill expectations put forth in the achievement-level descriptions, nor do the exemplars provide a reasonable view of the range of performance expected at a given achievement level.

The design of the VNT will expose the achievement-level descriptions to a much higher level of scrutiny than has previously occurred. They will be applied to individual students, not just to plausible values. The classification of students into achievement levels will be based on a smaller set of items than is used in a NAEP assessment, and all of these items will be released and available for public review. Judgments about the validity of the achievement-level descriptions will therefore rest in large part on the degree to which the items used to classify students into achievement levels match the knowledge and skills covered in those descriptions.
In Chapter 2 we recommend greater integration of the achievement-level descriptions with the test specifications, and in Chapter 3 we recommend matching the VNT items to the knowledge and skills in these descriptions. Consideration should also be given to ways in which the link between items and the achievement-level descriptions could be made evident in reporting. For example, the description of proficient performance in 4th-grade reading includes “recognizing an author's intent or purpose,” while the description of advanced performance includes “explaining an author's intent, using supporting material from the story/informational text.” Given these descriptions, it would be helpful to tell students classified at the proficient level how they failed to meet the higher standard of advanced performance.

Communicating Uncertainty in VNT Results

Test results are based on responses to a sample of items provided by the student on a particular day. The statistical concept of reliability focuses on how much results would vary over different samples of items or at different times. In reporting aggregate results for schools, states, or the nation, measurement errors are averaged across a large number of students and are not a critical factor in the accuracy of the results. When results are reported for individual students, however, as they will be for the VNT, measurement error is a much more significant issue. The report of the Committee on Equivalency and Linkage (National Research Council, 1999c) describes how the same student could take several parallel versions of the VNT and end up with different, perhaps even quite different, achievement-level classifications. Such possibilities raise two key issues for reporting: How can uncertainty about test scores best be communicated to parents, teachers, and other recipients of test results? How much uncertainty will users be willing and able to tolerate?
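The committee's point about inconsistent classifications across parallel forms can be illustrated with a small simulation. The sketch below is purely illustrative: the cutscores, the standard error of measurement, and the normal-error model are assumptions for demonstration, not VNT or NAEP parameters.

```python
import random

# Hypothetical cutscores on an arbitrary reporting scale (illustrative only;
# actual achievement-level cutscores and error magnitudes would differ).
CUTSCORES = {"basic": 208, "proficient": 238, "advanced": 268}

def classify(score):
    """Map a scale score to an achievement level."""
    if score >= CUTSCORES["advanced"]:
        return "advanced"
    if score >= CUTSCORES["proficient"]:
        return "proficient"
    if score >= CUTSCORES["basic"]:
        return "basic"
    return "below basic"

def classification_consistency(true_score, sem, n_forms=10000, seed=1):
    """Estimate how often parallel forms agree on a student's level.

    Each simulated administration adds normally distributed measurement
    error (standard error of measurement `sem`) to the true score.
    """
    rng = random.Random(seed)
    levels = [classify(rng.gauss(true_score, sem)) for _ in range(n_forms)]
    modal = max(set(levels), key=levels.count)
    return modal, levels.count(modal) / n_forms

# A student just below the hypothetical proficient cut, with an
# assumed 10-point standard error of measurement:
level, agreement = classification_consistency(true_score=235, sem=10)
print(level, round(agreement, 2))
```

Under these assumed values, the most likely classification occurs on only a little over half of the simulated administrations; the rest of the time the same student would be reported at a different level. This is the kind of magnitude that reporting plans would need to communicate to parents and teachers.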
Computing Aggregate Results for Schools, Districts, and States

Another issue identified by the Committee on Equivalency and Linkage concerns differences in reporting individual and aggregate results. NAEP uses sophisticated methodology to provide accurate estimates of the proportion of students at each achievement level. These methods involve conditioning on background variables and creating multiple “plausible values” for each student on the basis of their responses to test questions and their background information. (For a more complete explanation of this methodology, see Allen et al., 1998.)

We believe that student-level reporting will drive the need for accuracy in VNT results, but tolerance for different levels of accuracy in aggregate results should be explored before final decisions about test accuracy requirements are reached. The VNT contractors have begun to discuss alternatives for reporting aggregate results, ranging from somewhat complex procedures for aggregating each student's probabilities of being at each level to ways of distancing results from the two programs so that conflicts will not be alarming and, perhaps, not even visible. One way of resolving the aggregation issue that has not been extensively discussed would be to generate two scores for each student. The first, called a reporting score, would be the best estimate of each student's performance, calculated either from a tally of correct responses or using an IRT scoring model. The second, called an aggregation score, would be appropriate for estimating aggregate distributions and would be based on the plausible values methodology used for NAEP (see Allen et al., 1998).

Providing More Complete Information on Student Achievement

A key question that parents and teachers are likely to have is how close a student is to the next higher (or lower) achievement-level boundary.
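A coarse answer to this question is simple to compute from a scale score and the cutscores: report whether the student sits in the low, middle, or high portion of his or her achievement level. The sketch below is illustrative only; the cutscores, scale endpoints, and thirds-based banding are assumptions for demonstration, not VNT design decisions.

```python
# Sketch of "nearness to boundary" reporting: place a scale score in the
# low, middle, or high third of its achievement level.
CUTS = [208, 238, 268]            # hypothetical basic/proficient/advanced cuts
LEVELS = ["below basic", "basic", "proficient", "advanced"]
SCALE_MIN, SCALE_MAX = 150, 320   # assumed endpoints of the reporting scale

def level_and_position(score):
    """Return (achievement level, position within that level)."""
    idx = sum(score >= c for c in CUTS)          # how many cuts are met
    lo = ([SCALE_MIN] + CUTS)[idx]               # lower bound of the level
    hi = (CUTS + [SCALE_MAX])[idx]               # upper bound of the level
    frac = (score - lo) / (hi - lo)              # relative position, 0 to 1
    position = "low" if frac < 1/3 else "high" if frac >= 2/3 else "middle"
    return LEVELS[idx], position

# A score of 235 meets only the first cut, near the top of its level:
print(level_and_position(235))  # ('basic', 'high')
```

A student reported as "basic, high" would know that the proficient boundary is close; as the surrounding text notes, however, any such finer-grained report carries more uncertainty than the level classification alone.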
This question is particularly important for the presumably large proportion of students whose performance will be classified as below the basic level of achievement. Diagnostic information, indicating areas within the test frameworks in which students had or had not achieved targeted levels of proficiency, could serve very useful instructional purposes, pointing to specific areas of knowledge and skill in which students are deficient. The amount of testing time required to provide such detailed information accurately is likely to be prohibitive, however. In addition, the fact that the NAEP and VNT frameworks are designed to be independent of any specific curriculum further limits the instructional value of VNT results.

Using subcategories or a more continuous scale (such as the NAEP scale) to report nearness to an achievement boundary may be much more feasible given current test plans for length and accuracy. It might be possible, for example, to report whether students are at the high or low end (or in the middle) of the achievement level in which they are classified. Using such a scale, however, would require acceptance of an even greater level of uncertainty than is needed for achievement-level reporting.

Conclusions

Our key conclusion with regard to reporting is that a clear vision of how results will be reported should drive, rather than follow, other test development activities. If NAEP achievement-level descriptions are used in reporting, the mapping of test items to specific elements of these descriptions should be made evident. Decisions about factors that influence the accuracy of VNT results will also have
to be made well in advance of the dates proposed for NAGB approval of reporting plans. As described above, decisions about test length, a key determinant of test score accuracy, have already been made without careful consideration of the level of accuracy that can be obtained with the specified length. Other factors, such as item calibration and equating and linking errors, also influence the accuracy of VNT results. Methods used by NAEP, including conditioning and plausible values, are not appropriate for reporting individual student results and are not needed for the VNT. Without some adjustments, however, VNT results for individual students, when aggregated up to the state level, will disagree, in some cases markedly, with NAEP estimates of student proficiency, so that the credibility of both programs will be jeopardized. This will occur even if there are no differences in student motivation between the VNT and NAEP.

No decision has been made about whether and how results will be reported beyond the achievement levels. It seems likely that students, particularly those in the below basic category, will benefit from additional information, as will their parents and teachers.

Recommendations

6-1. NAGB should accelerate its discussion of reporting issues, with specific consideration of the relationship between test items and achievement-level descriptions. Rather than waiting until August 1999, it would be prudent for NAGB and its contractors to determine how achievement-level information will be reported and to examine whether items are sufficiently linked to the descriptions of the achievement levels. In addition, attention is needed to the level of accuracy likely to be achieved by the VNT as currently designed and to ways of communicating the corresponding degree of certainty to potential test users.

6-2. NAGB should develop ways of communicating information to users about measurement error and other sources of variation in test results.
6-3. NAGB should develop and review procedures for aggregating student test results prior to approving the field test reporting plan.

6-4. NAGB and AIR should develop and try out alternative ways of providing supplemental test result information. Policies on reporting beyond achievement-level categories should be set prior to the field test in 2000, with a particular focus on students who are below the basic level of achievement.