Use of Voluntary National Test Scores for Tracking, Promotion, or Graduation Decisions
The purpose of the proposed voluntary national tests (VNTs) is to inform students, parents, and teachers about the students' performance in 4th grade reading and 8th grade mathematics relative to high national and international standards. The committee takes no position on whether the VNTs are practical or appropriate for this purpose.
The VNT proposal has evolved in many ways since its birth in January 1997, but here we focus on major features of the initial plan. Achievement tests in English reading at the 4th grade level and in mathematics at the 8th grade level would be offered to states, school districts, and localities for administration each spring. The tests would be voluntary, because the federal government would prepare but not require them, nor would data on any individual or school be reported to the federal government. The tests would be distributed and scored through licensed commercial firms. A major effort would be made to include and accommodate students with disabilities and English-language learners in the testing program. The tests would not be long or detailed enough to provide diagnostic information about individual learning problems. They would, however, provide sufficiently reliable information that all students (and their parents and teachers) would know where they stand in relation to high national standards and, in mathematics, in comparison with levels of achievement in other countries. For the 4th grade reading test, the standards would be set by the achievement levels of the corresponding tests in the National Assessment of Educational Progress (NAEP): basic, proficient, and advanced. For the 8th grade mathematics test, corresponding standards would be set by the 8th grade mathematics tests of NAEP, and student performance would be compared with that in other nations in the Third International Mathematics and Science Study (TIMSS).
To provide maximum preparation and feedback to students, parents, and teachers, sample tests would be circulated in advance, and a copy of the original test would be returned with both the student's original answers and the correct answers noted. A major effort would be made to communicate test results clearly to students, parents, and teachers, and all test items would be published on the Internet just after the national administration of each test.
The proposal does not suggest any direct use of VNT scores to make decisions about the tracking, promotion, or graduation of individual students. Representatives of the U.S. Department of Education have stated that the VNTs are not intended for use in making such decisions, and the tests are not being developed to support such uses. Nonetheless, some civil rights organizations and other groups have expressed concern that test users would inappropriately use VNT scores for such purposes. Indeed, under the voluntary testing plan, test users (states, school districts, and schools) would be free to use the tests as they wish, just as test users are now free to use commercial tests for purposes other than those recommended by their developers and publishers. The freedom of test users has been reinforced by the action of the Congress in placing control of the VNT project with the National Assessment Governing Board, the same independent commission that oversees NAEP.
Accordingly, and because this study was mandated in the context of the discussion of the VNTs, the committee has considered whether it would be appropriate to make tracking, promotion, or graduation decisions about individual students based on their VNT scores. The committee recommends that the VNT not be used for decisions about the tracking, promotion, or graduation of individual students. The evidence for this recommendation is elaborated in the following sections.
Use of VNT Scores in Tracking Decisions
The committee foresees several basic problems with using VNT scores to track individual students.
First, to the extent that VNTs could be used for tracking decisions at all, their use would be limited to placement decisions for 5th grade reading and 9th grade mathematics. They would be inappropriate for placement decisions in other subjects or grade levels.
Second, using VNT scores to make future class placements would be valid only to the extent that the VNT assessments were predictive of success in future placements in a particular school. There is no guarantee, however—and little reason to expect, at least initially—that there would be a sufficiently close relationship between VNT scores and available future placements in any particular class or school to justify the use of the scores in making tracking decisions.
Third, VNT proficiency levels, which are expected to be the same as those of NAEP, do not correspond well to other common definitions of proficiency: those embodied in current state content and performance standards (Linn, 1998a), those found in such widely used tests as Advanced Placement exams and the Scholastic Assessment Test (SAT),1 and those used in tracking decisions. Indeed, the large proportion of students who score below the "basic" level on NAEP has led to justifiable concerns that reports of achievement on the VNTs will provide little information about lower levels of academic performance.
For example, of all 8th graders who took the 1996 NAEP grade 8 mathematics assessment, roughly 39 percent scored below "basic," and the figures for Mississippi and the District of Columbia were roughly 62 percent and 80 percent, respectively (Linn, 1998a: Figure 14, citing Reese et al., 1997). It is hard to imagine placing that high a proportion of students in low-track 9th grade math classes, particularly in view of the negative consequences that are associated with such placements. The high standards of NAEP, to which the VNT would conform, do not correspond well, if at all, to traditional high, middle, and low tracks. For these reasons, the committee concludes that VNT scores should not be used in making tracking decisions about individual students.
Use of VNT Scores in Promotion and Retention Decisions
Despite efforts to discourage possible high-stakes uses of the VNTs for individual students, the possibility of using them to determine promotion or retention is implicit in President Clinton's proposal: "Good tests will show us who needs help, what changes in teaching to make, and which schools need to improve. They can help us to end social promotion."2 The committee has therefore considered the possible use of voluntary national tests for decisions about promotion.
The committee sees clear incompatibilities between features of the VNTs that would facilitate their use for informing students, parents, and teachers about student achievement and their use to make promotion and retention decisions about individual students.
First, the plan is to report student achievement relative to the high standards of NAEP and to international achievement levels in TIMSS. Use of these national and international standards is an appropriate way to set and communicate higher educational goals. But there is no guarantee that the framework or content of the tests will be aligned with the curriculum that students study. That is, students may have had insufficient opportunity to learn the materials on which they are being tested, and this could render the test inappropriate as a criterion for promotion. Standards used to lead the curriculum may not be compatible with those appropriate for these high-stakes decisions.
Second, the VNT plan focuses on reporting in terms of the proficiency levels of the NAEP tests: basic, proficient, and advanced. The tests must be able to distinguish reliably these levels of performance. Some testing experts have expressed concern about the reasonableness of reporting and interpreting NAEP results in terms of the NAEP proficiency levels.3 The new VNT results, also to be reported by NAEP proficiency levels, are subject to similar concerns; they are unlikely to be sufficiently trustworthy for making high-stakes decisions about individual students (National Research Council, 1999b). Moreover, decisions about promotion are likely to require high accuracy in distinguishing levels of proficiency other than those identified in NAEP. Indeed, as noted above, the large share of students who score below the basic level in NAEP has led to justifiable concerns that the VNTs might provide little or no information to distinguish different levels of academic performance among these students. This problem is compounded by the likelihood that states and school districts will adopt varying standards and varying cutscores for their high-stakes decisions about students.
Third, it is proposed that all test items be made public through the Internet and returned to students with the correct answers indicated. In high-stakes testing situations, demands for fairness, as well as the committee's criteria of validity and reliability in measurement, require that students who fail be permitted to take the test again. Public release of the items, however, would mean that a test could not be used more than once. This technical problem could be overcome by developing multiple forms of the test, so a student who failed could take an equivalent form of the test later. In fact, the plan is for several equivalent forms of each test to be developed in each year in order to provide comparable test results in the next year. But no extra forms are planned for release or use in "second-chance" administrations. If such extra forms were developed, this would add to problems of test security.
Fourth, there are other ways in which the VNTs could be misused in high-stakes decisions—for example, if performance on a single test administration were the only criterion for promotion. In this respect, however, the VNTs would not differ from any other new or existing test.
The committee finds it most unlikely that the VNTs could serve both the objectives of communicating higher academic standards across the country and of providing a fair and accurate measurement tool for high-stakes decisions about promotion or retention of individual students. The VNTs should therefore not be used in making decisions about the promotion of individual students.
... placement of students in one or another of the math classes. Thus, the need to validate cutscores on this test would take on an added importance in light of such a high-stakes use of the test. The consequences of using the NAEP cutscores to make such decisions have been demonstrated by Shepard et al. (1993). See also the forthcoming National Research Council report (1999a), Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress.
Use of VNT Scores in Graduation Decisions
The committee sees several basic problems in using VNT scores to make graduation decisions about individual students.
First, as noted previously, there is no guarantee—and little reason to expect, at least initially—that the framework or content of these tests would be aligned with the curriculum and instruction that students experience. If VNT content is not representative of what students have been taught, this would render the test inappropriate as a criterion for graduation. In addition, although some states have deemed achievement at the 8th grade level sufficient to meet their graduation standard in mathematics, it is doubtful that there would be any potential use for the results of a 4th grade reading test in determining an individual's fitness to receive a high school diploma.
Second, the VNT plan has focused on reporting in terms of the basic, proficient, and advanced proficiency levels of NAEP. The questions noted above about the accuracy of these proficiency levels and their usefulness for making high-stakes decisions about individual students apply here as well. Moreover, decisions about graduation are quite likely to require high accuracy at levels of proficiency other than those already identified in NAEP. This would place added demands on test design: to distinguish reliably between performance just above and just below each new cutscore, the tests would have to include many items near the levels of difficulty that separate these new proficiency levels. This may not always be feasible. The difficulties involved would be further compounded if states and school districts were to set different proficiency levels for their graduation decisions.
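The link between measurement precision near a cutscore and the accuracy of pass/fail decisions can be illustrated with a minimal sketch. It assumes, purely for illustration, that a student's observed score equals the true score plus normally distributed error whose standard deviation is the standard error of measurement (SEM). The cutscore of 260, the true score of 265, and the SEM values are hypothetical and are not drawn from NAEP or from the VNT design.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def misclassification_probability(true_score: float,
                                  cut_score: float,
                                  sem: float) -> float:
    """Chance that the observed score falls on the wrong side of the cutscore.

    Illustrative assumption: observed = true + error, with error drawn
    from a normal distribution with mean 0 and standard deviation `sem`.
    """
    if sem <= 0:
        raise ValueError("sem must be positive")
    distance = abs(true_score - cut_score)
    return normal_cdf(-distance / sem)


if __name__ == "__main__":
    cut = 260.0          # hypothetical cutscore
    true_score = 265.0   # hypothetical student just above the cutscore
    # A smaller SEM near the cutscore corresponds to a test with more
    # items targeted at that level of difficulty.
    for sem in (20.0, 10.0, 5.0):
        p = misclassification_probability(true_score, cut, sem)
        print(f"SEM = {sem:4.1f}: P(misclassified) = {p:.3f}")
```

Under these assumptions, a student whose true score lies just above the cutscore is misclassified about 40 percent of the time when the SEM is 20 points, but about 16 percent of the time when the SEM is 5 points. The sketch is meant only to show why a test reused at cutscores it was not built around may classify students near those cutscores unreliably.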
Third, the problems of fairness, validity, and reliability created by making VNT items public through the Internet and returning them to students with correct answers apply here as well. Students should be permitted to retake a graduation test if they have failed on the first administration. Public release of the items implies, however, that the same test could not be used more than once, and, as noted above, no extra forms are planned for release and use in "second-chance" administrations. The lack of alternative forms would make the VNTs inappropriate for use as a graduation test.
Clinton, W.J. 1998 Public Papers of the Presidents of the United States. Washington, DC: Government Printing Office.
Linn, R. 1998a Assessments and Accountability. Paper presented at the annual meeting, American Educational Research Association, San Diego, CA.
1998b Validating inferences from National Assessment of Educational Progress Achievement-Level Reporting. Applied Measurement in Education 11(1):23–47.
Linn, R.L., D. Koretz, E.L. Baker, and L. Burstein 1991 The Validity and Credibility of the Achievement Levels for the 1990 National Assessment of Educational Progress in Mathematics. Los Angeles: University of California, Center for the Study of Evaluation.
National Research Council 1999a Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress. J.W. Pellegrino, L.R. Jones, and K.J. Mitchell, eds. Committee on the Evaluation of National and State Assessments of Educational Progress, Board on Testing and Assessment. Washington, DC: National Academy Press.
1999b Uncommon Measures: Equivalence and Linkage Among Educational Tests. M.J. Feuer, P.W. Holland, B.F. Green, M.W. Bertenthal, and F.C. Hemphill, eds. Committee on Equivalency and Linkage of Educational Tests, Board on Testing and Assessment. Washington, DC: National Academy Press.
Reese, C.M., K.E. Miller, J. Mazzeo, and J.A. Dossey 1997 NAEP 1996 Mathematics Report Card for the Nation and the States. Washington, DC: National Center for Education Statistics.
Shepard, L.A. 1995 Implications for standard setting of the National Academy of Education evaluation of the National Assessment of Educational Progress achievement levels. In Proceedings of the Joint Conference on Standard Setting for Large-Scale Assessments of the National Assessment Governing Board and the National Center for Education Statistics (Vol. II, pp. 143–160). Washington, DC: National Assessment Governing Board and the National Center for Education Statistics.
Shepard, L., R.G. Glaser, and R. Linn 1993 Setting Performance Standards for Student Achievement: A Report of the National Academy of Education Panel on the Evaluation of the NAEP Trial State Assessment: An Evaluation of the 1992 Achievement Levels. Stanford, CA: The National Academy of Education.