2
Test Specifications.

Specifications for the Voluntary National Tests (VNT) were originally developed in October 1997 by the Council of Chief State School Officers (CCSSO) and MPR Associates, Inc., under contract to the U.S. Department of Education (Council of Chief State School Officers and MPR Associates, 1997a; 1997b). Under P.L. 105-78, responsibility for developing and approving the test specifications was transferred to the National Assessment Governing Board (NAGB). NAGB established a Test Specifications Committee with separate reading and mathematics subcommittees. This committee commissioned several expert reviews of the test specifications and held public hearings in January 1998 in Washington, D.C., and Chicago, Illinois. Its recommendations for revised specifications went to NAGB, which spent much of its March 1998 meeting reviewing and revising them as a committee of the whole. At the conclusion of that meeting, NAGB approved an outline of the specifications as amended (National Assessment Governing Board, 1998a; 1998b).

NAGB's specifications for the VNT state that each test is to be taken in two 45-minute sessions on the same day. About 80 percent of the items on each test, reading and mathematics alike, but only half of the testing time, are to be in multiple-choice format. The remaining items on each test are to be constructed-response items, some requiring short answers and others requiring extended responses. Eighth-grade students are to be provided with manipulatives (ruler, protractor, and geometric shapes) throughout the examination and with an electronic calculator during its second session. However, the test is designed so that calculators will be of no use for about one-third of the items in that session; the remaining items are to be divided between those for which a calculator is required (labeled “calculator active”) and those for which a calculator is useful but not essential (labeled “calculator neutral”).
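Taken together, the 80 percent and one-half figures imply that each constructed-response item is budgeted far more time than each multiple-choice item. The following is a worked sketch only, assuming that both proportions hold exactly and that time is spread evenly across the items within each format (neither assumption is stated in the specifications). Here $N$ is the number of items on a test and $T = 90$ minutes is its total testing time; $t_{\mathrm{MC}}$ and $t_{\mathrm{CR}}$ are our own labels for the average time per multiple-choice and per constructed-response item.

$$
\begin{aligned}
t_{\mathrm{MC}} &= \frac{0.5\,T}{0.8\,N} = 0.625\,\frac{T}{N}, \\
t_{\mathrm{CR}} &= \frac{0.5\,T}{0.2\,N} = 2.5\,\frac{T}{N}, \\
\frac{t_{\mathrm{CR}}}{t_{\mathrm{MC}}} &= \frac{2.5}{0.625} = 4.
\end{aligned}
$$

On these assumptions, a constructed-response item receives on average four times the time of a multiple-choice item, whatever the actual values of $N$ and $T$.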

The primary differences between the October 1997 and March 1998 versions of the VNT specifications are in (1) the resemblance between the reading and mathematics specifications and the corresponding National Assessment of Educational Progress (NAEP) frameworks; (2) the level of detail provided in the specifications; (3) the absence of non-English forms of the reading test; (4) the inclusion of intertextual items in reading (questions that ask students to compare or contrast two text passages); and (5) details on calculator use and item response format in mathematics.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

The March 1998 VNT specifications depart minimally from the corresponding NAEP specifications, mainly where necessary to yield individual student scores. For example, in 4th-grade reading, the text passages are shorter than in NAEP, and intertextual items may permit the introduction of more questions without the need to add reading passages. Unlike the October specifications, the March specifications exist only in outline form, and item developers have been referred to current NAEP frameworks for more complete information about item content, difficulty, and structure (National Assessment Governing Board, no date[a], no date[b]). To summarize, the March specifications reflect several important decisions about the VNT: that the reading test would be offered only in English; that it might include some intertextual items; that the mathematics test would be split between sections where calculator use is prohibited and permitted; and that some open-ended mathematics items would require responses that need to be drawn or entered on a grid (which can be machine scored but requires students to supply an answer rather than respond to a list of options).

Findings

First, our overall finding is that the VNT test specifications provide a reasonable but incomplete basis for item development. This finding is based on our judgments about (1) the appropriateness of using the NAEP frameworks as the primary basis for the VNT test specifications, (2) the acceptance of the VNT specifications by educators and policy makers, and (3) the completeness of the specification outlines.

Appropriateness

The test specifications, as amended and approved by NAGB, are nearly identical to the NAEP specifications with respect to the knowledge and skills to be measured.
The VNT specifications are also similar to the NAEP specifications with respect to item formats, although a slightly different mix is required to support scores for individual students. The similarity extends to other matters as well, such as the provision of calculators during part of the test. There are a few exceptions, such as the use of intertextual items in reading and of gridded and drawn responses in mathematics. Such divergences are likely to be temporary, however, because these innovations may also be introduced into NAEP in the near future.

The NAEP frameworks that serve as the primary basis for the VNT were developed through a careful consensus process that incorporated input from diverse groups and relied heavily on national professional groups outside the federal government. For the most part, the frameworks steer a middle course between special interest groups that seek to emphasize narrower perspectives on reading and mathematics. In reading, the emphasized skills range from understanding individual words to critically evaluating text. Similarly, the mathematics skills range from simple computation to the application of advanced problem-solving skills.

A key feature of NAEP, which is carried into the VNT specifications, is its lack of alignment with any particular curriculum or instructional approach. This approach avoids infringing on local and state responsibilities for curriculum and pedagogical decisions, but it also somewhat limits the potential uses of the VNT. Specifically, before the VNT is used to assess teachers, schools, or individual students, it will be important to understand how well the instructional program to which students are exposed aligns with the knowledge and skills assessed by NAEP and the VNT.

Acceptance

Second, we observe that the final test specifications lack broad acceptance. For example, a number of school districts withdrew from the VNT program when it was decided that the 4th-grade reading test would be in English only. Also, a significant number of educators want greater emphasis on the basic skills (e.g., decoding and computation) that, in theory, must be mastered before students can reach the basic achievement level. Other educators, in contrast, seek greater attention to more complex thinking and reasoning skills (see, e.g., Resnick, 1987). Broader consensus on the current decisions may increase eventual participation rates for the VNT. (See National Research Council, 1999b, for a discussion of public discourse and its role in testing policy.)

Completeness

Third, while the VNT specifications provide a reasonable basis for item development, they lack the statistical targets for items and forms that characterize most test specifications and that are needed to build test forms. In addition, the specifications lack specific information about the NAEP achievement-level descriptions to be used in reporting VNT results and about possible additional reporting scales. Also absent are goals for the assessment of English-language learners and students with disabilities.

We note that most test specifications include targets for item difficulty or for overall test score accuracy (or both). While the VNT specifications include targets for the length and number of items of various types, they specify neither the form in which performance will be reported (beyond a reference to the NAEP achievement levels) nor the accuracy with which those reports are to be made at varying levels of achievement. As noted in our interim report (National Research Council, 1998), as of July the contractors had not yet begun to relate VNT items to the descriptions of the NAEP achievement levels that will be used in reporting VNT results (see Appendix E).
These descriptions list the knowledge and skills that students must exhibit to be classified at particular levels of achievement. Given NAGB's strong support for achievement-level reporting, we think it is unfortunate that the VNT test specifications do not contain the achievement-level descriptions. The test specifications, available only in outline form, refer to the NAEP 1996 mathematics and 1992 reading frameworks for details on test content. Neither of these documents, however, contains the text of the achievement-level descriptions. We also examined copies of the materials used to train VNT item writers. The text of the achievement-level descriptions appeared only in the training materials for reading, and those materials did not mention the implications of the descriptions for items. Furthermore, the specifications omit any discussion of possible additional reporting scales for the VNT.

Also absent from the VNT specifications are goals for the inclusion and accommodation of students with disabilities and English-language learners. Briefly, as required by law, students with special needs should be included in the VNT (as in other testing programs) to the maximum extent possible, and the VNT should be designed to yield performance estimates for them, after necessary accommodation, that are comparable with, and as valid and reliable as, those for other students. In the case of the VNT, the developers have assumed that adequate provisions for inclusion and accommodation can be introduced at a later stage of the development process, following recent NAEP practice (see Chapter 5).

Conclusions

By using the NAEP specifications, the VNT can build on important efforts to develop a national consensus on what students should know and be able to do, rather than trying to reinvent such a consensus. The use of the NAEP frameworks also gives test developers access to a wide array of released NAEP items as relevant examples. Another benefit of the close resemblance between the VNT specifications and NAEP is an increased likelihood of constructing valid linkages between levels of performance on the VNT and the NAEP achievement levels. To be sure, this will require additional work relating the VNT item pool to the NAEP achievement levels, as noted in our interim report (National Research Council, 1998), as well as careful statistical work with the pilot and field test forms. Further discussion of this issue appears below and in Uncommon Measures: Equivalence and Linkage Among Educational Tests (National Research Council, 1999c).

As we noted above, however, the consensus achieved in the VNT specifications falls well short of universal assent, and attention might well be paid to areas of continuing disagreement. There is no consensus about the need for non-English forms or about the appropriate balance of attention to basic and more complex skills in the VNT. In addition, we note that the specifications lack important information about target difficulty levels for items and forms. They address the NAEP achievement levels only minimally and lack full information about the intended reporting scales for the VNT. The specifications also omit sponsors' and developers' goals for the inclusion and accommodation of English-language learners and students with disabilities. Given these omissions, it is difficult to judge the likely accuracy of student classifications into the four groups defined by the NAEP achievement levels (below basic, basic, proficient, and advanced) or whether the classifications will be equally reliable at every achievement level.
For example, transforming a VNT test score into the NAEP achievement levels might yield the following type of report: “…among 100 students who performed at the same level as the student, call her Sally, 10 are likely to be in the below basic category, 60 are likely to be basic; 28 are likely to be proficient; and 2 are likely to be in the highest, or advanced category” (National Research Council, 1999c:Ch. 5). We note in Chapter 6 of this report that communicating this type of score information to students, parents, and teachers may combine problems of comprehension with excessive uncertainty.

A closely related issue, as yet unresolved by NAGB and absent from the specifications, is the possibility of reporting a scaled test score along with the NAEP achievement levels. We note here and in Chapter 6 that if scaled scores are not generated and reported, the VNT will provide little or no information to the large number of students whose performance lies below the basic level. About 40 percent of students nationally, and 70 to 80 percent of students in some urban areas, score below the basic level on NAEP. Additional information might be especially useful to students whose performance lies below, but close to, the basic level on the VNT. Representatives of the Council of the Great City Schools have called for this type of reporting. In the absence of detailed feedback to low-performing students and their parents and teachers, there is likely to be little incentive to participate in the VNT. At the same time, the decision to report a full range of scores would have implications for the distribution of test items by difficulty and thus could affect test accuracy across a range of achievement levels.

We do not believe that there is adequate evidence at present about inclusion, accommodation, or comparability in the VNT specifications. (For more detailed discussion of inclusion and accommodation issues, see Chapter 5 and National Research Council, 1999b:Chs. 8, 9.)
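A probabilistic report like Sally's can be sketched by treating a student's true score as uncertain around the observed score. The sketch below is our own illustration, not the procedure used by the VNT developers or in the cited NRC report: it assumes a normal measurement-error model with a single standard error of measurement, and the cut scores, the standard error, and the function names are all hypothetical. It also illustrates the scaled-score point: a report based only on achievement levels gives the same answer to students who are far apart below the basic cut.

```python
from statistics import NormalDist

LEVELS = ("below basic", "basic", "proficient", "advanced")

# Hypothetical cut scores for basic, proficient, and advanced, and a
# hypothetical standard error of measurement; not actual NAEP or VNT values.
CUTS = (200.0, 240.0, 280.0)
SEM = 12.0


def level_probabilities(
    score: float, sem: float, cuts: tuple[float, ...]
) -> dict[str, float]:
    """Chance that the true score lies in each achievement-level band,
    assuming true score ~ Normal(mean=observed score, sd=sem)."""
    error_model = NormalDist(mu=score, sigma=sem)
    # Cumulative probability at each band edge: -inf, each cut, +inf.
    edges = [0.0] + [error_model.cdf(c) for c in cuts] + [1.0]
    return {level: edges[i + 1] - edges[i] for i, level in enumerate(LEVELS)}


def reported_level(score: float, cuts: tuple[float, ...]) -> str:
    """Level of a point score: the number of cuts it meets or exceeds."""
    return LEVELS[sum(score >= c for c in cuts)]


# "Among 100 students who performed like this student..." style report.
for level, p in level_probabilities(225.0, SEM, CUTS).items():
    print(f"about {100 * p:.0f} in 100 likely {level}")

# Level-only reporting cannot distinguish these two low scorers.
print(reported_level(150.0, CUTS), "|", reported_level(195.0, CUTS))
```

Under this model, classification near a cut score is close to a coin flip: for an observed score exactly at the hypothetical proficient cut, roughly half the probability falls on basic and half on proficient. That is one concrete sense in which classifications may not be equally reliable at every achievement level.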

In sum, until issues of student classification, scaled scores, and accommodation are resolved, it is not possible to reach a fully informed judgment about the adequacy of the VNT specifications.

Recommendations

2-1. The test specifications should be expanded to take into account developers' objectives for reporting and reliability.

2-2. The developers should work to build a wider consensus for the final test specifications.