Evaluation of the Voluntary National Tests, Year 2: Final Report
7 Conclusions and Recommendations
In this final chapter, we recap the committee's recommendations about specific aspects of the VNT development effort discussed in detail in Chapters 2 through 6 above. These cover: test purpose and use (Chapter 2); item quality and readiness (Chapter 3); technical issues in test development (Chapter 4); inclusion and accommodation (Chapter 5); and reporting (Chapter 6).
The number of specific recommendations listed here may leave the impression that the committee is dissatisfied with the progress and pace of VNT development. As this is not necessarily the case, we end our report with two overarching conclusions about VNT development and a recommendation to Congress to consider as it decides the ultimate fate of the program.
TEST PURPOSE AND USE
The National Assessment Governing Board, in its report to Congress on VNT, has specified the purpose and intended use of the test, the meaning of "voluntary," and several other key elements:
a focus on individual performance in reading in the 4th grade and mathematics in the 8th grade;
an effort to link VNT content, standards, and reporting to the National Assessment of Educational Progress;
extensive feedback of test results to individual students;
voluntary participation by states or local or private school authorities; and
a clearly defined prohibition of federal participation in the VNT program, beyond its support of test development and, possibly, operational costs.
The committee believes that a search for evidence that the VNT, if implemented, would have favorable effects on academic achievement should be a high priority and makes four recommendations regarding VNT purpose and use.
RECOMMENDATION 2.1 High priority should be given to the articulation of potential educational effects of the VNT and to the development of a program of research and evaluation, to determine whether and how the VNT contributes to improved educational outcomes.
RECOMMENDATION 2.2 The National Assessment Governing Board should develop explicit and detailed guidelines, practices, and enforcement mechanisms for the appropriate use of the Voluntary National Tests relative to high-stakes decisions about individual students or about teachers, classrooms, schools, or other educational units. Those guidelines should illustrate uses of the VNT relative to high-stakes decisions that are inappropriate and explicitly state the potential consequences of such inappropriate uses.
RECOMMENDATION 2.3 The National Assessment Governing Board should develop explicit and detailed guidelines, practices, and enforcement mechanisms for the appropriate compilation and use of aggregate data from administrations of the Voluntary National Tests relative to high-stakes decisions about teachers, classrooms, schools, or other educational units.
RECOMMENDATION 2.4 The National Assessment Governing Board should continue to develop plans for how the VNT would operate. Specifically, it should develop proposals for operational delivery systems for the VNT and for funding ongoing development and delivery costs so that potential users can make decisions about their participation, based on the costs as well as the potential educational value of the VNT.
ITEM QUALITY AND READINESS
The committee examined the extent to which the VNT items are likely to provide useful information to parents, teachers, students, and others about whether students have mastered the knowledge and skills specified for basic, proficient, or advanced performance in 4th-grade reading or 8th-grade mathematics.
The evaluation of the VNT items involved three key questions:
Are the completed items judged to be as good as they can be prior to the collection and analysis of pilot test data?
Are they likely to provide valid and reliable information for parents and teachers about students' reading or math skills?
Does it seem likely that a sufficient number of additional items will be completed to a similar level of quality in time for inclusion in a spring 2000 pilot test?
On the basis of data from the committee's item quality rating panels and other information provided to the committee by NAGB and its contractor, the committee reached several conclusions about current VNT item quality and about the item development process:
The number of items at each stage is not always known, and relatively few items and passages have been through the development and review process and fully approved for use in the pilot test.
The quality of the completed items is as good as a comparison sample of released NAEP items. Item quality is significantly improved in comparison with the items reviewed in preliminary stages of development a year ago.
For about half of the completed items, our experts had suggestions for minor edits, but the completed items are ready for pilot testing.
Efforts by NAGB and its contractor to match VNT items to NAEP achievement-level descriptions have been helpful in ensuring a reasonable distribution of item difficulty for the pilot test item pool, but they have not yet begun to address the need to ensure a match of item content to the descriptions of performance at each achievement level.
Our efforts to match item content to the achievement-level descriptions led to more concern with the achievement-level descriptions than with item content. The current descriptions do not provide a clear picture of performance expectations within each reading stance or mathematics content strand. The descriptions also imply a hierarchy among skills that does not appear reasonable to the committee.
Given these conclusions, the committee offers six recommendations.
RECOMMENDATION 3.1 NAGB should require regular item development status reports that indicate the number of items at each stage in the review process by content and format categories. For reading, NAGB should also require counts at the passage level that indicate the status of passage reviews and the completeness of all of the associated items.
RECOMMENDATION 3.2 The rates at which each of the different item types survives each stage from initial content reviews through analyses of pilot test data should be computed. This information should be used in setting targets for future item development.
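The survival-rate computation called for in Recommendation 3.2 can be illustrated with a minimal sketch. The stage names, item types, and counts below are hypothetical and chosen only for illustration; the report does not publish these figures, and the function names are not taken from any NAGB or contractor system.

```python
# Hypothetical review stages, in pipeline order (assumed for illustration).
STAGES = ["content_review", "bias_review", "pilot_analysis"]

def survival_rates(counts):
    """Return per-stage survival rates for each item type.

    counts maps an item type to a list of item counts: the number drafted,
    followed by the number surviving each stage in STAGES.
    """
    rates = {}
    for item_type, n in counts.items():
        # Rate for a stage = items leaving the stage / items entering it.
        rates[item_type] = [after / before for before, after in zip(n, n[1:])]
    return rates

def items_to_draft(target_final, drafted, survived):
    """Smallest number of drafts expected to yield target_final surviving
    items, assuming the observed overall survival ratio survived/drafted
    holds in the future (integer ceiling avoids rounding surprises)."""
    return (target_final * drafted + survived - 1) // survived

# Made-up counts: drafted, then surviving each of the three stages.
example_counts = {
    "multiple_choice": [200, 160, 144, 108],
    "constructed_response": [100, 70, 56, 28],
}

for item_type, stage_rates in survival_rates(example_counts).items():
    print(item_type, dict(zip(STAGES, stage_rates)))
    n = example_counts[item_type]
    print("  drafts needed for 54 final items:", items_to_draft(54, n[0], n[-1]))
```

In this made-up example the constructed-response items survive at a lower overall rate, so a development target for them would need to be set proportionally higher, which is the kind of planning use the recommendation describes.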
RECOMMENDATION 3.3 Item quality concerns identified by reviewers, such as distractor quality and other "minor edits," should be carefully addressed and resolved by NAGB and its contractor prior to inclusion of any items in pilot testing.
RECOMMENDATION 3.4 The contractor should continue to refine the achievement-level matching process to include the alignment of item content to achievement-level descriptions, as well as the alignment of item difficulty to the achievement-level cutpoints.
RECOMMENDATION 3.5 The achievement-level descriptions should be reviewed for usefulness in describing specific knowledge and skill expectations to teachers, parents, and others with responsibility for interpreting test scores and promoting student achievement.
RECOMMENDATION 3.6 Test blueprints should be expanded to indicate the expected number of items at each achievement level for each content area (reading stance or mathematics content strand) for each form of the test. Insofar as possible, items at each achievement level should be included for each content area.
TECHNICAL ISSUES IN TEST DEVELOPMENT
Our year 2 evaluation of technical issues in test development focused on the extent to which the design for pilot testing will result in items that represent the content and achievement-level specifications, are free of bias, and support test form assembly; plans for the implementation of VNT pilot testing; plans for assembling field test forms likely to yield valid achievement-level results; and the technical adequacy of revised designs for field testing, equating, and linking.
NAGB and its contractor have made progress in developing detailed plans for score reporting, the design and analysis of pilot test data to screen items for inclusion in VNT forms, and on the difficult issues associated with the field test. Based on information available to date, we offer six recommendations.
RECOMMENDATION 4.1 Pilot test plans should include school clusters, overlapping (hybrid) forms design, and NAEP anchor forms, as currently planned. In addition, the contractor should select the calibration procedure that is best suited to the final data collection design and in accord with software limitations and should plan to conduct item-fit analyses.
RECOMMENDATION 4.2 Information regarding expected item survival rates from pilot to field test should be stated explicitly, and NAGB should consider pilot testing additional constructed-response items, given the likelihood of greater rates of problems with these types of items than with multiple-choice items.
RECOMMENDATION 4.3 NAGB and its contractor should continue to detail plans for analyzing the pilot test data. Additional specifications should be provided for assessing the extent to which each item fits the model being used for calibration and the ways in which the results of differential item functioning analyses will be used in making decisions about the items.
RECOMMENDATION 4.4 A target test information function should be decided on and set. Although accuracy at all levels is important, accuracy at the lower boundaries of the basic and proficient levels appears most critical. Equivalent accuracy at the lower boundary of the advanced level may not be feasible with the current mix of items, and it may not be desirable because the resulting test would be too difficult for most students.
RECOMMENDATION 4.5 NAGB should consider plans for developing an alternate form of the VNT targeted to students at the low end of the achievement scale.
RECOMMENDATION 4.6 Plans for the VNT pilot test should include efforts to gather empirical data on the effects of content, administration, and use differences between the VNT and NAEP on the feasibility of linking VNT scores to the NAEP score scale. Specifically, a NAEP-like form (e.g., two non-overlapping booklets from recent 4th-grade reading and 8th-grade mathematics assessments) should be included to allow for an assessment of the effect of content differences and administration differences on the linkage of VNT scores to the NAEP scale.
INCLUSION AND ACCOMMODATION
There are two key challenges to testing students with disabilities or limited English proficiency. The first is to establish effective procedures for identifying and screening such students so they can appropriately be included in assessment programs. The second is to identify and provide necessary accommodations to students with special needs while maintaining comparable test validity with that for the general population.
The committee applauds AIR's proposal to evaluate the effects of two common accommodations on VNT performance (i.e., extended time and small-group administrations), among students with disabilities and with limited English proficiency in the pilot test. In addition, proposed research on extra time and small-group or one-on-one administration in conjunction with the pilot test has been approved. However, no specific recommendations or actions appear to have been made or taken on the basis of the hearings on inclusion and accommodation, and parent and teacher focus groups did not specifically address these issues. While language simplification methodology has been used in the test development process, little attention has been paid to other language issues, aside from dual-language booklets, regarding the VNT mathematics test (e.g., whether translation of existing questions into Spanish and other language versions will produce comparable items or whether methods can be used to reduce the reading level of mathematics items). Participation in the cognitive laboratories by students with disabilities and with limited English proficiency has been expanded.
On the basis of the work done so far, the committee offers five recommendations.
RECOMMENDATION 5.1 NAGB should accelerate its plans, research, and schedule for inclusion and accommodation of students with disabilities and limited English proficiency in order to increase the participation of both those student populations in numbers representative of their numbers in the student population.
RECOMMENDATION 5.2 NAGB should consider expanding the accommodation research planned in conjunction with the pilot test to include a systematic analysis of the use and effect of dual-language booklets. Additional accommodations for English-language learners, in the forms of both a Spanish-only translation of the mathematics test and the use of English-Spanish and English-other languages dictionaries for the mathematics test, should also be considered for the pilot test.
RECOMMENDATION 5.3 NAGB should clarify the reading constructs (e.g., reading proficiency, reading proficiency in English, etc.) being measured by the 4th-grade reading test prior to the field test and then address what accommodations would not invalidate assessment of these constructs. In particular, NAGB should clarify when reading competency could be assessed in a student's primary or native language if it is not English.
RECOMMENDATION 5.4 The National Assessment Governing Board should assess the effects of various accommodations for limited-English-proficient students and students with disabilities at both the item and total test score levels. To do so will require oversampling in the pilot and field tests.
RECOMMENDATION 5.5 The National Assessment Governing Board should provide a clear, concise, and detailed list of accommodations for the VNT for students with disabilities or limited English proficiency for use on the VNT field test.
REPORTING
One of the primary recommendations of the NRC year 1 report was that decisions about how scores will be computed and reported should be made before the design of the VNT test forms can be fully evaluated. NAGB and AIR are developing and evaluating options for test use and have conducted focus groups that have included consideration of reporting options, but no further decisions about score computation and reporting have been made. The committee believes that a number of decisions and steps related to VNT reporting need to be made soon.
RECOMMENDATION 6.1 Given that test items and answer sheets will be provided to students, parents, and teachers, as well as made available to the general public, test forms should be designed to support scoring using a straightforward, total correct, raw score approach.
RECOMMENDATION 6.2 Special attention should be given to the work required for receiving partial credit for constructed-response items that have full scores of more than 1 point.
RECOMMENDATION 6.3 Achievement-level reporting should be supplemented with reporting using a standardized numeric scale. Confidence bands on this scale should be used to communicate measurement error.
RECOMMENDATION 6.4 Individual student performance on the VNT should not be reported at the subscore level.
RECOMMENDATION 6.5 NAGB and its contractor should undertake research on alternative ways for providing item-level feedback to students, parents, and teachers. The options explored should include provision of information on item content and targeted achievement level, as well as normative information, such as passing rates.
RECOMMENDATION 6.6 NAGB and its contractor should consider including students, particularly at the 8th-grade level, as well as parents and teachers in future focus groups on score reporting.
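Recommendations 6.1 through 6.3 describe a total-correct raw score, partial credit on constructed-response items, and confidence bands that communicate measurement error. A minimal sketch of those ideas follows. It assumes the classical test theory formula for the standard error of measurement (scale standard deviation times the square root of one minus reliability); the report does not specify how the bands would be computed, and the scale values, reliability, and function names here are hypothetical.

```python
import math

def raw_score(points_earned):
    """Total-correct raw score: the plain sum of points earned per item.

    Multiple-choice items contribute 0 or 1; constructed-response items
    may contribute partial credit (e.g., 0 to 3 points).
    """
    return sum(points_earned)

def standard_error(scale_sd, reliability):
    """Classical standard error of measurement (assumed model):
    SEM = SD * sqrt(1 - reliability)."""
    return scale_sd * math.sqrt(1.0 - reliability)

def confidence_band(scale_score, sem, z=1.96):
    """Symmetric band around a reported scale score; z = 1.96 gives an
    approximate 95 percent band under a normal-error assumption."""
    half_width = z * sem
    return (scale_score - half_width, scale_score + half_width)

# Hypothetical student: four multiple-choice items and one 3-point
# constructed-response item on which 2 points were earned.
print("raw score:", raw_score([1, 0, 1, 1, 2]))

# Hypothetical reporting scale: SD of 50 and reliability of 0.91.
sem = standard_error(scale_sd=50.0, reliability=0.91)
low, high = confidence_band(scale_score=250.0, sem=sem)
print(f"reported score 250, band {low:.1f} to {high:.1f}")
```

A simple sum keeps the scoring transparent to students, parents, and teachers who receive the items and answer sheets, which is the rationale the committee gives for the raw score approach.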
RECOMMENDATION 6.7 NAGB should support aggregation of test results for participating districts and states, while discouraging inappropriate, high-stakes uses of aggregated results. NAGB should develop explicit and detailed guidelines and practices for the appropriate compilation and use of aggregate data from administration of the VNT and should explain limitations on the validity of comparisons of aggregate results on the VNT to results from NAEP.
SUMMARY CONCLUSIONS AND RECOMMENDATION
Congress must answer the overarching policy question about the VNT: whether development should continue or be terminated. The committee does not take a position either for or against continued development. Our primary charge was to evaluate the technical quality of the VNT test items and forms, and our recommendations above offer a number of specific ways to improve the VNT items and the development process more generally. Lest these recommendations be misinterpreted as a condemnation of the quality of VNT development, however, we stress that there is no evidence that the current process should be halted on technical grounds and we offer the following summary conclusion:
CONCLUSION VNT development is generally on course. A large number of items have been written, and the quality of the items that came through the contractor's development and review process is comparable to the quality of items from NAEP. Plans for pilot testing these items are generally sound.
Rather than either approving or terminating the VNT, Congress could elect to postpone a final decision until a more considered debate of value and costs is completed. If an overall decision is deferred, Congress must also decide whether to allow or continue to prohibit the collection of pilot test data. In this context, we offer our second general conclusion:
CONCLUSION The planned pilot test of VNT test items presents opportunities for research on a number of important test development topics that will be useful to NAEP and state and local assessment programs even if the VNT is eventually terminated. These research opportunities include: (1) assessing the quality and effectiveness of the VNT's item development, review, and revisions processes; (2) collecting empirical data on the impact of different threats to "linkability"; and (3) assessing the feasibility, effects, and validity of alternative testing accommodations for students with disabilities or limited English proficiency. In addition, the items themselves are likely to be useful for other testing programs.
This second conclusion is not meant to imply that the pilot test is a good idea if a decision has already been reached to terminate the program. Rather, we suggest that, if a decision to terminate the program is deferred pending further consideration of its potential value, there could be value to going ahead with the pilot test in parallel with reaching a final conclusion about the VNT itself. As stated by NAGB in its report on the VNT's purpose and use, scores from the pilot test would not be reported back to students, parents, or teachers, so the risk to students is minimal. The key question would be whether the potential benefits justify the costs, including the time of teachers and students.
The intended value of the VNT is to improve the achievement of American students at key points in their educational careers. However, the consequential chain from reporting the achievement of individual students relative to challenging national standards through parent involvement, student behavior, and teacher practices to improved achievement has not been carefully delineated. And questions of who will pay ongoing development and administration costs have not been answered.
We began this report with a discussion of the proposed purpose and use of the VNT. NAGB has articulated a viable purpose for the VNT, but Congress alone can assess the "value" that would result if the VNT is implemented as described in NAGB's purpose and use document. The committee does not recommend either continuation or termination, but we do offer a recommendation to Congress in making the decision:
RECOMMENDATION TO CONGRESS The decision to continue or terminate the VNT should be based on a carefully articulated statement of the expected value and costs of the program, including a detailed examination of underlying assumptions and a delineation of possible unintended outcomes. To the maximum extent possible, research on results from other educational reform efforts should be considered to support or contradict assumptions in this value-and-cost statement. Information on the likelihood of use by states, districts, and individuals should also be considered in making a decision about the VNT.