Evaluation of the Voluntary National Tests, Year 2: Final Report

1
Introduction and History

In his 1997 State of the Union address, President Clinton announced a federal initiative to develop tests of 4th-grade reading and 8th-grade mathematics that could be administered on a voluntary basis by states and school districts beginning in spring 1999. The call for Voluntary National Tests (VNT) echoed a similar proposal for "America's Test," which the Bush administration offered in 1990. The principal purpose of the VNT, as articulated by the Secretary of the U.S. Department of Education (see, e.g., Riley, 1997), is to provide parents and teachers with systematic and reliable information about the verbal and quantitative skills that students have achieved at two key points in their school lives. The Department of Education anticipates that this information will serve as a catalyst for continued school improvement, by focusing parental and community attention on achievement and by providing an additional tool to hold school systems accountable for their students' performance in relation to nationwide standards.

The proposed VNT has evolved in many ways since January 1997, but the major features were clear in the initial plan. Achievement tests in English reading at the 4th-grade level and in mathematics at the 8th-grade level would be offered to states, school districts, and localities for administration in the spring of each school year. Several other features of the tests were specified:

  • The tests would be voluntary: the federal government would prepare but not require them, nor would data on any individual, school, or group be reported to the federal government.

  • The tests, each administered in two 45-minute sessions in a single day, would not be long or detailed enough to provide diagnostic information about individual learning problems. Rather, they would provide reliable information so that all students, and their parents and teachers, would know where they are in relation to high national standards. In mathematics, results would be linked to scores from the Third International Mathematics and Science Study (TIMSS) to provide comparisons with student performance in other countries.

  • The tests would be designed to facilitate linkage with the National Assessment of Educational Progress (NAEP) and the reporting of individual test performance in terms of the NAEP achievement levels: basic, proficient, and advanced.

  • In order to provide maximum preparation and feedback to students, parents, and teachers, sample tests would be circulated in advance, copies of the original tests would be returned with the students' original and correct answers, and all test items would be published on the Internet just after the administration of each test.

Initial plans for the VNT were laid out by the Department of Education and, in late summer 1997, a contract for test development was awarded to a consortium led by the American Institutes for Research (AIR). The original schedule called for development of test specifications for 4th-grade reading and 8th-grade mathematics tests by fall 1997; pilot testing of test items later that year; field testing of test forms early in 1998; and the first test administration in spring 1999. The department also awarded a contract to the National Research Council (NRC) to conduct an evaluation of VNT test development activities.

Subsequent negotiations between the administration and Congress, which culminated in passage of the fiscal 1998 appropriations bill (P.L. 105-78), led to a suspension of test item development (a stop-work order) late in September 1997 and transferred to the National Assessment Governing Board (NAGB, the governing body for NAEP) exclusive authority to oversee the policies, direction, and guidelines for developing the VNT. The law gave NAGB 90 days in which to review the development plan and revise or renegotiate the test development contract.
Congress further instructed NAGB to make four determinations about the VNT:

  • the extent to which test items selected for use on the tests are free from racial, cultural, or gender bias;

  • whether the test development process and test items adequately assess student reading and mathematics comprehension in the form most likely to yield accurate information regarding student achievement in reading and mathematics;

  • whether the test development process and test items take into account the needs of disadvantaged, limited English proficient, and disabled students; and

  • whether the test development process takes into account how parents, guardians, and students will be appropriately informed about testing content, purpose, and uses.

NAGB negotiated a revised schedule and work plan with AIR. It called for test development over a 3-year period, with pilot testing in March 1999, field testing in March 2000, and operational test administration in March 2001. In addition, the work plan specified a major decision point in fall 1998, which depended on congressional action, and it permitted limited test development activities to proceed through the remainder of the fiscal year, to September 30, 1998.

When the Congress assigned NAGB responsibility for the VNT, it also called on the NRC to evaluate the technical adequacy of test materials. Specifically, it asked the NRC to evaluate:

  (1) the technical quality of any test items for 4th-grade reading and 8th-grade mathematics;

  (2) the validity, reliability, and adequacy of developed test items;

  (3) the validity of any developed design which links test results to student performance levels;

  (4) the degree to which any developed test items provide valid and useful information to the public;

  (5) whether the test items are free from racial, cultural, or gender bias;

  (6) whether the test items address the needs of disadvantaged, limited English proficient, and disabled students; and

  (7) whether the test items can be used for tracking, graduation, or promotion of students.

The congressional charges to NAGB and to the NRC were constrained by P.L. 105-78 requirements that "no funds . . . may be used to field test, pilot test, administer or distribute in any way, any national tests" and that the NRC report be delivered by September 1, 1998. The plan for pilot testing in March 1999 required that a large pool of potential VNT items be developed, reviewed, and approved by late fall of 1998, in order to provide time for the construction, publication, and distribution of multiple draft test forms for the pilot test. Given the March 1998 start-up date, NAGB, its prime contractor (AIR), and the subcontractors for reading and mathematics test development (Riverside Publishing and Harcourt Brace Educational Measurement, respectively) faced a daunting and compressed schedule for test design and development.

A year after Congress placed restrictions on VNT development, it again considered issues relating to national testing. The Omnibus Consolidated Appropriations Act for fiscal 1999 (which emerged from negotiations between the White House and Congress in fall 1998) contained two related VNT components. The first set of provisions created a new section 447 of the General Education Provisions Act (GEPA), which added an additional restriction regarding the VNT: No funds can be used for pilot testing or field testing of any "federally sponsored national test . . . that is not specifically and explicitly provided for in authorizing legislation enacted into law." (There is currently no explicit authority for individualized national tests.)

The second set of provisions included the following requirements:

  • NAGB shall continue to have exclusive authority over the direction and all policies and guidelines for developing voluntary national tests.

  • NAGB will report on three important VNT issues: the purpose and intended use of the proposed tests; a definition of the term "voluntary" as it pertains to the administration of the tests; and a description of the achievement levels and reporting methods to be used in reporting the test results.

  • NAGB will report on its response to the National Research Council report (1999a) that evaluated NAEP, which repeated the criticism in some earlier evaluations that the process for setting achievement levels was "fundamentally flawed."

  • The National Academy of Sciences (through the NRC) shall conduct a study regarding the technical feasibility, validity, and reliability of including test items from NAEP for 4th-grade reading and 8th-grade mathematics or from other tests in state and district assessments for the purpose of providing a common measure of individual student performance.

NAGB developed a work plan for 1999 that includes several important test development activities (based on Guerra, 1998):

  • Detailed specifications describing the content and format of the reading and mathematics tests will be published. A specifications summary will be prepared and distributed. Both specifications versions will include sample items and will be available on the Internet.

  • Efforts will continue to improve the pool of items that have already been written for the reading and math exams. Some additional questions may be written to be sure the proposed tests match NAEP in the range and distribution of item difficulty. Items will be reviewed to make sure they provide the strongest link possible with the NAEP achievement levels.

  • NAGB will conduct an extensive series of focus groups and public hearings for its report on the purpose and use of the VNT and in defining the term "voluntary." Concurrently, NAGB will also deal with questions on how detailed any rules it makes should be and what issues should be left to state and local decision making.

  • NAGB will continue its work on the issues of inclusion and accommodations for students with disabilities and limited English proficiency.

All of these reports are due "not later than September 30, 1999." However, NAGB's executive committee recommended that the reports be submitted by June 30, 1999, to provide time for the reports to be considered in the deliberations on the future of the VNT during the upcoming session of Congress. Figure 1-1 shows the timeline for key VNT development and test dates.

YEAR 1 EVALUATION

To carry out the original congressional mandate for an evaluation of VNT development efforts, the NRC appointed co-principal investigators to be assisted by several NRC staff members. After reviewing item development plans and examining item status and quality, the NRC issued an interim letter report (National Research Council, 1998a). The report expressed concern that the item development and review process was overly compressed, and it offered suggestions for rearranging the review schedule that were subsequently adopted by NAGB.
The interim report also suggested the need to match VNT items to the descriptions of the NAEP achievement levels that would be used in reporting results.

The complete activities and results of the VNT year 1 evaluation were described in a final report issued on September 30, 1998 (National Research Council, 1999b). As described in that report, the primary focus of the year 1 evaluation was on the technical adequacy and quality of the development, administration, scoring, reporting, and use of the VNT that would aid test developers and policy makers at the federal, state, and local levels. The report covered specifications for the 4th-grade reading and 8th-grade mathematics tests; the development and review of items for the tests; and plans for subsequent test development activities. The last topic included plans for the pilot and field tests, for inclusion and accommodation of students with disabilities and English-language learners, and for scoring and reporting the tests. The rest of this section summarizes the findings and conclusions of that report.

FIGURE 1-1 VNT development timeline.

Test Specifications

The NRC found that the VNT test specifications were appropriately based on NAEP frameworks and specifications, but incomplete. The close correspondence with NAEP built on NAEP efforts to achieve a consensus on important reading and mathematics knowledge and skills and to maximize the prospects for linking VNT scores to NAEP achievement levels. However, the test specifications lacked information on test difficulty and accuracy targets and were not yet sufficiently tied to the achievement-level descriptions that will be used in reporting. Some potential users also questioned the decision to test only in English. The report recommended that test difficulty and accuracy targets and additional information on the NAEP achievement-level descriptions be added to the test specifications and that NAGB work to build a greater consensus for the test specifications to maximize participation by school districts and states.

Test Items

Because of significant time pressures, several item review and revision steps in 1998 were conducted simultaneously; as a result, opportunities were missed to incorporate feedback from them. Yet in terms of professional and scientific standards of test construction, the NRC concluded that the development of VNT items to date was satisfactory. NAGB and its consortium of contractors and subcontractors had made good progress toward the goal of developing a VNT item pool of adequate size and of known, high quality. Although it could not be determined whether that goal would be met, the procedures and plans for item development and evaluation were sound. At the same time, NAGB was urged to allow more time for future test development cycles so that the different review activities could be performed sequentially rather than simultaneously. The report also recommended that NAGB and its contractor develop a more automated item-tracking system in order to receive timely information on survival rates and the need for additional items. It said that item development should be tracked by content and format categories and by link to achievement-level descriptions so that shortages of any particular type of item could be quickly identified.

Pilot and Field Test Plans

The report concluded that the pilot and field test plans appeared generally sound with respect to the number of items and forms to be included and the composition and size of the school and student samples. It also concluded that more detail on plans for data analysis was needed and that some aspects of the design, such as the use of hybrid forms, appeared unnecessarily complex. It recommended that NAGB and its contractor develop more specific plans for the analysis and use of both the pilot and field test data, to include decision rules for item screening and accuracy targets for item parameter estimates, test equating, and linking. The report also recommended that greater justification be supplied for some aspects of the design plans, such as the use of hybrid forms, or that specific complexities be eliminated. NAGB was urged to prepare back-up plans in case item survival rates following the pilot test are significantly lower than anticipated.

Inclusion and Accommodation

The NRC found that plans for including and accommodating students with disabilities and English-language learners were sketchy and did not break new ground with respect to maximizing the degree of inclusion and the validity of scores for all students. Accommodation issues were not considered as an integral part of item development, and there were no clear plans for assessing the validity of accommodated scores. NAGB was urged to accelerate its plans and schedule for inclusion and accommodation of students with disabilities and limited English proficiency to increase both the participation of those student populations and the comparability of VNT performance among student populations.

Reporting Plans

There were a number of potential issues in the reporting of test results to parents, students, and teachers that the NRC recommended be resolved as soon as possible, including: the adequacy of VNT items for reporting in relation to the NAEP achievement-level descriptions; mechanisms for communicating uncertainty in the results; and ways to accurately aggregate scores. The report also questioned whether and how additional information might be provided to parents, students, and teachers for students found to be in the "below basic" category.

The report recommended that NAGB accelerate its specification of procedures for reporting because reporting goals should drive most other aspects of test development. It said specific consideration should be given to whether and how specific test items will be linked and used to illustrate the achievement-level descriptions. It further recommended that attention be given to how measurement error and other sources of variation will be communicated to users, how scores will be aggregated, and whether information beyond achievement-level categories can be provided, particularly for students below the basic level of achievement.

OVERVIEW OF PLANNED YEAR 2 EVALUATION

In fall 1998 the Department of Education asked the NRC to continue its evaluation efforts of VNT development during fiscal 1999, with an expanded scope of work given the nature and timing of the test development process. In addition to a continued principal focus on the quality of the items being developed for the VNT, the year 2 committee considered the tests' purpose and questions of how the VNT would be used. Different test uses involve assumptions about inputs and consequences and suggest different technical constraints. The committee sought to identify important technical implications of proposed test use plans and to suggest how test use assumptions might be evaluated. Pertinent to this assessment is NAGB's charge to define the meaning of "voluntary," which affected the committee's analysis and recommendations about test use. We note that the "high stakes" issue in terms of VNT use for decisions about tracking, promotion, or graduation has already been considered in another congressionally mandated study of the VNT, which concluded that the VNT should not be used for such decisions (National Research Council, 1999d). However, we also considered this issue as part of the year 2 evaluation.

Scope of Work

The scope for year 2 of the VNT evaluation included analysis and comment on four issues, with several parts to three of them:

  • Item Quality: Are the items developed for the VNT pilot test valid and informative measures for the test content? Are they free from obvious defects and not biased against ethnic or gender groups? Are the item development and review procedures used by NAGB and its contractor as complete and efficient as possible? Did the cognitive laboratory tryouts significantly improve item quality?

  • Technical Issues in Test Development: Would the design for pilot testing result in items that represent the content and achievement-level specifications, are free of bias, and support test form assembly? Are the plans for assembling forms likely to yield highly parallel forms with adequate accuracy in classifying student results by achievement level? Are the revised designs for field testing and equating new forms and linking scores to NAEP achievement levels technically sound?

  • Inclusion and Accommodation: Are the plans for including, accommodating, and meaningfully reporting results for students with special language and learning needs adequate and appropriate?

  • VNT Purposes and Practices: Are NAGB's proposed rationales for test use clear and compelling? Are the assumptions behind the proposals sound? Are there potential unintended consequences of the proposed uses that should be considered? Do the plans for reporting VNT results make them accessible and meaningful for intended audiences? Are the plans for administration and governance feasible and free from unwarranted assumptions?

The committee examined these issues by reviewing and analyzing VNT procedures and products; soliciting expert review and analysis; reviewing the cognitive laboratory materials; collecting and analyzing field data; holding discussions with the relevant constituencies; and holding several information-gathering workshops.

The committee's first workshop, in February 1999, focused on test development and test designs. Committee members and other experts met with NAGB and AIR personnel to review and discuss:

  • the extent to which information from content and bias reviews was used to cull and refine items and rubrics;

  • findings from the analysis of results from the cognitive laboratories;

  • implications of the VNT content reviews and achievement level reviews;

  • current plans for pilot testing, field testing, and forms assembly;

  • current plans for test linking and equating;

  • current plans for inclusion and accommodation; and

  • NAGB's December 23, 1998, reports to Congress on methods to be used to explore the purpose and definition of the VNT and plans for responding to the NRC's evaluation of NAEP achievement levels.

The second workshop, in April 1999, explored item development issues. Committee members met with other reading and mathematics content experts to review and discuss the extent to which item quality has improved through the original and more recent review and revision activities. The goals of this workshop were to evaluate the degree to which the items measure what the test developers state they are designed to measure, to assess specific problems that have appeared, and to match items to achievement levels to ascertain degrees of convergence. The workshop was held in closed session because of the need to review secure test items. Results of this review constitute a major component of this report and are presented in Section 3.

The third workshop, held in July 1999, reviewed and discussed:

  • NAGB's June report to Congress on the purposes and intended uses of the VNT and the proposed definition of the term "voluntary";

  • the likelihood that intended decisions could be supported by the planned reporting metrics;

  • NAGB's June report to Congress on the achievement levels and other reporting plans for the VNT; and

  • plans for accommodating and reporting results for students who are English-language learners or who have special learning needs.

Interim Report

The committee issued an interim report earlier this year (National Research Council, 1999c) in light of NAGB's June 30, 1999, report to Congress and the White House so that the two reports, collectively, might contribute to the executive and legislative planning and decision making about the current status and potential future of the VNT. This report repeats virtually all of the material in the committee's interim report, with the findings and recommendations updated, modified, deleted, or expanded on the basis of new evidence and information about the continuing development of the VNT.

Report Purpose and Organization

The purpose of this final report is to provide evaluative information about the VNT, in the form of committee conclusions and recommendations, that will inform congressional and White House debate and discussions about the current status, issues, and future developments regarding the Voluntary National Tests. The remaining sections of this report are organized around the primary issue areas in the committee's charge. Section 2 provides background on the purpose and use of the VNT, and it considers item (7) in our charge, the use of test items for tracking, promotion, or graduation; it also considers broad issues of VNT purposes and practices. Section 3 discusses the quality of items that are now ready for pilot testing and the likelihood that a sufficient number of additional items will be developed to an appropriate level of quality in time for inclusion in a spring 2000 pilot test. It addresses items (1), (2), (4), and (5) in our charge. Section 4 discusses pilot and field test design issues, item (3) in our charge. Section 5 considers issues on the inclusion and accommodation in the VNT of students with special language and learning needs, item (6) in our charge. Section 6 discusses the topic of VNT reporting. Finally, Section 7 presents the committee's overall conclusions and lists all our recommendations.