Evaluation of the Voluntary National Tests, Year 2: INTERIM REPORT
1 Introduction and History
In his 1997 State of the Union address, President Clinton announced a federal initiative to develop tests of 4th-grade reading and 8th-grade mathematics that could be administered on a voluntary basis by states and school districts beginning in spring 1999. The call for Voluntary National Tests (VNT) echoed a similar proposal for “America's Test,” which the Bush administration offered in 1990. The principal purpose of the VNT, as articulated by the Secretary of the U.S. Department of Education (see, e.g., Riley, 1997), is to provide parents and teachers with systematic and reliable information about the verbal and quantitative skills that students have achieved at two key points in their school lives. The Department of Education anticipates that this information would serve as a catalyst for continued school improvement, by focusing parental and community-wide attention on achievement and by providing an additional tool to hold school systems accountable for their students' performance in relation to nationwide standards.
The proposed VNT has evolved in many ways since January 1997, but the major features were clear in the initial plan. Achievement tests in English reading at the 4th-grade level and in mathematics at the 8th-grade level would be offered to states, school districts, and localities for administration in the spring of each school year. Several other features of the tests were specified:
The tests would be voluntary: the federal government would prepare but not require them, nor would data on any individual, school, or group be reported to the federal government.
The tests, each administered in two 45-minute sessions in a single day, would not be long or detailed enough to provide diagnostic information about individual learning problems. Rather, they would provide reliable information so all students, and their parents and teachers, would know where they are in relation to high national standards and, in mathematics, in comparison with levels of achievement in other countries.
The tests would be designed to facilitate linkage with the National Assessment of Educational Progress (NAEP) and the reporting of individual test performance in terms of the NAEP achievement levels: basic, proficient, and advanced.
In order to provide maximum preparation and feedback to students, parents, and teachers, sample tests would be circulated in advance, copies of the original tests would be returned with the students' original and correct answers, and all test items would be published on the Internet just after the administration of each test.
Initial plans for the VNT were laid out by the Department of Education and, in late summer 1997, a contract for test development was awarded to a consortium led by the American Institutes for Research (AIR). The original schedule called for development of test specifications for 4th-grade reading and 8th-grade mathematics tests by fall 1997; pilot testing of test items later that year; field testing of test forms early in 1998; and the first test administration in spring 1999. The department also awarded a contract to the National Research Council (NRC) to conduct an evaluation of VNT test development activities.
Subsequent negotiations between the administration and Congress, which culminated in passage of the fiscal 1998 appropriations bill (P.L. 105-78), led to a suspension of test item development (a stop-work order) late in September 1997 and transferred to the National Assessment Governing Board (NAGB, the governing body for NAEP) exclusive authority to oversee the policies, direction, and guidelines for developing the VNT. The law gave NAGB 90 days in which to review the development plan and revise or renegotiate the test development contract.
Congress further instructed NAGB to make four determinations about the VNT:
the extent to which test items selected for use on the tests are free from racial, cultural, or gender bias;
whether the test development process and test items adequately assess student reading and mathematics comprehension in the form most likely to yield accurate information regarding student achievement in reading and mathematics;
whether the test development process and test items take into account the needs of disadvantaged, limited-English-proficient, and disabled students; and
whether the test development process takes into account how parents, guardians, and students will be appropriately informed about testing content, purpose, and uses.
NAGB negotiated a revised schedule and work plan with AIR. It called for test development over a 3-year period, with pilot testing in March 1999, field testing in March 2000, and operational test administration in March 2001. In addition, the work plan specified a major decision point in fall 1998, which depended on congressional action, and it permitted limited test development activities to proceed through the remainder of the fiscal year, to September 30, 1998.
When the Congress assigned NAGB responsibility for the VNT, it also called on the NRC to evaluate the technical adequacy of test materials. Specifically, it asked the NRC to evaluate:
the technical quality of any test items for 4th-grade reading and 8th-grade mathematics;
the validity, reliability, and adequacy of developed test items;
the validity of any developed design which links test results to student performance levels;
the degree to which any developed test items provide valid and useful information to the public;
whether the test items are free from racial, cultural, or gender bias;
whether the test items address the needs of disadvantaged, limited-English-proficient, and disabled students; and
whether the test items can be used for tracking, graduation, or promotion of students.
The congressional charges to NAGB and to the NRC were constrained by P.L. 105-78 requirements that “no funds . . . may be used to field test, pilot test, administer or distribute in any way, any national tests” and that the NRC report be delivered by September 1, 1998. The plan for pilot testing in March 1999 required that a large pool of potential VNT items be developed, reviewed, and approved by late fall of 1998, in order to provide time for the construction, publication, and distribution of multiple draft test forms for the pilot test. Given the March 1998 startup date, NAGB, its prime contractor (AIR), and the subcontractors for reading and mathematics test development (Riverside Publishing and Harcourt-Brace Educational Measurement, respectively) faced a daunting and compressed schedule for test design and development.
A year after Congress placed restrictions on VNT development, it again considered issues relating to national testing. The Omnibus Consolidated Appropriations Act for fiscal 1999 (which emerged from negotiations between the White House and the congressional leadership in fall 1998) created a new section 447 of the General Education Provisions Act (GEPA), which added several additional restrictions regarding the VNT:
No funds can be used for pilot testing or field testing of any “federally sponsored national test . . . that is not specifically and explicitly provided for in authorizing legislation enacted into law.” (There is no explicit authority for individualized national tests.)
NAGB is required to report on three important VNT issues: the purpose and intended use of the proposed tests; a definition of the term “voluntary” as it pertains to the administration of the tests; and a description of the achievement levels and reporting methods to be used in reporting the test results.
The Governing Board is required to report on its response to the National Research Council report (1999b) that evaluated NAEP, which repeated the criticism in some earlier evaluations that the process for setting achievement levels was “fundamentally flawed.”
NAGB developed a work plan for 1999 that includes several important test development activities (based on Guerra, 1998):
Detailed specifications describing the content and format of the reading and mathematics tests will be published. A specifications summary will be prepared and distributed. Both specifications versions will include sample items and will be placed on the Internet.
Efforts will continue to improve the pool of items that have already been written for the reading and math exams. Some additional questions may be written to be sure the proposed tests match NAEP in the range and distribution of item difficulty. Items will be reviewed to make sure they provide the strongest link possible with the NAEP achievement levels.
NAGB will conduct an extensive series of focus groups and public hearings in development of its report on VNT purpose and use and in defining the term “voluntary.” Concurrently, NAGB will also deal with questions on how detailed any rules it makes should be and what issues should be left to state and local decision making.
NAGB will continue its work on the issues of inclusion and accommodations for students with disabilities and limited-English proficiency.
All of these reports are due “not later than September 30, 1999.” However, NAGB's executive committee recommended that the reports be submitted by June 30, 1999, to provide time for the reports to be considered in the deliberations on the future of the VNT during the upcoming session of Congress.
VNT YEAR 1 EVALUATION
To carry out the original congressional mandate for an evaluation of VNT development efforts, the NRC appointed co-principal investigators to be assisted by several NRC staff members. After reviewing item development plans and examining item status and quality, the NRC issued an interim letter report (National Research Council, 1998). The report expressed concern that the item development and review process was overly compressed, and it offered suggestions for rearranging the review schedule that were subsequently adopted by NAGB. The interim report also suggested the need to match VNT items to the descriptions of the NAEP achievement levels that would be used in reporting results.
The complete activities and results of the VNT year 1 evaluation were described in a final report issued on September 30, 1998 (National Research Council, 1999a). As described in that report, the primary focus of the year 1 evaluation was on the technical adequacy and quality of the development, administration, scoring, reporting, and use of the VNT that would aid test developers and policy makers at the federal, state, and local levels. The report covered specifications for the 4th-grade reading and 8th-grade mathematics tests; the development and review of items for the tests; and plans for subsequent test development activities. The last topic included plans for the pilot and field tests, for inclusion and accommodation of students with disabilities and for English-language learners, and for scoring and reporting the tests. The rest of this section summarizes the findings and conclusions of that report.
Test Specifications
The NRC found that the VNT test specifications were appropriately based on NAEP frameworks and specifications, but incomplete. The close correspondence with NAEP built on NAEP efforts to achieve a consensus on important reading and mathematics knowledge and skills and would maximize the prospects for linking VNT scores to NAEP achievement levels. However, the test specifications lacked information on test difficulty and accuracy targets and were not yet sufficiently tied to the achievement-level descriptions that will be used in reporting. Some potential users also questioned the decision to test only in English. The report recommended that test difficulty and accuracy targets and additional information on the NAEP achievement-level descriptions be added to the test specifications and that NAGB work to build a greater consensus for the test specifications to maximize participation by all school districts and states.
Test Items
Because of significant time pressures, several item review and revision steps in 1998 were conducted simultaneously, and opportunities were missed to incorporate feedback from them. Yet in terms of professional and scientific standards of test construction, the NRC concluded that the development of VNT items to date was satisfactory. NAGB and its consortium of contractors and subcontractors had made good progress toward the goal of developing a VNT item pool of adequate size and of known, high quality. Although it could not be determined whether that goal would be met, the procedures and plans for item development and evaluation were sound. At the same time, NAGB was urged to allow more time for future test development cycles so that the different review activities could be performed sequentially rather than simultaneously. The report also recommended that NAGB and its contractor develop a more automated item-tracking system so as to receive timely information on survival rates and the need for additional items. Item development should be tracked by content and format categories and by link to achievement-level descriptions so that shortages of any particular type of item can be quickly identified.
Pilot and Field Test Plans
The pilot and field test plans appeared generally sound with respect to the number of items and forms to be included and the composition and size of the school and student samples. The report concluded that more detail on plans for data analysis was needed and some aspects of the design, such as the use of hybrid forms, appeared unnecessarily complex. It recommended that NAGB and its contractor develop more specific plans for the analysis and use of both the pilot and field test data. The plans should include decision rules for item screening and accuracy targets for item parameter estimates, test equating, and linking. It also recommended that greater justification be supplied for some aspects of the design plans, such as the use of hybrid forms, or that specific complexities be eliminated. NAGB was urged to prepare back-up plans in case item survival rates following the pilot test are significantly lower than anticipated.
Inclusion and Accommodation
The NRC found that plans for including and accommodating students with disabilities and English-language learners were sketchy and did not break new ground with respect to maximizing the degree of inclusion and the validity of scores for all students. Accommodation issues were not considered as an integral part of item development, and there were no clear plans for assessing the validity of accommodated scores. NAGB was urged to accelerate its plans and schedule for inclusion and accommodation of students with disabilities and limited English proficiency, in order to increase both the participation of those student populations and the comparability of VNT performance among student populations.
Reporting Plans
There were a number of potential issues in the reporting of test results to parents, students, and teachers that the NRC recommended be resolved as soon as possible, including: the adequacy of VNT items for reporting in relation to the NAEP achievement-level descriptions; mechanisms for communicating uncertainty in the results; and ways to accurately aggregate scores across students. The report also questioned whether and how additional information might be provided to parents, students, and teachers for students found to be in the “below basic” category. It recommended that NAGB accelerate its specification of procedures for reporting because reporting goals should drive most other aspects of test development. The report said specific consideration should be given to whether and how specific test items will be linked and used to illustrate the achievement-level descriptions. It further recommended that attention be given to how measurement error and other sources of variation will be communicated to users, how scores will be aggregated, and whether information beyond achievement-level categories can be provided, particularly for students below the basic level of achievement.
OVERVIEW OF PLANNED YEAR 2 EVALUATION
In fall 1998 the Department of Education asked the NRC to continue its evaluation efforts of VNT development during fiscal 1999. The NRC's approach to the second year of its VNT evaluation differs from the first year in several ways. First, the work is being conducted by means of a traditional National Research Council committee rather than by co-principal investigators. The committee of ten experts in reading, mathematics, assessment, educational policy, and test use has allowed the NRC to bring a wider range of expertise to the planning and conduct of the evaluation and reduced reliance on outside experts.
The use of a study committee is related to the second change in the VNT evaluation, which is an expanded scope. This scope includes a continued principal focus on the quality of the items being developed for the VNT, given the nature and timing of the test development process. In addition to this and the other issues considered in the year 1 report, the year 2 committee is considering NAGB's response to questions of how the VNT would be used. Alternative test uses involve a number of assumptions about inputs and consequences and suggest different technical constraints. The committee is seeking to identify important technical implications of proposed test use plans and to suggest how test use assumptions might be evaluated. Pertinent to this is NAGB's charge to define the meaning of “voluntary,” which affects the committee's commentary and recommendations about test use. The “high stakes” issue in terms of VNT use for decisions about tracking, promotion, or graduation has already been considered in another congressionally mandated study of the VNT, which concluded that the VNT should not be used for such decisions (National Research Council, 1999c).
Scope of Work
The scope for year 2 of the VNT evaluation includes analysis and comment on four issues, with several parts to three of them:
Item Quality: Are the items developed for the VNT pilot test valid and informative measures to target test content? Are they free from obvious defects and not biased against ethnic or gender groups? Are the item development and review procedures used by NAGB and its contractor as complete and efficient as possible? Do the cognitive laboratory tryouts significantly improve item quality?
Technical Issues in Test Development: Will the design for pilot testing result in items that represent the content and achievement-level specifications, are free of bias, and support test form assembly? Are the plans for assembling forms likely to yield highly parallel forms with adequate accuracy in classifying student results by achievement level? Are revised designs for field testing and equating new forms and linking scores to NAEP achievement levels technically sound?
Inclusion and Accommodation: Are plans for including, accommodating, and meaningfully reporting results for students with special language and learning needs adequate and appropriate?
VNT Purposes and Practices: Are the rationales for NAGB proposals for test use clear and compelling? Are the assumptions behind the proposals sound? Are there potential unintended consequences of the proposed uses that should be considered? Do the plans for reporting VNT results make them accessible and meaningful for intended audiences? Are plans for administration and governance feasible and free from unwarranted assumptions?
The committee is examining these issues by reviewing and analyzing VNT procedures and products; soliciting expert review and analysis; reviewing the cognitive laboratory materials; collecting and analyzing field data; holding discussions with the relevant constituencies; and holding several information-gathering workshops.
The committee's first workshop, in February 1999, focused on test development and test designs. Committee members and other experts met with NAGB and AIR personnel to review and discuss:
the extent to which information from content and bias reviews was used to cull and refine items and rubrics;
findings from the analysis of results from the cognitive laboratories;
implications of the VNT content reviews and achievement level reviews;
current plans for pilot testing, field testing, and forms assembly;
current plans for test linking and equating;
current plans for inclusion and accommodation; and
NAGB's December 23, 1998, reports to Congress on methods to be used to explore the purpose and definition of the VNT and plans for responding to the NRC's evaluation of NAEP achievement levels.
The second workshop, in April 1999, explored item development issues.
Committee members met with other reading and mathematics content experts to review and discuss the extent to which item quality has improved through the original and more recent review and revision activities. The goals of this workshop were to evaluate the degrees to which the items measure what the test developers state they are designed to measure, to assess specific problems that have appeared, and to match items to achievement levels to ascertain degrees of convergence. The workshop was held in closed session because of the need to review secure test items. Results of this review constitute a major component of this interim report and are presented in Section 2.
The third workshop, to be held in July 1999, will review and discuss:
NAGB's June report to Congress on the purposes and intended uses of the VNT and the proposed definition of the term “voluntary”;
the likelihood that intended decisions could be supported by the planned reporting metrics;
NAGB's June report to Congress on the achievement levels and other reporting plans for the VNT; and
plans for accommodating and reporting results for students who are English-language learners or who have special learning needs.
Results of this workshop will be included in the committee's final report, which will be released on September 30, 1999.
Report Purpose and Organization
This VNT year 2 evaluation interim report examines the quality of the VNT items developed to date and reviews plans for piloting and screening these items. Attention is given to the extent to which current item development, piloting, and forms assembly procedures are likely to support the inclusion and meaningful reporting of results for students with disabilities and for English-language learners. While neither item development nor planning for subsequent steps is complete, an interim report is being issued at this time in light of NAGB's June 30, 1999, report to Congress and the White House so that collectively they might contribute to the executive and legislative planning and decision making about the current status and potential future of the VNT.
The remaining sections of this report are organized around the first three issue areas in the committee's charge. Section 2 discusses the quality of items that are now ready for pilot testing and the likelihood that a sufficient number of additional items will be developed to an appropriate level of quality in time for inclusion in a spring 2000 pilot test. Section 3 discusses pilot and field test design issues. Section 4 considers issues on the inclusion and accommodation of students in the VNT with special language and learning needs. We anticipate that each of these topics will be covered in greater detail in the committee's final report.