NATIONAL RESEARCH COUNCIL
COMMISSION ON BEHAVIORAL AND SOCIAL SCIENCES AND EDUCATION
2101 Constitution Avenue
Washington, DC 20418

BOARD ON TESTING AND ASSESSMENT
202-334-3087
202-334-3584 FAX
Michael J. Feuer, Ph.D., Director


July 15, 1998


Secretary Richard W. Riley
U.S. Department of Education
Room 6263
600 Independence Avenue, SW
Washington, DC 20202

Dear Mr. Secretary:

As you know, the National Research Council (NRC) has been conducting an independent evaluation of certain technical aspects of the development of items for the proposed Voluntary National Tests of fourth-grade reading and eighth-grade mathematics. To carry out this mandate, specified in P.L. 105-78 (November 1997), the NRC appointed us as co-principal investigators. We are working with NRC staff under the auspices and oversight of the NRC's Board on Testing and Assessment (BOTA) and soliciting input from a wide range of outside experts. Please note that we interpret our mandate as a request for technical review only, and we take no position on the overall merits of the Voluntary National Tests.

Under the NRC contract with the Department of Education, we are scheduled to issue a report of our first-year findings in September 1998. However, we have identified an issue that we believe can benefit from immediate attention, and we are sending this letter in the hope of ensuring the best possible test development process. Our principal concern is with the timetable for item review and revision, which, if relaxed somewhat, would allow more time for the full benefit of those activities to be realized. This letter is based on our evaluation of the materials available to us as of early June 1998; we realize that because the development process is ongoing, some of the issues identified in this letter may have already been addressed by the time it is released. In the interest of providing constructive feedback, however, we choose to express our concerns now even if this proves to have been unnecessary, rather than delay until September and risk being too late to effect positive change.

Our letter is organized in three main sections: a brief discussion of the scope and purpose of the evaluation; a description of the workshop that focused on item quality; and our findings, conclusions, and recommendations. We trust this information will be helpful and interpreted in the spirit we intend, as a suggestion that more time be allowed for item review and revision.

Scope and Purpose


In his January 1997 State of the Union address, President Clinton called for the development of Voluntary National Tests (VNT) in fourth-grade reading and eighth-grade mathematics as a means of providing information on the academic progress of American youth in relation to national and international benchmarks. Congress gave the National Assessment Governing Board (NAGB) exclusive authority to develop the VNT and asked the NRC to provide an independent evaluation of the development activities.

In this first phase of our evaluation (which is scheduled to culminate in a report in September 1998), we have focused largely on three aspects of the test development process and products: issues surrounding the test specifications and the NAEP frameworks; plans for the development and implementation of the pilot study scheduled for spring 1999; and preliminary evidence of the quality of possible test items. To date, we have observed laboratory-based talk-aloud item try-out sessions with students, reviewed design and development plans and reports, examined draft test items and scoring materials, and conducted three workshops in which additional experts with a wide range of skills and experience have contributed unique and invaluable input. This letter pertains exclusively to our concerns about the item review and revision schedule for the next few months; it is based on our most recent workshop (June 2-3, 1998).

VNT Item Development and Review Schedule


It is important to note that NAGB and its contractors for test development have not completed the development of candidate items for the VNT. Because of a 6-month delay in initiation of the project, NAGB, its prime contractor (the American Institutes for Research), and the subcontractors for reading and mathematics test development have faced a daunting and compressed schedule for test design and development. (Although the schedule for development, review, revision, and approval of materials has not been rigidly set, approximate dates have been specified.) Revised test specifications were approved by NAGB at its March meeting, and item development continued thereafter. The subcontractors for reading and mathematics development relied on the NAEP frameworks, the VNT specifications, released NAEP items, and bias and sensitivity review guidelines in writing items. Each subcontractor appears to be following the development and quality control procedures used in its proprietary testing programs. After initial drafting, items undergo review and revision.

The item review and revision process includes a set of sequential and overlapping steps:

1. initial review of items by the subcontractors for reading and mathematics development;
2. content review by the prime contractor and its consultants;
3. review by outside content experts;
4. trial evaluation of a subset of items in one-on-one talk-aloud sessions with students (called cognitive laboratories);
5. to the extent possible, provision of recommendations for item revision to item writers based upon summaries of information obtained from steps 1-4;
6. review of items for bias and sensitivity by consultants to the contractors;
7. revision of items by the item writers;
8. review of test items by NAGB; and
9. NAGB sign-off on test items.

Steps 1-5 above were to be completed by June 30, so that at least a subset of the approximately 2,600 items could be revised before the bias review scheduled for July 6-8. This bias review and the previously provided review information were to be the basis for additional item revision prior to NAGB's review and approval. NAGB is scheduled to review items in three waves, beginning on July 15, July 22, and July 29. Approval of items by the Governing Board will be sought at the November 1998 NAGB meeting. Approved items will then be assembled into draft test forms for the proposed pilot administration of the VNT in spring 1999. It is unclear what activities NAGB plans to undertake for its review between July 15 and the November meeting.

Workshop on VNT Item Development


Our concern with the scheduling of item review and revision activities results primarily from data obtained in our third workshop. On June 2 and 3, a group of outside experts with experience in the development and evaluation of conventional and performance-based test materials (see the list of outside experts in Attachment 1) examined and rated a subset of secure test items in their areas of expertise. The test developers supplied a list of items that represented the VNT specifications for content coverage and item formats. We selected a total of 60 mathematics items from a pool of 120 items provided by the prime contractor. Similarly, we selected 6 reading passages with a total of 45 items from a pool of 12 reading passages with roughly 120 questions. The sample of items we selected matched the VNT specifications for the length of a test form as well as for content coverage and item formats.

These materials and the workshop participants' observations provided the data that led to this report. Because the items we examined in June were products of the initial stages of item development, we did not expect them to reflect the complete development process. Many items had been through content review, and a number were being tried out with small numbers of students to assess item clarity and accuracy. However, the items had not yet been reviewed for bias or sensitivity, revised by test writers, or submitted for NAGB review and approval.

Before examining the test materials, the experts, co-principal investigators, and NRC staff signed nondisclosure statements promising to protect the confidentiality of the materials. Consequently, specific illustrations of our findings cannot be provided without breaching the security of these materials. When analysis for the broader evaluation is complete, however, we and the NRC staff will be happy to share specific examples with NAGB members and staff who already have access to the secure test materials.

At the workshop, the experts independently identified the knowledge and skills likely to be measured by each item and attempted to match these to the content and skill outlines for the VNT. They also appraised item quality and identified ambiguities that might lead students to invalid responses (correct or incorrect). After the item rating exercise, the principal investigators and staff met jointly with the experts, NAGB, and the developers of the VNT to discuss issues of item quality and coverage and to discuss plans for item review and quality assurance. The principal investigators and staff also met separately with the experts for further discussion of the item materials.

Findings, Conclusions, and Recommendations


Our findings, conclusions, and recommendations are based on our own evaluation of the information provided at the workshop, subsequent discussions, and the process and products of item development. Although we benefited greatly from the views of the experts with whom we worked, we stress that this report is solely the responsibility of the authors and the NRC. We also reiterate that these findings apply only to the materials available to us by early June.



Copyright © 1998 by the National Academy of Sciences. All rights reserved.