Evaluation of the Voluntary National Tests -Phase 1

 

Evaluation of the Voluntary
National Tests

PHASE 1

 

Lauress L. Wise, Robert M. Hauser, Karen J. Mitchell, and Michael J. Feuer

 


 

Board on Testing and Assessment

Commission on Behavioral and Social Sciences and Education

National Research Council

 

 

NATIONAL ACADEMY PRESS
Washington, D.C. 1998






NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The co-principal investigators responsible for the report were chosen for their special competence.

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce M. Alberts is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. William A. Wulf is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Kenneth I. Shine is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce M. Alberts and Dr. William A. Wulf are chairman and vice chairman, respectively, of the National Research Council.

The study was supported by Contract/Grant No. RJ97184001 between the National Academy of Sciences and the U.S. Department of Education. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the organizations or agencies that provided support for this project.

International Standard Book Number 0-309-06277-2

Additional copies of this report are available from:
National Academy Press
2101 Constitution Avenue N.W.
Washington, D.C. 20418
Call 800-624-6242 or 202-334-3313 (in the Washington Metropolitan Area).

This report is also available on line at http://www.nap.edu

Printed in the United States of America

Copyright 1998 by the National Academy of Sciences. All rights reserved.


PROJECT ON THE EVALUATION
OF THE VOLUNTARY NATIONAL TESTS



Co-Principal Investigators

ROBERT M. HAUSER, University of Wisconsin, Madison

LAURESS L. WISE, Human Resources Research Organization, Alexandria, Virginia

 

Staff, Board on Testing and Assessment

MICHAEL J. FEUER, Director

KAREN J. MITCHELL, Senior Program Officer

STEPHEN E. BALDWIN, Senior Program Officer

MARILYN DABADY, Research Associate

DOROTHY MAJEWSKI, Senior Project Assistant




BOARD ON TESTING AND ASSESSMENT

ROBERT L. LINN (Chair), School of Education, University of Colorado, Boulder

CARL F. KAESTLE (Vice Chair), Department of Education, Brown University, Providence, Rhode Island

RICHARD C. ATKINSON, President, University of California

IRALINE BARNES, The Superior Court of the District of Columbia

PAUL J. BLACK, School of Education, King’s College, London, England

RICHARD P. DURÁN, Graduate School of Education, University of California, Santa Barbara

CHRISTOPHER F. EDLEY, JR., Harvard Law School

PAUL W. HOLLAND, Graduate School of Education, University of California, Berkeley

MICHAEL W. KIRST, School of Education, Stanford University

ALAN M. LESGOLD, Learning Research and Development Center, University of Pittsburgh

LORRAINE MCDONNELL, Departments of Political Science and Education, University of California, Santa Barbara

KENNETH PEARLMAN, Lucent Technologies, Inc., Warren, New Jersey

PAUL R. SACKETT, Industrial Relations Center, University of Minnesota, Minneapolis

RICHARD J. SHAVELSON, School of Education, Stanford University

CATHERINE E. SNOW, Graduate School of Education, Harvard University

WILLIAM L. TAYLOR, Attorney at Law, Washington, D.C.

WILLIAM T. TRENT, Associate Chancellor, University of Illinois, Champaign

JACK WHALEN, Xerox Palo Alto Research Center, Palo Alto, CA

KENNETH I. WOLPIN, Department of Economics, University of Pennsylvania, Philadelphia



MICHAEL J. FEUER, Director

VIOLA C. HOREK, Administrative Associate


Foreword




President Clinton’s 1997 proposal to create voluntary national tests in reading and mathematics catapulted testing to the top of the national education agenda. The proposal turned up the volume on what had already been a contentious debate and drew intense scrutiny from a wide range of educators, parents, policy makers, and social scientists. Recognizing the important role science could play in sorting through the passionate and often heated issues in the testing debate, Congress and the Clinton administration asked the National Research Council, through its Board on Testing and Assessment (BOTA), to conduct three fast-track studies over a 10-month period.

This report and its companions—Uncommon Measures: Equivalence and Linkage Among Educational Tests and High Stakes: Testing for Tracking, Promotion, and Graduation—are the result of truly heroic efforts on the part of the BOTA members, the study committee chairs and members, two co-principal investigators, consultants, and staff, who all understood the urgency of the mission and rose to the challenge of a unique and daunting timeline. Michael Feuer, BOTA director, deserves the special thanks of the Board for keeping the effort on track and shepherding the report through the review process. His dedicated effort, long hours, sage advice, and good humor were essential to the success of this effort. Robert Hauser and Lauress Wise deserve our deepest appreciation for their outstanding commitment of time, energy, and intellectual firepower that made this evaluation possible.

These reports are exemplars of the Research Council’s commitment to scientific rigor in the public interest: they provide clear and compelling statements of the underlying issues, cogent answers to nettling questions, and highly readable findings and recommendations. These reports will help illuminate the toughest issues in the ongoing debate over the proposed Voluntary National Tests. But they will do much more as well. The issues addressed in this and the other two reports go well beyond the immediate national testing proposal: they have much to contribute to knowledge about the way tests—all tests—are planned, designed, implemented, reported, and used for a variety of education policy goals.

I know the whole board joins me in expressing our deepest gratitude to the many people who worked so hard on this project. These reports will advance the debate over the role of testing in American education, and I am honored to have participated in this effort.


Robert L. Linn, Chair
Board on Testing and Assessment




Acknowledgments




This project would not have been possible without the generosity of many individuals and the contributions of several institutions.

The sage counsel of Bob Linn and Carl Kaestle, chair and vice chair of the Board on Testing and Assessment (BOTA), helped us frame the evaluation and test our findings and conclusions. Other BOTA members contributed in important ways by participating in briefings and making invaluable suggestions for improved analysis and discussion.

The Office of Planning and Evaluation Services, U.S. Department of Education, administered the contract for this evaluation. Director Allen Ginsburg provided assistance in planning the evaluation, and Audrey Pendleton served as an exemplary contracting officer's technical representative during this first phase of the evaluation. We thank them for their guidance and support.

Staff from the National Assessment Governing Board (NAGB), under the leadership of Roy Truby, executive director, and the NAGB prime contractor, the American Institutes for Research (AIR), with Archie LaPointe's guidance, were a valuable source of information and data on the design and development of the Voluntary National Tests (VNT). Sharif Shakrani, Raymond Fields, and Mary Crovo of NAGB and Mark Kutner, Steven Ferrara, John Olsen, Clayton Best, Roger Levine, Terry Salinger, Fran Stancavage, and Christine Paulson of AIR provided us with important information on occasions that are too numerous to mention. We benefited tremendously by attending and learning from discussions at meetings of the National Assessment Governing Board and meetings of its contractors; we thank them for opening their meetings to us and for sharing their knowledge and perspectives. We extend thanks to the staff of the cognitive laboratories and of Harcourt Brace Educational Measurement and Riverside Publishing for access to important information and their perspectives throughout the course of our work.

We relied heavily on the input and advice of a cadre of testing and disciplinary experts, who provided helpful and insightful presentations at our workshops: They are listed in Appendices A-C, and we thank them. Our work was enriched by the stimulating intellectual exchange at the meetings to which they contributed greatly.

William Morrill, Rebecca Adamson, and Donald Wise of Mathtech, Inc., provided important help and perspective throughout. They attended and reported on workshops, cognitive laboratories, and bias review sessions, provided important insight into VNT development, and were valuable members of the evaluation team.

This report has been reviewed by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the Report Review Committee of the National Research Council (NRC). The purpose of this independent review is to provide candid and critical comments that will assist the authors and the NRC in making the published report as sound as possible and to ensure that the report meets institutional standards for objectivity, evidence, and responsiveness to the study charge. The content of the review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.

We wish to thank the following individuals, who are neither officials nor employees of the NRC, for their participation in the review of this report: Arthur S. Goldberger, Department of Economics, University of Wisconsin; Lyle V. Jones, L.L. Thurstone Psychometric Laboratory, University of North Carolina, Chapel Hill; Michael J. Kolen, Iowa Testing Programs, University of Iowa; Henry W. Riecken, Professor of Behavioral Sciences (emeritus), University of Pennsylvania School of Medicine; Alan H. Schoenfeld, School of Education, University of California, Berkeley; Richard Shavelson, School of Education, Stanford University; and Ross M. Stolzenberg, Department of Sociology, University of Chicago. Although these individuals provided many constructive comments and suggestions, responsibility for the final content of this report rests solely with the authors and the NRC.

Above all, we are grateful to the many individuals at the National Research Council who provided guidance and assistance at many stages of the evaluation and during the preparation of the report. Barbara Torrey, executive director of the Commission on Behavioral and Social Sciences and Education (CBASSE), helped and encouraged our work—and the companion VNT studies—throughout. Sandy Wigdor, director of CBASSE’s Division on Education, Labor, and Human Performance, also has been a source of great encouragement and paved many paths in the conduct of the study. We are indebted, also, to the whole CBASSE staff for indulging our scheduling exigencies. Thanks also to Sally Stanfield and the whole Audubon team at the National Academy Press, for their creative and speedy support.

We are especially grateful to Eugenia Grohman, Associate Director for Reports of CBASSE, for her advice on structuring the content of the report, for her expert editing of the manuscript, for her wise advice on the exposition of the report's main messages, and for her patient and deft guidance of the report through the NRC review process.

We also are immensely grateful to Stephen Baldwin, Patricia Morison, and Naomi Chudowsky of the BOTA staff and Marilyn Dabady, a Yale Ph.D. candidate and BOTA summer intern, who made valuable contributions to our research and report.

We express our gratitude to NRC administrative staff Adrienne Carrington and Lisa Alston. We are especially grateful to Dorothy Majewski and Viola Horek, who capably and admirably managed the operational aspects of the evaluation—arranging meeting and workshop logistics, producing multiple iterations of drafts and report text, and being available to assist with our requests, however large or small.

We recognize the special contributions of Michael Feuer, BOTA director, and Karen Mitchell, senior staff officer, as our coauthors of this report. Michael guided the project, coordinated our work with the companion VNT projects on linkage and appropriate test use, and, most important, made frequent contributions to the discussion and the framing of our questions and conclusions. Karen was a principal source of expertise in both the substance and process of the evaluation, and she provided cheerful and continuous liaison between the two of us and the staff of NAGB and AIR. Without her help, we could not have completed our work in time and to the NRC's rigorous standards.

Lastly, we thank Winnie and Tess for their patience, help, understanding, and good humor during our work on this project. We'll be home for dinner.


Lauress Wise and Robert Hauser,
Co-Principal Investigators
Evaluation of the Voluntary National Tests





Contents




Executive Summary

1   The Proposed Voluntary National Tests and Their Evaluation

2   Test Specifications

3   Item Development and Review

4   VNT Pilot and Field Test Plans

5   Inclusion and Accommodation

6   Reporting Issues

References

Appendices

    A  Workshop on Item and Test Specifications for VNT
    B  Workshop to Review VNT Pilot and Field Test Plans
    C  Workshop on VNT Item Development
    D  Source Documents
    E  Descriptions of Achievement Levels for Basic, Proficient, and Advanced
    F  Revised Schedule for VNT Item Development
    G  Observations of Cognitive Labs and Bias Reviews
    H  Biographical Sketches




Public Law 105-78, enacted November 13, 1997

SEC. 308. STUDY—The National Academy of Sciences shall, not later than September 1, 1998, submit a written report to the Committee on Education and the Workforce of the House of Representatives, the Committee on Labor and Human Resources of the Senate, and the Committees on Appropriations of the House and Senate that evaluates all test items developed or funded by the Department of Education or any other agency of the Federal Government pursuant to contract RJ97153001, any subsequent contract related thereto, or any contract modification by the National Assessment Governing Board pursuant to section 307 of this Act, for—

(1) the technical quality of any test items for 4th grade reading and 8th grade mathematics;

(2) the validity, reliability, and adequacy of developed test items;

(3) the validity of any developed design which links test results to student performance;

(4) the degree to which any developed test items provide valid and useful information to the public;

(5) whether the test items are free from racial, cultural, or gender bias;

(6) whether the test items address the needs of disadvantaged, limited English proficient and disabled students; and

(7) whether the test items can be used for tracking, graduation or promotion of students.




Executive Summary




In his 1997 State of the Union address, President Clinton announced a federal initiative to develop tests of 4th-grade reading and 8th-grade mathematics that would provide reliable information about student performance at two key points in their educational careers. According to the U.S. Department of Education, the Voluntary National Tests (VNTs) would create a catalyst for continued school improvement by focusing parental and community-wide attention on achievement and would become new tools to hold school systems accountable for their students’ performance. The National Assessment Governing Board (NAGB) has responsibility for development of the VNT.

The tests would be voluntary because the federal government would prepare but not require them, and no individual, school, or group data would be reported to the federal government. Every effort would be made to include and accommodate students with disabilities and English-language learners in the testing program. The tests would provide sufficiently reliable information so all students—and their parents and teachers—would know where they stood in relation to high national standards and, in mathematics, also in relation to levels of achievement in other countries.

In order to provide maximum preparation and feedback to students, parents, and teachers, sample tests would be circulated in advance, and copies of the original tests would be returned with the original and correct answers marked. A major effort would be made to communicate test results clearly to students, parents, and teachers, and all test items would be released on the Internet just after each test administration.

Congress recognized that a testing program of the scale and magnitude of the VNT initiative raises many important technical questions and requires quality control throughout development and implementation. In P.L. 105-78, Congress called on the National Research Council (NRC) to evaluate a series of technical issues pertaining to the validity of test items, the validity of proposed links between the VNT and the National Assessment of Educational Progress (NAEP), plans for the accommodation and inclusion of students with disabilities and English-language learners, plans for reporting test information to parents and the public, and potential uses of the tests. (Congress also requested two additional studies, one on the linkage and equivalency of tests and the other on appropriate test use.)

In accepting this charge, the National Research Council appointed us co-principal investigators. Working closely with NRC staff and consultants, under the auspices and oversight of the NRC’s Board on Testing and Assessment, we have solicited a wide range of expert advice, conducted a number of data-gathering and analytical activities, and held three public workshops.

This report covers phase 1 of the evaluation (November 1997-July 1998) and focuses on three principal issues: test specifications and frameworks; preliminary evidence of the quality of test items; and plans for the pilot and field test studies, for inclusion and accommodation, and for reporting VNT results.


TEST SPECIFICATIONS

The VNT test specifications are appropriately based on NAEP frameworks and specifications, but they are incomplete. The close correspondence with NAEP builds on NAEP efforts to achieve a consensus on important reading and mathematical knowledge and skills and maximizes the prospects for linking VNT scores to NAEP achievement levels. However, the current test specifications lack information on test difficulty and accuracy targets, and they are not yet sufficiently tied to the achievement-level descriptions that will be used in reporting. Some potential users also question the decision to test only in English.

We recommend that test difficulty and accuracy targets and additional information on the NAEP achievement-level descriptions be added to the test specifications. We also recommend that NAGB work to build a greater consensus for the test specifications to maximize participation by all school districts and states.


TEST ITEMS

Because of significant time pressures, several item review and revision steps have been conducted simultaneously, and opportunities have been missed to incorporate feedback from individual steps. Yet relative to professional and scientific standards of test construction, the development of VNT items to date has been satisfactory, especially in light of the significant time pressures. The National Assessment Governing Board (NAGB) and its consortium of contractors and subcontractors have made good progress toward the goal of developing a VNT item pool of adequate size and of known, high quality. While we cannot determine in advance whether that goal will be met, we find that the procedures and plans for item development and evaluation are sound. The hurried pace also prevented full development of an item tracking system.

The VNT test design presented some novel problems for which there are no ready solutions. For example, the compressed schedule did not permit the fundamental development work that would be required to ensure both inclusion and comparable validity of test scores for students who are English-language learners and students with disabilities.

In addition, the design of the tests and of their results has continued to evolve during the development process. For example, while the goal of reporting in terms of achievement levels has remained constant, there has as yet been no decision about the possibility of reporting scaled scores or ranges of scores as well. Indeed, some features of test design, such as test length, appear to have been determined administratively, ignoring possible implications for the validity or reliability of the test.

We recommend that NAGB allow more time for future test development cycles so that the different review activities can be performed sequentially rather than in parallel. We also recommend that NAGB and its contractor develop a more automated item tracking system so as to have timely information on survival rates and the need for additional items. Item development should be tracked by content and format categories and by link to achievement-level descriptions so that shortages of any particular type of item can be quickly identified.
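As an illustration only, the kind of tracking recommended above might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any system used by NAGB or its contractors: the `Item` fields, the category labels, the status values, and the target counts are all hypothetical and chosen only to show how survival rates and shortages by item type could be tallied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One test item in a hypothetical tracking record (all fields are assumptions)."""
    item_id: str
    content_area: str       # e.g., a content category from the test specifications
    item_format: str        # e.g., "multiple_choice" or "constructed_response"
    achievement_level: str  # achievement-level description the item is linked to
    status: str             # "active" if the item survived review, else "dropped"


def count_active(items):
    """Count surviving items in each (content, format, level) cell."""
    return Counter(
        (it.content_area, it.item_format, it.achievement_level)
        for it in items
        if it.status == "active"
    )


def find_shortages(items, targets):
    """Return {cell: shortfall} for every cell with fewer active items than its target."""
    counts = count_active(items)
    return {
        cell: needed - counts[cell]
        for cell, needed in targets.items()
        if counts[cell] < needed
    }


def survival_rate(items):
    """Fraction of all tracked items that are still active."""
    if not items:
        return 0.0
    return sum(it.status == "active" for it in items) / len(items)


# Hypothetical example data; the item IDs, categories, and targets are invented.
items = [
    Item("R001", "literary", "multiple_choice", "basic", "active"),
    Item("R002", "literary", "multiple_choice", "basic", "dropped"),
    Item("R003", "informational", "constructed_response", "proficient", "active"),
]
targets = {
    ("literary", "multiple_choice", "basic"): 2,
    ("informational", "constructed_response", "proficient"): 1,
}

shortages = find_shortages(items, targets)
rate = survival_rate(items)
```

In this invented example, one more literary multiple-choice item linked to the basic level would be needed, and two of the three tracked items have survived review. Keying the counts on the full combination of content, format, and achievement level is one possible design choice; a real system might track each dimension separately as well.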


PILOT AND FIELD TEST PLANS

The pilot and field test plans appear generally sound with respect to the number of items and forms to be included and the composition and size of the samples. More detail on plans for data analysis is needed and some aspects of the design, such as the use of hybrid forms, appear unnecessarily complex.

We recommend that NAGB and its contractor develop more specific plans for the analysis and use of both the pilot and field test data. These plans should include decision rules for item screening and accuracy targets for item parameter estimates, test equating, and linking. We also recommend that greater justification be supplied for some aspects of these plans, such as the use of hybrid forms, or that specific complexities be eliminated. NAGB should also prepare back-up plans in case item survival rates following the pilot test are significantly lower than anticipated.


INCLUSION AND ACCOMMODATION

Plans for including and accommodating students with disabilities and English-language learners are sketchy and do not yet break new ground with respect to maximizing the degree of inclusion and the validity of scores for all students.

We recommend that NAGB accelerate its plans and schedule for inclusion and accommodation of students with disabilities and limited English proficiency in order to increase both the participation of those student populations and the comparability of VNT performance among student populations.


REPORTING PLANS

There are a number of potential issues in the reporting of test results to parents, students, and teachers that should be resolved as soon as possible, including: the adequacy of VNT items for reporting in relation to the NAEP achievement-level descriptions; mechanisms for communicating uncertainty in the results; and ways to accurately aggregate scores across students. We also question whether and how additional information might be provided to parents, students, and teachers for students found to be in the below basic category.

We recommend that NAGB accelerate its specification of procedures for reporting because reporting goals should drive most other aspects of test development. Specific consideration should be given to whether and how specific test items will be linked and used to illustrate the achievement-level descriptions. Attention should also be given to how measurement error and other sources of variation will be communicated to users, how scores will be aggregated, and whether information beyond achievement-level categories can be provided, particularly for students below the basic level of achievement.



