EMBEDDING QUESTIONS

The Pursuit of a Common Measure in Uncommon Tests

Daniel M. Koretz, Meryl W. Bertenthal, Bert F. Green, Editors

Board on Testing and Assessment

Committee on Embedding Common Test Items in State and District Assessments

Commission on Behavioral and Social Sciences and Education

National Research Council

NATIONAL ACADEMY PRESS
Washington, DC




NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

The study was supported by Contract/Grant No. RJ97184001 between the National Academy of Sciences and the U.S. Department of Education. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the organizations or agencies that provided support for this project.

International Standard Book Number 0-309-06789-8

Additional copies of this report are available from:
National Academy Press
2101 Constitution Avenue NW
Washington, DC 20418
Call 800-624-6242 or 202-334-3313 (in the Washington Metropolitan Area).
This report is also available online at http://www.nap.edu

Printed in the United States of America

Copyright 1999 by the National Academy of Sciences. All rights reserved.

Suggested citation: National Research Council (1999). Embedding Questions: The Pursuit of a Common Measure in Uncommon Tests. Committee on Embedding Common Test Items in State and District Assessments. D.M. Koretz, M.W. Bertenthal, and B.F. Green, eds. Board on Testing and Assessment, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

THE NATIONAL ACADEMIES
National Academy of Sciences
National Academy of Engineering
Institute of Medicine
National Research Council

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce M. Alberts is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. William A. Wulf is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Kenneth I. Shine is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce M. Alberts and Dr. William A. Wulf are chairman and vice chairman, respectively, of the National Research Council.


COMMITTEE ON EMBEDDING COMMON TEST ITEMS IN STATE AND DISTRICT ASSESSMENTS

DANIEL M. KORETZ (Chair), School of Education, Boston College; RAND Education, Washington, DC
SUSAN AGRUSO, Office of Assessment, South Carolina Department of Education
RONALD K. HAMBLETON, School of Education, University of Massachusetts, Amherst
H.D. HOOVER, Iowa Testing Programs, University of Iowa
BRIAN W. JUNKER, Department of Statistics, Carnegie Mellon University
JAMES A. WATTS, Southern Regional Education Board, Atlanta, Georgia
KAREN K. WIXSON, School of Education, University of Michigan
WENDY M. YEN, CTB/McGraw-Hill, Monterey, California
REBECCA ZWICK, Graduate School of Education, University of California, Santa Barbara
PAUL W. HOLLAND, Liaison, Board on Testing and Assessment; Graduate School of Education, University of California, Berkeley

MERYL W. BERTENTHAL, Study Director
BERT F. GREEN, Senior Technical Advisor
JOHN J. SHEPHARD, Senior Project Assistant

BOARD ON TESTING AND ASSESSMENT

ROBERT L. LINN (Chair), School of Education, University of Colorado
CARL F. KAESTLE (Vice Chair), Department of Education, Brown University
RICHARD C. ATKINSON, President, University of California
PAUL J. BLACK, School of Education, King's College, London, England
RICHARD P. DURÁN, Graduate School of Education, University of California, Santa Barbara
CHRISTOPHER F. EDLEY, JR., Harvard Law School
RONALD FERGUSON, John F. Kennedy School of Government, Harvard University
PAUL W. HOLLAND, Graduate School of Education, University of California, Berkeley
ROBERT M. HAUSER, Department of Sociology, University of Wisconsin, Madison
RICHARD M. JAEGER, School of Education, University of North Carolina, Greensboro
LORRAINE MCDONNELL, Departments of Political Science and Education, University of California, Santa Barbara
BARBARA MEANS, SRI International, Menlo Park, California
KENNETH PEARLMAN, Lucent Technologies, Inc., Warren, New Jersey
ANDREW C. PORTER, Wisconsin Center for Education Research, University of Wisconsin, Madison
CATHERINE E. SNOW, Graduate School of Education, Harvard University
WILLIAM L. TAYLOR, Attorney at Law, Washington, DC
WILLIAM T. TRENT, Associate Chancellor, University of Illinois, Champaign
VICKI VANDAVEER, The Vandaveer Group, Inc., Houston, Texas
LAURESS L. WISE, Human Resources Research Organization, Alexandria, Virginia
KENNETH I. WOLPIN, Department of Economics, University of Pennsylvania

MICHAEL J. FEUER, Director
VIOLA C. HOREK, Administrative Associate
LISA D. ALSTON, Administrative Assistant

Acknowledgments

The Committee on Embedding Common Test Items in State and District Assessments wishes to thank the many people who helped to make possible the preparation of this report. An important part of the committee's work was to gather data from research, policies, and practices on embedding. Many people gave generously of their time, at meetings and workshops of the committee and in interviews with committee staff.

The committee benefited tremendously from a presentation at its first meeting by Achieve, Inc. staff: Matthew Gandal, director of standards and assessment; Jennifer Vranek, senior policy analyst; and consultant David Wiley of Northwestern University. They provided the committee with a comprehensive overview of Achieve's efforts to develop a common national measure of student performance through embedding common items in state mathematics assessments.

At a committee workshop, Gordon M. Ambach, executive director of the Council of Chief State School Officers (CCSSO); Wayne H. Martin, director of the CCSSO State Education Assessment Center; and John R. Tanner, director of the Delaware Education Assessment and Analysis Office, offered local, state, and national perspectives on the purposes for which a common measure of student performance might be used. Don McLaughlin, chief scientist at the American Institutes for Research, and Michele Zimowski, senior survey methodologist at the National Opinion Research Center of the University of Chicago, presented ongoing research related to linking state mathematics assessments to the National Assessment of Educational Progress (NAEP). Michael Kolen, professor of education at the University of Iowa, discussed the inferences that educators and policy makers want to support with tests that produce individual scores and are linked to NAEP. Patricia Ann Kenney, research associate at the University of Pittsburgh's Learning Research and Development Center, presented her work on the content analysis of NAEP and demonstrated how differences in state content standards and assessments will affect the feasibility of embedding common NAEP items in uncommon tests. John Poggio, director of the Center for Educational Testing and Evaluation and professor of educational psychology and research at the University of Kansas, discussed Kansas' 1992 plan to embed NAEP items in the state testing program and why the plan was subsequently abandoned. Finally, Richard Hill, founder of the National Center for the Improvement of Educational Assessment, Inc., presented his study on the use of embedded NAEP items to estimate the rigor of Louisiana's performance standards relative to NAEP's. The committee is extremely grateful to all of these individuals who helped us clarify our thinking about many of the important issues surrounding our charge.

Other individuals provided information to the committee during small group discussions and telephone interviews. We are particularly grateful to Robert J. Mislevy, Educational Testing Service, and Eugene G. Johnson, American Institutes for Research, who gave us information about the NAEP marketbasket; Gage Kingsbury, research director of the Northwest Evaluation Association, who provided information about the NWEA item bank and locally developed tests; and Duncan MacQuarrie, Department of Curriculum and Assessment, Office of the Superintendent of Public Instruction, Washington State Department of Education, who provided us with information from the CCSSO State Collaborative on Assessment and Student Standards.

We owe a debt of gratitude to John Olson and Carl Andrews of CCSSO for providing the committee with important data about state testing programs. Without their help, and the help of Wayne Martin, we would not have been able to include the 1997-1998 school year information that is presented throughout this report.

We are especially grateful to Bert F. Green, who served as a consultant to the committee and provided invaluable assistance during all phases of the study. He worked tirelessly on our behalf, analyzing the issues, gathering data, and drafting chapters. The timely preparation of this report on an accelerated schedule could not have happened without his dedication and contributions.

The Board on Testing and Assessment, under the leadership of Robert Linn, provided the committee with both guidance and support. We were particularly fortunate to have Paul W. Holland, professor of statistics at the University of California at Berkeley and a member of the board, as a liaison member to this committee. As the chair of the Committee on Equivalency and Linkage of Educational Tests, Paul was well acquainted with the issues confronting us and proved to be a valuable guide and sounding board as we pondered the complexities of embedding.

We are very grateful to the professional staff of the Commission on Behavioral and Social Sciences and Education, without whose guidance, support, and hard work we could not have completed this report. Barbara B. Torrey, executive director of the commission, and Michael J. Feuer, director of the Board on Testing and Assessment (BOTA), created staff support and resources whenever we needed them and provided guidance to us as we navigated through the various stages of completing a National Research Council study in a mere nine months. BOTA staff members Naomi Chudowsky and Karen Mitchell made major contributions to our work, attending committee meetings and discussing ideas with the committee and staff. Karen was particularly gracious in her willingness to read and comment on the many drafts of this report that we endlessly piled on her desk. BOTA staff members Alexandra Beatty and Robert Rothman also read and commented on early drafts of this report; the finished product is better for their efforts.

We would be remiss if we didn't also thank two new members of the BOTA staff: Judith Koenig, study director of the Committee on NAEP Reporting Practices, for sharing her library of testing books and journals with us; and Richard Noeth, study director of the Committee on the Evaluation of the Voluntary National Tests, Year 2, for his guidance, support, and encouragement of our efforts. John Shephard, although new to the Board, served unflappably and flawlessly as the committee's senior project assistant. He dealt smoothly with the logistics of our three committee meetings in four months, with our enormous collections and distributions of materials, and with a seemingly endless stream of text files, e-mail file attachments, and file revisions in incompatible word-processing formats. His assistance at critical junctures along the way made the creation of this report possible. John received support when he needed it from other wonderful project assistants: Lisa Alston, Dorothy Majewski, Susan McCutchen, Kim Saldin, and Jane Phillips. Viola Horek, administrative associate to BOTA, was always there, instrumental in seeing that the entire project ran smoothly.

We are deeply grateful to Eugenia Grohman, associate director for reports of the Commission on Behavioral and Social Sciences and Education, for her advice on structuring the contents of the report and for her expert editing of the text. Genie knows better than anyone else how to put a report together, from beginning to end.

Above all, we thank the committee members for their outstanding contributions to the study. They drafted text, prepared background materials, and helped to organize workshops and committee discussions. Everyone contributed constructive, critical thinking, serious concern about the difficult and complex issues that we faced, and an open-mindedness that was essential to the success of the project.

This report has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the Report Review Committee of the National Research Council. The purpose of this independent review is to provide candid and critical comments that will assist the institution in making the published report as sound as possible and to ensure that the report meets institutional standards for objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their participation in the review of this report: Glenn Crosby, Department of Chemistry, Washington State University; John Guthrie, College of Education, University of Maryland; Lyle V. Jones, L.L. Thurstone Psychometric Laboratory, University of North Carolina, Chapel Hill; Stephen Raudenbush, School of Education, University of Michigan; Henry W. Riecken, Professor of Behavioral Sciences (emeritus), University of Pennsylvania School of Medicine; David Thissen, Graduate Program in Quantitative Psychology, University of North Carolina, Chapel Hill; Ewart A.C. Thomas, Department of Psychology, Stanford University; and Gary Williamson, Division of Accountability Services, North Carolina Department of Public Instruction, Raleigh.

Although the individuals listed above provided constructive comments and suggestions, it must be emphasized that responsibility for the final content of this report rests entirely with the authoring committee and the institution.

MERYL W. BERTENTHAL, Study Director
DANIEL M. KORETZ, Chair
Committee on Embedding Common Test Items in State and District Assessments


Contents

EXECUTIVE SUMMARY   1

1  INTRODUCTION: HISTORY AND CONTEXT   6
   Background   6
   Committee's Approach   9
   Definitions   10
   Three Scenarios   12
   Broader Issues   12

2  ENVIRONMENT FOR EMBEDDING: TECHNICAL ISSUES   14
   Sampling to Construct a Test   14
   Common Measures from a Common Test   17
   Two Free-Standing Tests   18
   Reconciling Scores   19
   Threats to Obtaining a Common Measure   20
      Standardization of Administration   20
      Accommodations   22
      Timing of Administration   22
      Test Security   27
      Stakes   29
   Abridgment of Test Content for Embedding   30
      Reliability   30
      Content Representation   30
   Placement of Embedded Test Items   33
   Grades and Subjects Tested   33
   Context Effects   34
   Special Issues Pertaining to NAEP and TIMSS   36

3  THREE DESIGNS FOR EMBEDDING   40
   Embedding Without Abridgment of the National Test: The Double-Duty Scenario   41
      Design and Features   41
      Administration   41
      Scoring and Analysis   42
      Evaluation   43
   Embedding Representative Material: The NAEP-Blocks Scenario   44
      Design and Features   44
      Administration   46
      Scoring and Analysis   46
      Evaluation   47
   Embedding Unrepresentative Material: The Item-Bank Scenario   49
      Design and Features   49
      Administration   50
      Scoring and Analysis   50
      Evaluation   51
   Evaluation of the Scenarios   53

4  COMMON MEASURES FOR PURPOSES OTHER THAN INDIVIDUAL SCORES   56
   Providing National Scores for Aggregates   57
   State Performance Standards   59
   Estimating State NAEP Results in Years That State NAEP Is Not Administered   60
   Auditing the Results of District and State Assessments   61

5  CONCLUSIONS   62

REFERENCES   67

GLOSSARY   73

BIOGRAPHICAL SKETCHES OF THE COMMITTEE MEMBERS AND STAFF   79