Origin of the Market-Basket Concept
This chapter traces the evolution of the NAEP market-basket concept. The first part of the chapter briefly describes NAEP's design between 1969 and 1996, providing a foundation for material that appears later in the report. Discussion of a NAEP market basket began with the redesign effort in 1996 (National Assessment Governing Board, 1996). The second part of the chapter explores aspects of the 1996 redesign that relate to the market basket. The final section of the chapter discusses NAGB's most recent proposal for redesigning NAEP, focusing on the redesign objectives that pertain to the market basket (National Assessment Governing Board, 1999b).
NAEP'S DESIGN: 1969-1996
During the 1960s, the nation's desire grew for data that could serve as independent indicators of the educational progress of American children. With the support of the U.S. Congress, NAEP was developed and first administered in 1969 to provide a national measure of students' performance in various academic domains.
In the first decade of NAEP's administration, certain political and social realities guided the reporting of results. For example, at the time, there was strong resistance on the part of federal, state, and local policymakers to any type of federal testing, to suggestions that there should be a national curriculum, and to comparisons of test results across states (Beaton and
Zwick, 1992). To assuage these policymakers' concerns, NAEP results were reported in aggregate for the nation as a whole and only for specific test items, not in relation to broad knowledge or skill domains. In addition, to defuse any notion of a national curriculum, NAEP was administered to 9-, 13-, and 17-year-olds, rather than to students at specific grade levels.
In the early 1980s, the educational landscape in the United States began to change and, with it, the design of NAEP. The nation experienced a dramatic increase in the racial and ethnic diversity of its school-age population, a heightened commitment to educational opportunity for all, and increasing involvement by the federal government in monitoring and financially supporting the learning needs of disadvantaged students (National Research Council, 1999b). These factors served to increase the desire for assessment data that would help gauge the quality of the nation's education system. Accordingly, in 1984, NAEP was redesigned. Redesign, at this time, included changes in sampling methodology, objective setting, item development, data collection, and analysis. Sampling was expanded to allow reporting on the basis of grade levels (fourth, eighth, and twelfth grades) as well as age.
Administration and sponsorship of NAEP have evolved over the years. Congress set the general parameters for the assessment and, in 1988, created the National Assessment Governing Board (NAGB) to formulate policy guidelines for NAEP (Beaton and Zwick, 1992). NAGB is an independent body comprising governors, chief state school officers, other educational policymakers, teachers, and members of the general public. The Commissioner of the National Center for Education Statistics (NCES) directs NAEP's administration. NCES staff put into operation the policy guidelines adopted by NAGB and manage cooperative agreements with agencies that assist in the administration of NAEP. On a contractual basis, scoring, analysis, and reporting are handled by the Educational Testing Service (ETS), and sampling and field operations are handled by Westat.
Over time, as policy concerns about educational opportunity, the nation's work force needs, and school effectiveness heightened, NAGB added structural elements to NAEP's basic design and changed certain of its features. By 1996, there were two components of NAEP, trend NAEP and main NAEP.
Trend NAEP consists of a collection of test questions in reading, writing, mathematics, and science that have been administered every few years (since the first administration in 1969) to 9-, 13-, and 17-year-olds. The purpose of trend NAEP is to track changes in education performance over
time, and thus, changes to the collection of test items are kept to a minimum.
Main NAEP includes questions that reflect current thinking about what students know and can do in certain subject areas. The content and skill outlines for these subject areas are updated as needed. Main NAEP encompasses two components: state NAEP and national NAEP. State and national NAEP use the same large-scale assessment materials to assess students' knowledge in the core subjects of reading, writing, mathematics, and science. National NAEP is broader in scope, covering subjects not assessed by state NAEP, such as geography, civics, U.S. history, world history, the arts, and foreign languages. National NAEP assesses fourth, eighth, and twelfth graders, while state NAEP includes only fourth and eighth graders.
NAEP's mechanisms for reporting achievement results have evolved over the years, but since 1996, two methods have been used: scale scores and achievement levels. Scale scores ranging from 0 to 500 summarize student performance in a given subject area for the nation as a whole and for subsets of the population based on demographic and background characteristics. Results are tabulated over time to provide trend information. Academic performance is also summarized using three achievement-level categories based on policy definitions established by NAGB: basic, proficient, and advanced. NAEP publications report the percentages of students at or above each achievement level as well as the percentage that falls below the basic category.
THE 1996 REDESIGN OF NAEP
The overall purpose of the 1996 redesign of NAEP was to enable assessment of more subjects more frequently, release reports more quickly, and provide information to the general public in a readily understood form. In the “Policy Statement for Redesigning the National Assessment of Educational Progress” (National Assessment Governing Board, 1996), NAGB articulated three objectives for the redesign:
Measure national and state progress toward the third National Education Goal and provide timely, fair, and accurate data about student achievement at the national level, among states, and in comparison with other nations.
The third goal states: “All students will leave grades 4, 8, and 12 having demonstrated competency over challenging subject matter including English, mathematics, science, foreign languages, civics and government, economics, arts, history, and geography, and every school in America will ensure that all students learn to use their minds well, so they may be prepared for responsible citizenship, further learning, and productive employment in our Nation's modern economy” (National Education Goals Panel, 1994:13).
Develop, through a national consensus, sound assessments to measure what students know and can do as well as what they should know and be able to do.
Help states and others link their assessments to the National Assessment and use National Assessment data to improve education performance.
The policy statement laid out methods for accomplishing these objectives, including one that called for the use of innovations in measurement and reporting. Among the innovations discussed was domain-score reporting, in which “a goodly number of test questions are developed that encompass the subject, and student results are reported as a percentage of the domain that students know and can do.” Domain-score reporting was cited as an alternative to reporting results on “an arbitrary and less meaningful scale like the 0 to 500 scale” (National Assessment Governing Board, 1996:13).
The concepts of domain-score reporting and market-basket reporting were explained and further developed in a report from NAGB's Design and Feasibility Team (Forsyth et al., 1996). In this document, the authors described a market basket as a collection of items that would be made public so that users would have a concrete reference for the meaning of the score levels. They noted that the method for reporting results on the collection of items could be one that is more comfortable to users who are “familiar with only traditional test scores,” such as a percent-correct metric (Forsyth et al., 1996:6-26).
Forsyth and colleagues explored three options for the market basket. One involved creating a market basket the size of a typical test form (like scenario two in Figure 1), and a second called for a market basket larger than a typical test form (like scenario one in Figure 1). Their third option drew on Bock's (1993) idea of domain-referenced reporting. With this option, a sufficient quantity of items would be developed so as to constitute an operational definition of skill in the targeted domain, perhaps as many as 500 to 5,000 items. All of the items would be publicly released. They explained further that “having specified how to define a score based on a student responding to all of these items, it would be possible to calculate a predictive distribution for this domain score from a student's response to some subset of the items” (Forsyth et al., 1996:6-29).
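The intuition behind estimating a domain score from a subset of items can be illustrated with a small simulation. The sketch below is purely illustrative and is not the methodology proposed by Forsyth et al.; it simply shows that the percent-correct score on a randomly drawn subset (a hypothetical “market basket”) tracks the percent-correct score on the full item domain, with precision that depends on the subset's size.

```python
import random

def percent_correct(responses):
    """Percent-correct score over a list of 0/1 item responses."""
    return 100.0 * sum(responses) / len(responses)

def simulate(p_correct=0.65, domain_size=1000, subset_size=50, seed=7):
    """Compare a student's score on a full item domain with the score
    on a random subset of those items (illustrative parameters only)."""
    rng = random.Random(seed)
    # Hypothetical domain: each item is answered correctly with
    # probability p_correct.
    domain = [1 if rng.random() < p_correct else 0
              for _ in range(domain_size)]
    full_score = percent_correct(domain)
    # A "market basket": a random subset of the domain's items.
    subset = rng.sample(domain, subset_size)
    subset_score = percent_correct(subset)
    return full_score, subset_score

full_score, subset_score = simulate()
print(f"full-domain score: {full_score:.1f}%, "
      f"subset estimate: {subset_score:.1f}%")
```

Larger subsets yield subset scores that cluster more tightly around the full-domain score, which is the sense in which a released market basket of modest size can stand in for a much larger item domain.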
Forsyth et al. (1996:6-26) also described the conditions under which market-basket items could be embedded into existing tests and stated that, under some plans, the market basket might allow for “embedding parallel ‘market baskets' of items within more complex assessment designs. . . . Results from market basket forms would support faster and simpler, though less efficient, reporting, while information from broader ranges of items and data could be mapped into its scale using more complex statistical methods. . . . [R]eleased market basket forms could be made available to embed in other projects with strengths and designs that complement NAEP's.” This use of the market basket falls under the second scenario in Figure 1 where the market basket is the size of a typical test form.
In 1997, NAGB adopted a resolution supporting market-basket reporting, which was defined as making NAEP “more understandable to a wide public by presenting results in terms of percent correct on a representative group of questions called a market basket.” Additionally, the resolution stated that the market basket “may be useful in linking NAEP to state assessments” (National Assessment Governing Board, 1997:1).
NAEP DESIGN 2000-2010
Since the 1996 redesign, NAGB has continued to support extensive study of NAEP. Evaluation reports, reviews by experts, and commissioned papers highlight issues that bear on the 1996 redesign. Among these are when to change test frameworks, how to simplify NAEP's technical design, how to improve the process for setting achievement levels, and how NAEP results might be used to examine factors that underlie student achievement (National Assessment Governing Board, 1999b).
During extensive deliberations, NAGB recognized that NAEP was “being asked to do too many things, some even beyond its reach to do well, and was attempting to serve too many audiences” (National Assessment Governing Board, 1999b:2). Governing Board members found that NAEP's design was being overburdened in many ways. In its most recent redesign plan, “National Assessment of Education Progress: Design 2000-2010” (National Assessment Governing Board, 1999b), NAGB proposed to remedy these problems by refocusing the national assessment on what it does best: measuring and reporting on the status of student achievement and change over time. NAGB also drew distinctions among the various audiences for NAEP products. Their report pointed out that the primary audience for NAEP reports is the American public, whereas the primary users of its data have been national and state policymakers, educators, and researchers (National Assessment Governing Board, 1996:6).
The Design 2000-2010 policy stated five over-arching principles for the conduct and reporting of NAEP (National Assessment Governing Board, 1999b:3):
conduct assessments annually, following a dependable schedule
focus NAEP on what it does best
define the audience for NAEP reports
report results using performance standards
simplify NAEP's technical design
Details of the initiative to develop a short form appeared under the policy objective of simplifying NAEP's technical design (National Assessment Governing Board, 1999b:7):
Plans for a short-form of [NAEP], using a single test booklet, are being implemented. The purpose of the short-form test is to enable faster, more understandable initial reporting of results, and possibly for states to have access to test instruments allowing them to obtain NAEP assessment results in years in which NAEP assessments are not scheduled in particular subjects.
Like the 1996 redesign policy, the 2000-2010 design policy sought to use innovations in the measurement and reporting of student achievement, citing the short form as one means for accomplishing this objective. Further, the NAEP 2000-2010 design repeated the earlier objective of helping states and others link to NAEP and use NAEP data to improve education performance. (While this objective is not explicitly tied to the short form, suggestions for this use of the short form appeared in Forsyth et al., 1996.) The 2000-2010 policy goes a step beyond the 1996 policy by encouraging states that are designing new assessments to draw on NAEP frameworks, specifications, scoring guides, results, questions, achievement levels, and background data.
In addition, NCES has instituted a special program that provides grants for the analysis of NAEP data. NCES is now encouraging applications from states (and other researchers) to conduct analyses that will be of practical benefit in interpreting NAEP results and in improving education performance. The Design 2000-2010 Policy contains examples of studies in which NAGB has collaborated with states, such as Maryland and North Carolina, to examine the content of their state mathematics tests in light of the content of NAEP (National Assessment Governing Board, 1999b).