National Academies Press: OpenBook

Designing a Market Basket for NAEP: Summary of a Workshop (2000)

Chapter: 6 Using Innovations in Measurement and Reporting: Reporting Percent Correct Scores

Suggested Citation:"6 Using Innovations in Measurement and Reporting: Reporting Percent Correct Scores." National Research Council. 2000. Designing a Market Basket for NAEP: Summary of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/9891.

6

Using Innovations in Measurement and Reporting: Reporting Percent Correct Scores

A second aspect of the NAEP market basket is reporting results in a metric easily understood by the public. For some time, NAEP has summarized performance as scale scores ranging from 0 to 500. However, it is difficult to attach meaning to scores on this scale. What does a score of 250 mean? What are the skills of a student who scores a 250? In which areas are they competent? In which areas do they need improvement?

Achievement level reporting was introduced in 1990 to enhance the interpretation of performance on NAEP. NAEP's sponsors believe that public understanding could be further improved by releasing a large number of sample items, summarizing performance using percent correct scores, and tying percent correct scores to achievement level descriptions. Since nearly everyone who has passed through the American school system has at one time or another taken a test and received a percent-correct score, most people could be expected to understand scores like 90%, 70%, or 50%. Unlike the NAEP scaled scores, the percent correct metric might have immediate meaning to the public.

PERCENT CORRECT METRIC: NOT AS SIMPLE AS IT SEEMS

At first blush, percent correct scores seem to be a simple, straightforward, and intuitively appealing way to increase public understanding of NAEP results. However, they present complexities of their own. First, NAEP contains a mix of multiple-choice and constructed response items.

In preliminary stages of scoring, multiple-choice items are awarded one point if answered correctly and zero points if answered incorrectly. Answers to constructed response items are also awarded points, but for some constructed response questions, six is the top score, and for others, three is the top score. For a given constructed response item, higher points are awarded to answers that demonstrate more proficiency in the particular area. Furthermore, a specific score cannot be interpreted, even at this preliminary stage, as meaning the same level of proficiency on different items (e.g., a four on one item would not represent the same level of proficiency as a four on another item). This situation becomes more complex at subsequent stages of IRT-based scoring and reporting, and the concept of “percent correct” becomes meaningless. Therefore, in order to come up with a simple sum of the number of correct responses to test items that include constructed response items, one would need to understand the judgment behind “correct answers.” What would it mean to get a “correct answer” on a constructed response item? What would be considered a correct answer? Receiving all points? Half of the points? Any score above zero?
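The effect of the choice of rule can be illustrated with a small sketch. The item scores below are hypothetical, and the three rules are simply the three candidates raised in the questions above (all points, half of the points, any score above zero); nothing here is NAEP's actual scoring procedure.

```python
# Hypothetical item scores as (points earned, maximum points).
# Multiple-choice items have a maximum of 1; the constructed response
# items here have maxima of 3 or 6, as described in the text.
responses = [(1, 1), (0, 1), (1, 1), (2, 3), (4, 6), (1, 6)]


def percent_correct(responses, is_correct):
    """Percent of items counted as correct under a given rule."""
    correct = sum(1 for earned, top in responses if is_correct(earned, top))
    return 100.0 * correct / len(responses)


# Three candidate definitions of a "correct" constructed response answer.
rules = {
    "all points": lambda earned, top: earned == top,
    "at least half of the points": lambda earned, top: earned >= top / 2,
    "any score above zero": lambda earned, top: earned > 0,
}

results = {name: percent_correct(responses, rule) for name, rule in rules.items()}

for name, value in results.items():
    print(f"{name}: {value:.1f}% correct")
```

With these made-up responses, the same student is credited with 2, 4, or 5 of the 6 items depending on the rule, so the reported percent correct depends heavily on a judgment that the score itself does not reveal.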

As an alternative, the percent correct score might be based, not on the number of questions, but on the total number of points. This presents another complexity, however. Simply adding up the number of points would result in awarding more weight to the constructed response questions than to the multiple-choice questions. For example, suppose a constructed response question can receive between one and six points, with a two representing slightly more competence in the area than a one but clearly not enough competence to get a six. Compare a score of two out of six possible points on this item with a multiple-choice item where the top score for a correct answer is one. Simply adding up total points would give twice as much weight to the barely correct constructed response item as to an entirely correct multiple-choice item. This might be reasonable if the constructed response questions required a level of skill much higher than the multiple-choice questions, such that a score of two on the former actually represented twice as much skill as a score of one on the latter. Since this is not the case for NAEP questions, some type of weighting scheme is needed. Yet weighting schemes also add complexity to the percent correct metric.
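The weighting problem can be made concrete with the two items from the example. The sketch below compares an unweighted points total with one possible weighting scheme, in which each item is rescaled to count equally. The equal-weight scheme is an assumption chosen for illustration; the workshop summary does not say which scheme NAEP would adopt.

```python
# The two items from the example in the text: a multiple-choice item
# answered correctly (1 of 1 point) and a constructed response item
# scored 2 of 6 points.
items = [
    {"kind": "multiple-choice", "earned": 1, "top": 1},
    {"kind": "constructed response", "earned": 2, "top": 6},
]


def raw_points_percent(items):
    """Percent of total possible points, with no weighting."""
    earned = sum(item["earned"] for item in items)
    possible = sum(item["top"] for item in items)
    return 100.0 * earned / possible


def equal_weight_percent(items):
    """Mean of per-item fractions, so every item counts the same.

    This is one hypothetical weighting scheme, not NAEP's.
    """
    fractions = [item["earned"] / item["top"] for item in items]
    return 100.0 * sum(fractions) / len(fractions)


raw = raw_points_percent(items)      # 3 of 7 possible points
equal = equal_weight_percent(items)  # mean of 1/1 and 2/6

print(f"raw points: {raw:.1f}%  equal item weights: {equal:.1f}%")
```

In the raw total, the barely correct constructed response answer contributes two points against one point for the fully correct multiple-choice answer. Rescaling removes that imbalance, but the resulting number is no longer a simple count that a reader could reproduce by tallying right answers.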

A number of workshop participants addressed the deceptive simplicity of percent correct scores. Several pointed out that the public already has difficulty understanding terms that psychometricians use, such as national percentile rank or grade-level equivalents. As a result, assessment directors spend a good deal of time trying to ensure that policymakers and the public make the proper inferences from test results. The danger of the percent correct score is that everyone might think they understand it due to their own life experience, when, in fact, they do not.

Still, it should be pointed out that the percent correct metric has much intuitive appeal. If used correctly, it might be of great benefit in increasing understanding of NAEP. Moreover, all statistics are susceptible to misuse, percent correct as well as more complex statistics. As Ronald Costello, assistant superintendent of public schools in Noblesville, Indiana, observed:

It doesn't matter what the statistic is, it still will be used for rank ordering when it gets out to the public. There are 269 school districts in Indiana. When test results come out, there's a 1 and a 269. The issue is why are we testing students and what do we want to do with the results.

Costello concluded by saying that more important than the statistic is the use of the results. Attention should be focused on making progress in educational achievement, and the statistic should enable evaluation of the extent to which students have progressed.

DISCONNECT WITH PUBLIC PERCEPTIONS OF “PROFICIENT”

One plan for the NAEP percent correct scores is to report them in association with the NAEP achievement levels. At the workshop, Roy Truby presented a document that showed how this might be accomplished based on results from the 1992 NAEP mathematics assessment (Johnson et al., 1997). An excerpt appears in Table 1. This table displays percent correct results for test takers in grades four, eight, and twelve. Column 2 presents the overall average percent correct for test takers in each grade. Columns 3-5 show the percent correct score associated with the minimum score cutpoint for each achievement level category. For example, the cutpoint for the fourth grade advanced category (Column 3) would be associated with a score of 80 percent correct. A percent correct score of 33 percent would represent performance at the cutpoint for twelfth grade's basic category.

TABLE 1 Example of Market Basket Results

| (1) Grade | (2) Average Percent Correct Score (a) | (3) Cut Point: Advanced | (4) Cut Point: Proficient | (5) Cut Point: Basic |
|---|---|---|---|---|
| 4 | 41% | 80% | 58% | 34% |
| 8 | 42% | 73% | 55% | 37% |
| 12 | 40% | 75% | 57% | 33% |

(a) In terms of total possible points.

Note: The information in Table 1 is based on simulations from the full NAEP assessment; results for a market basket might differ depending on its composition.

Speakers cautioned that the percent correct scale used in Table 1 is unlike that understood by the public. In their opinion, people typically regard 70% as a passing score; scores around 80% as indicating proficiency; and scores of 90% and above as advanced. What would members of the general public think when they saw that the average American student scored less than 50% on the test represented in the table? Would this scheme be an appropriate basis for the public's evaluation of the level of education in schools today? According to one speaker:

Most test directors would understand why this might be, but no teacher, parent, or member of the public would consider 55% proficient. They would consider that score as representing “clueless” perhaps, and would think even less of the test and the educators that would purport to pass off 55% as proficient.
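The mismatch the speakers described can be sketched by labeling the same percent correct score two ways: once with the Table 1 cut points and once with the public's rule of thumb. In the sketch below, treating "around 80%" as a hard threshold, the label "Below Basic," and the label "below passing" are assumptions made for illustration; only the cut points and the 70/80/90 figures come from the text.

```python
# Cut points (percent correct) from Table 1, by grade.
NAEP_CUTS = {
    4: {"Advanced": 80, "Proficient": 58, "Basic": 34},
    8: {"Advanced": 73, "Proficient": 55, "Basic": 37},
    12: {"Advanced": 75, "Proficient": 57, "Basic": 33},
}

# Public rule of thumb as characterized by workshop speakers.
# Treating these as hard thresholds is a simplifying assumption.
PUBLIC_CUTS = {"advanced": 90, "proficient": 80, "passing": 70}


def label(score, cuts, below):
    """Return the highest category whose cut point the score meets."""
    ordered = sorted(cuts.items(), key=lambda pair: pair[1], reverse=True)
    for name, cut in ordered:
        if score >= cut:
            return name
    return below


# Grade 8: a score at the Table 1 "Proficient" cut point.
naep_view = label(55, NAEP_CUTS[8], "Below Basic")
public_view = label(55, PUBLIC_CUTS, "below passing")

print(f"55% correct, grade 8: Table 1 says {naep_view}; "
      f"the public rule of thumb says {public_view}")
```

Under these assumptions, a score of 55% sits exactly at the eighth grade proficient cut point in Table 1, while the rule of thumb places it well short of a passing score.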

CONVERSION TO GRADES

While most Americans have at one time or another taken a test and received a percent score, generally that percent score was converted to a letter grade. Although associating percent correct scores with an achievement level might increase public understanding of NAEP, many people would still be tempted to convert the scores to letter grades, and their conversions might not be accurate. Richard Colvin offered his perspective as an education reporter for the Los Angeles Times:

On its own, a percent correct score is only slightly more meaningful than a scale score. The reason is that, in school, percent correct is translated into a grade: 93% or above for an “A,” 85% to 93% for a “B,” and so forth. If you were to put out a percent correct score for the market basket of items, I assure you that journalists will push you to say what letter grade it represents. And, if you aren't willing to do that, journalists will do it for you.

Other participants echoed this concern, noting that the public would need a means for interpreting and evaluating percent correct scores.

ONE STEP FORWARD, TWO STEPS BACK

As described by Andrew Kolstad, senior technical advisor with NCES, in the first decade of NAEP, the percent correct metric was used for reporting results. Use of item response theory (IRT), beginning in the early 1980s, solved many of the interpretation problems that stemmed from the practice of reporting percent correct scores for subsets of items. Therefore, some workshop discussants wondered why NAEP would want to return to the metric used in its early years. David Thissen, professor of psychology at the University of North Carolina, emphasized this point, noting that “NAEP's use of the IRT scale in the past two decades has done a great deal to legitimize such IRT procedures with the result that many other assessments now use IRT scales. . . . [A] potential unintended consequence of NAEP reporting on a percent correct scale might be to drive many other tests, such as state assessments, to imitation.”

NAEP uses some of the most sophisticated and high-quality analytic and reporting methods available. If NAEP moves away from such procedures to a simpler percent correct metric, others will surely follow suit. Many discussants maintained that they did not see the benefits of the simpler metric.

DOMAIN REFERENCED REPORTING

During his comments on technical and measurement considerations, Don McLaughlin, chief scientist for the American Institutes for Research, reminded participants that the desired inferences about student achievement are about the content domain, not about the set of questions on a particular test form. The interest is not in the percent of items or points correct on a form. Instead, the interest is in the percent of the domain that children have mastered.

Domain referenced reporting was cited as an alternative to market-basket reporting. Domain referenced reporting is based on large collections of items that probe the domain with more breadth and depth than is possible through a single administrable test form. As described by Darrell Bock, domain referenced reporting involves expressing scale scores in terms of the expected percent correct on a larger collection of items representative of the specified domain. The expected percents correct can be calculated for any given scale score using IRT methods and the estimated item parameters of the sample of test questions (see Bock et al., 1997). Bock further explained the concept of domain referenced reporting, saying:

[A] domain sample for mathematics might consist of 240 items by selecting 4 items to represent each of the 60 cells of the domain specification described by [John] Mazzeo. These items could be drawn from previously released items from the NAEP assessment or from state testing programs. Their parameters could be estimated by adding a small number of additional examinees in each school participating in the [NAEP] and administering them special test forms containing small subsets of the domain sample, similar to those proposed for the market basket.

The point is to publish the 240 items in a compendium organized by the content, process, and achievement level categories. . . . For graded open-ended items, the rating categories should also be described and the “satisfactory” and “unsatisfactory” categories identified. The objective of this approach is not only to provide sufficient items from which readers of the assessment report can infer the knowledge and skills involved in mathematics achievement, but also, by publishing the compendium well before the assessment takes place, to encourage its use as an aid to instruction and self-study and as a basis for comment and explication in the media. When the results finally appear, there will then exist a ready and well-informed audience for the assessment report.

Bock went on to offer as an example of such a compendium the procedures used by the Federal Aviation Administration (FAA) to license private pilots. All 915 items that could potentially appear on the exam are published, and all potential pilots receive this compendium so that they may study the necessary material.
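The calculation behind domain referenced reporting, an expected percent correct on the domain sample for a given scale score, can be sketched as follows. The text says only that IRT methods and estimated item parameters are used; the three-parameter logistic model, the item parameter values, and the use of a standardized proficiency value (theta) in place of the 0 to 500 reporting scale are all assumptions made for illustration.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """Three-parameter logistic probability of a correct response.

    theta: examinee proficiency; a: discrimination; b: difficulty;
    c: lower asymptote (chance of a correct guess).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def expected_percent_correct(theta, item_params):
    """Average probability of success over the domain sample, as a percent."""
    probs = [p_correct(theta, a, b, c) for a, b, c in item_params]
    return 100.0 * sum(probs) / len(probs)


# Hypothetical (a, b, c) parameters standing in for a domain sample.
# A real domain sample would be far larger (240 items in Bock's example).
domain_items = [
    (1.0, -1.0, 0.20),
    (1.2, 0.0, 0.25),
    (0.8, 0.5, 0.00),
    (1.5, 1.0, 0.20),
]

for theta in (-1.0, 0.0, 1.0):
    score = expected_percent_correct(theta, domain_items)
    print(f"theta = {theta:+.1f}: expected {score:.1f}% of domain correct")
```

Because each item's probability of success rises with proficiency, the expected percent correct rises with it, which is what allows a scale score to be restated as a percent of the domain without administering every item to every student.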


At the request of the U.S. Department of Education, the National Research Council (NRC) established the Committee on NAEP Reporting Practices to examine the feasibility and potential impact of district-level and market-basket reporting practices. As part of its charge, the committee sponsored a workshop in February 2000 to gather information on issues related to market-basket reporting for the National Assessment of Educational Progress (NAEP).

Designing a Market Basket for NAEP: Summary of a Workshop explores with various stakeholders their interest in and perceptions regarding the desirability, feasibility, and potential impact of market-basket reporting for the NAEP. The market-basket concept is based on the idea that a relatively limited set of items can represent some larger construct. The general idea of a NAEP market basket is based on an image of a collection of test questions representative of some larger content domain and an easily understood index to summarize performance on the items.
