National Academies Press: OpenBook

A Valedictory: Reflections on 60 Years in Educational Testing (1995)

Chapter: Limitations on Testing

Suggested Citation: "Limitations on Testing." National Research Council. 1995. A Valedictory: Reflections on 60 Years in Educational Testing. Washington, DC: The National Academies Press. doi: 10.17226/9244.

I am not free to cite the particular misinterpretation, but it had to do with scoring and that is worth a paragraph here. As scorer error has been a technical concern for 80 years,[4] you might think we would know how to appraise scorer accuracy. We don't. The literature is chaotic with alternative methods of summary that sometimes seem designed to not face the truth. One favorite simple scheme is to report the percentage of times scorers agree when they independently score the same student responses. To people who adopt that routine, a finding of 60 percent agreement is likely to be the basis for an announcement that “our scoring is accurate.” I have seen one such instance, not atypical, in which 45 percent of students scored at 3 on the scale. Thus, if a check scorer were to assign a 3 to every paper, without reading the response, the percentage of agreement would be 45. From that baseline, 60 percent isn't very far along the road to perfection.
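The arithmetic behind that last remark can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and rescaling agreement as a fraction of the distance from the baseline to perfect agreement is one simple way to express the point, not a method the text prescribes. The baseline is taken, as in the example above, to be the agreement a check scorer would reach by assigning the most common score to every paper without reading it.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(first: Sequence[int], second: Sequence[int]) -> float:
    """Share of papers on which two independent scorers gave the same score."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("both scorers must rate the same non-empty set of papers")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


def constant_scorer_baseline(first: Sequence[int]) -> float:
    """Agreement reached by a check scorer who gives every paper the modal score."""
    if len(first) == 0:
        raise ValueError("at least one score is required")
    counts = Counter(first)
    return max(counts.values()) / len(first)


def progress_beyond_baseline(agreement: float, baseline: float) -> float:
    """Fraction of the distance from the baseline to perfect agreement (1.0).

    Assumption for illustration: a linear rescaling, (agreement - baseline)
    divided by (1 - baseline). It is not taken from the source text.
    """
    if not 0.0 <= baseline < 1.0:
        raise ValueError("baseline must be at least 0 and below 1")
    return (agreement - baseline) / (1.0 - baseline)


# Figures quoted in the text: 60 percent agreement, 45 percent of papers at score 3.
quoted = progress_beyond_baseline(0.60, 0.45)
print(f"progress beyond the constant-scorer baseline: {quoted:.2f}")
```

With the figures quoted in the text, the result is about 0.27: the reported 60 percent agreement covers roughly a quarter of the gap between scoring without reading and perfect agreement.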

LIMITATIONS ON TESTING

Professional organizations are not prepared to protect the public and the schoolchildren against irresponsible promotion and the advertising of inaccurate results as accurate. No amount of adverse comment by professionals could get the 1980s Department of Education to abandon its totally misleading wall chart. You may recall that it ranked the states according to the mean scores of students who volunteered to take the Scholastic Aptitude Test (SAT). That was as unrepresentative a sample as you could get. But it was impossible to drive out the chart device under the administrations of those years. It has often taken either a press outbreak or a political fight to get major reviews. The CLAS review wouldn't have happened if there had been no Los Angeles Times attack. NRC wouldn't have done the thorough job it did on the General Aptitude Test Battery[5] if there hadn't been congressional dispute. Over and over we see that it takes a political event or a public relations disaster before attention zeroes in on the quality of a testing program.

[4] Starch, D., and Elliott, E.C. (1913). Reliability of grading work in mathematics. School Review 21: 254-259.

[5] Hartigan, J.A., and Wigdor, A.K., eds. (1989). Fairness in Employment Testing: Validity Generalization, Minority Issues, and the General Aptitude Test Battery. Washington, D.C.: National Academy Press.

George Madaus and some others have been arguing that the professional test standards should be made enforceable. They were designed to be an educational device, and I would hate to see that change. The committee I chaired, which produced the first version of the standards in 1954, grew out of an American Psychological Association Committee on Ethical Standards. Among other ideas, that committee had suggested setting up a seal of approval for tests that were of high quality. Our group rejected that notion from the start, primarily because any test has many possible uses, and no official stamp of approval could fence off the sound uses of the test from unacceptable uses or uses not yet well researched.

Our committee aimed to set down the questions publishers should answer so that a trained test user could decide how adequate a test would be for the local purpose. We wanted not to discourage trial of new tests and new applications. We did urge test developers to limit claims, but we expected users to pioneer new applications. If we certify Test X for Use 8, there is a strong hint that practitioners shouldn't be trying it for Uses 7 and 9. Of course you should be trying out any reasonable application and checking on the quality of the result.

Standards committees have regarded tests as emerging in an orderly market in which a test would, over several years, find one or more niches. In no way can a code that was designed to promote professional use of documentation, released when the test is marketed at the end of a developmental research period, be applied to programs that are rushed into operation to meet politicians' deadlines. In the new assessments, documentation seems likely to lag two years behind application of a test to shape the fates of its targets.

The National Educational Standards and Improvement Council (NESIC) has been given the awesome task of certifying assessments as meeting professional standards. I doubt that standards can be written that would be definite enough to be enforceable yet general enough to apply over even a limited area of testing. Even in so mature an area as mathematics testing, in which the National Council of Teachers of Mathematics has laid excellent groundwork, one cannot apply stereotyped questions to the instruments educators have devised. Educators' judgments should not be circumscribed by the psychometric specialist's enthusiasm for intertask correlations or linear combinations of test scores. But this means that every assessment requires invention of new methods of psychometric checking. I am still somewhat numbed by the mismatch between the kinds of score our most sophisticated procedures have dealt with in the past and the structures of test forms and scoring rules I see in some current assessments. I do not say that the teachers who developed the forms were wrong. I say only that I do not expect a priori specification of analyses to define the work that would properly defend a novel assessment. I hope that the board can invent a device for this social need that is right for these times, and not try to resolve the problems within the already overextended test standards.
