two ways: by pooling their resources, states could get more for the money they spend on assessment, and interstate collaboration is likely to support deeper cognitive analysis of standards and objectives for student performance than is possible when each state works from separate standards.

Cost Savings

The question of how much states could save by collaborating on assessment begins with the question of how much they are currently spending. Savings would likely be limited to test development, since many per-student costs for administration, scoring, and reporting would not be affected. Wise discussed an informal survey he had done of development costs (Wise, 2009), which covered 15 state testing programs and a few test developers and captured only total contract costs, not internal staff costs; the results are shown in Table 6-1.

Perhaps most notable in the data is the wide range in what states are spending, as shown in the minimum and maximum columns. Wise also noted that, on average, the states surveyed were spending well over $1 million annually to develop assessments that require human scoring and $26 per student to score them.
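To make the pooling argument concrete, the following is a rough back-of-the-envelope sketch in Python. It uses the mean annual development cost for regular extended constructed-response tests reported in Table 6-1; the consortium size of 20 states and the assumption that a jointly developed test costs about as much to build as a single state's test are illustrative assumptions, not findings from the survey.

```python
# Back-of-the-envelope estimate of development-cost savings from pooling.
# The cost figure is the survey mean from Table 6-1 (Wise, 2009); the
# consortium size and the assumption that one shared test costs roughly
# as much to develop as a single state's test are illustrative only.

MEAN_DEV_COST_ECR = 1_329_000   # mean annual development cost, regular ECR tests ($)
N_STATES = 20                   # hypothetical consortium size (assumption)

separate_total = N_STATES * MEAN_DEV_COST_ECR   # each state develops its own test
pooled_total = MEAN_DEV_COST_ECR                # one shared development effort
savings = separate_total - pooled_total

print(f"Separate development: ${separate_total:,.0f} per year")
print(f"Pooled development:   ${pooled_total:,.0f} per year")
print(f"Potential savings:    ${savings:,.0f} per year "
      f"({savings / separate_total:.0%} of current spending)")
```

Under these assumptions the arithmetic is straightforward: the larger the consortium, the closer the savings approach the full amount the member states would otherwise have spent developing tests separately, though shared costs would in practice be offset by consortium overhead and by the per-student administration and scoring costs shown in the lower half of the table.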

A total of $350 million will be awarded to states through the Race to the Top initiative. That money, plus savings that winning states or consortia could achieve by pooling their resources, together with potential savings from such

TABLE 6-1 Average State Development and Administration Costs by Assessment Type

Assessment Type        N     Mean    S.D.    Min     Max

Annual Development Costs (in thousands of dollars)
  Alternate            9      363     215     100     686
  Regular—ECR         13    1,329     968     127   3,600
  Regular—MC Only      5      551     387     220   1,130

Administrative Cost per Student (in dollars)
  Alternate            9      376     304      40     851
  Regular—ECR         16       26      18       4      65
  Regular—MC Only      6        3       3       1       9

NOTES: ECR = extended constructed-response tests; Max = maximum cost; MC = multiple-choice tests; Min = minimum cost; N = number; S.D. = standard deviation. Extended constructed-response tests include writing assessments and other tests requiring human scoring using a multilevel scoring rubric. Multiple-choice tests are normally machine scored. Because the results incorporate a number of different contracts, they reflect varying grade levels and subjects, though most included grades 3-8 mathematics and reading.

SOURCE: Wise (2009, p. 4).


