The experience of the National Assessment Governing Board (NAGB) in setting achievement levels for the National Assessment of Educational Progress illustrates the challenges in making valid and reliable judgments about the levels of student performance. The NAGB achievement levels have received severe criticism over the years (National Research Council, 1998). Critics have found that the descriptions of performance NAGB uses to characterize “basic,” “proficient,” and “advanced” levels of achievement on NAEP do not correspond to student performance at each of the levels. Students who performed at the basic level could perform tasks intended to demonstrate proficient achievement, for example. Moreover, researchers have found that the overall levels appear to have been set too high, compared with student performance on other measures.
One issue surrounding the use of achievement levels relates to the precision of the estimates of the proportions of students performing at each level. Large margins of error could have important ramifications if the performance standards are used to reward or punish schools or school districts; a school with large numbers of students classified as “partially proficient” may in fact have a high proportion of students at the “proficient” level.
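The size of this margin of error can be sketched with a standard confidence-interval calculation for a proportion. The school size and percentages below are hypothetical illustrations, not actual assessment data:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion
    estimated from n students."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical school: 35% of 100 tested students classified "proficient".
low, high = proportion_ci(0.35, 100)
# The interval runs from roughly 26% to 44% -- a margin of error of about
# +/- 9 percentage points, easily wide enough to span a reward-or-sanction
# threshold set at, say, 40% proficient.
```

The interval narrows as the number of tested students grows, which is one reason small schools are especially vulnerable to misleading year-to-year swings in reported proficiency rates.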
The risk of misclassification is particularly high when states and districts use more than one cutscore, or more than two levels of achievement, as NAEP does (Rogosa, 1994). However, other efforts have shown that it is possible to classify students' performance with a relatively high degree of accuracy and consistency (Young and Yoon, 1998). In any case, such classifications always contain some degree of statistical uncertainty; reports on performance should include data on the level of confidence with which the classification is made.
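The reason additional cutscores raise the risk of misclassification can be illustrated with a small simulation: each cutscore adds a boundary near which measurement error can flip a student's classification. The score scale, cut points, and error size below are assumed for illustration only:

```python
import random

random.seed(0)

def classify(score, cuts):
    """Return the index of the achievement level a score falls into."""
    return sum(score >= c for c in cuts)

def consistency(cuts, n=20_000, sd_error=5.0):
    """Fraction of simulated students classified identically on two
    parallel test forms that differ only by random measurement error."""
    agree = 0
    for _ in range(n):
        true = random.gauss(150, 30)           # hypothetical true-score scale
        form_a = true + random.gauss(0, sd_error)
        form_b = true + random.gauss(0, sd_error)
        agree += classify(form_a, cuts) == classify(form_b, cuts)
    return agree / n

one_cut = consistency([150])                   # pass/fail only
three_cuts = consistency([120, 150, 180])      # e.g. basic/proficient/advanced
# With three boundaries instead of one, more students sit close to some
# cutscore, so classification consistency across forms is lower.
```

The same test, with the same measurement error, classifies students less consistently simply because more boundaries are in play.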
Another problem with standards-based reporting stems from the fact that tests generally contain relatively few items that measure performance against any particular standard or group of standards. While the test overall may be aligned with the standards, it may include only one or two items that measure performance on, say, the ability to identify different types of triangles. Because student performance can vary widely from item to item, particularly with performance items, it would be inappropriate to report student results on each standard (Shavelson et al., 1993). As a result, a report may be able to indicate whether students have attained the standards overall, but it can seldom indicate which particular standards they have attained. This limits the report's instructional utility, since it can seldom tell teachers which topic or skill a student needs to work on.
The challenges of reporting standards-based information are exacerbated with the use of multiple indicators. In some cases, the results for a student on two different measures could be quite different. For example, a student may perform well on a reading comprehension test but perform poorly on a writing assessment. This is understandable, since the two tests measure different skills; however, the apparent contradiction could appear confusing to the public (National Research Council, 1999b).
In an effort to help avoid such confusion and provide an overall measure of performance, many states have combined their multiple measures into a single