Purpose and Scope of the Study
POLICY CONTEXT
Why does it matter to anyone other than testing experts whether the results of state and commercial tests can be linked to a common scale? Although test linkage is a highly technical issue, the question posed by Congress reflects a broad, underlying goal held by many Americans: to know more about how individual students in the United States are performing in relation to high national or international benchmarks of performance. Many people believe that students and teachers would also profit from knowing how a student's performance in key content areas compares with the performance of other students, other schools, other states, and other countries (see, e.g., Rose, Gallup, and Elam, 1997). Many also believe that, if linkages among different tests enabled such comparisons, they would help to spur improvements in schooling at the state and local levels (see, e.g., Achieve, 1998). Others hold different views of the utility of this type of information (see, e.g., Jones, 1997).
Existing assessments of student performance are diverse and are guided by different purposes (see, e.g., Bond, 1995). In the legislation that requested this study, the committee recognizes a desire to bring about greater comparability among tests in the United States, while upholding traditions of state and local control of education and respecting the substantial public and private investments that have been made in developing educational tests and assessments. In short, we interpret the charge to us as an expression of Congress's desire to forge some unity of interpretation within a heterogeneous system of testing and assessment. The focus of this report is whether an equivalency scale can be developed to accomplish that goal, the educational assessment equivalent of “e pluribus unum.”
PURPOSES OF LINKAGE
In formulating the question for this study, Congress was not explicit about the purposes of the proposed linkages. However, because the study originated in the vigorous debate about the President's proposal for voluntary national testing, the committee assumes that Congress sees linkage as a possible substitute for the voluntary national tests. For example, if linkages could be developed that would permit the scores of individual students on existing tests and assessments to be compared with each other and reported in terms of the achievement levels used by NAEP, it might show which state tests are less challenging than others and whether individual children are reaching achievement levels defined by NAEP. Moreover, the information might be used by parents to work for improvements in their schools and school districts (Smith, Stevenson, and Li, 1998).
SCOPE OF THIS STUDY
The primary focus of the committee's examination is linkage among the tests and assessments currently used by states and districts to measure individual students' educational performance. The committee uses the terms “test” and “assessment” interchangeably, following Shepard (1994). If there is a difference between these two terms, it is one of emphasis: a test usually refers to a particular coherent testing instrument, whereas an assessment is more likely to refer to a system that involves more than one test.
This interpretation of our charge has led the committee to focus its deliberations on two principal issues:
- the degree to which existing state and commercial tests and assessments can be linked to each other on a common scale, permitting individual scores from different tests and assessments to be compared; and
- whether scores on existing state and commercial tests and assessments can be interpreted in terms of NAEP's achievement levels, so that parents can know how well their children are doing as measured against national benchmarks.
The committee has a large field of inquiry and a short time frame in which to analyze evidence and arrive at conclusions. The committee is examining a substantial amount of data about selected state and commercial tests and assessments that are likely candidates for the types of linkage suggested by the legislation. We are specifically investigating:
- common uses of these tests;
- diversity in their content and format;
- their measurement properties, such as their difficulty and the reliability of their scores (by reliability we refer to their consistency over time);
- the degree to which state and district tests change over time; and
- the degree to which state policies affect uses and interpretations of test results.
To evaluate the methodological and technical issues involved in establishing linkages between these kinds of tests, the committee is reviewing past efforts to link different tests and assessments to each other and to NAEP. We are giving particular attention to the purposes of these linkages as we explore research on alternative methodologies and the validity of inferences drawn from the linked results. Where possible, we illustrate our empirical findings with examples from math and reading tests, the focus of much of today's policy debate.