The committee asked the authors to consider the research findings in light of the criterion commonly used for judging the validity of accommodations, often referred to as the “interaction hypothesis”: the assumption that effective test accommodations will improve test scores for the students who need the accommodation but not for the students who do not. As Shepard et al. (1998) explained it, if accommodations are working as intended, there should be an interaction between educational status (students with disabilities and students without disabilities) and accommodation conditions (accommodated and unaccommodated). The accommodation should improve the average score for the students for whom it was designed (students with disabilities or English language learners) but should have little or no effect on the average score for the others (students without disabilities or native English speakers). If an accommodation improves the performance of both groups, then offering it only to certain students (students with disabilities or English language learners) is unfair.
Figure 5-1 is a visual depiction of the 2 × 2 experimental design used to test for this interaction effect. An interaction effect would be said to exist if the mean score for examinees in group C were higher than the mean score for group A, and the mean scores for groups B and D were similar.
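The comparison just described can be expressed as a difference of differences among the four cell means. The sketch below is illustrative only. It assumes, based on the surrounding text rather than on Figure 5-1 itself, that cells A and C hold the unaccommodated and accommodated scores of the students for whom the accommodation was designed, and that cells B and D hold the corresponding scores of the other students. The function name and the cell means are hypothetical, invented for the example. A real study would also test the interaction statistically, for example with a two-way analysis of variance, rather than simply comparing means.

```python
def interaction_contrast(mean_a, mean_b, mean_c, mean_d):
    """Return the gain for each group and the difference between the gains.

    Assumed cell layout (inferred from the text, not taken from the figure):
      A: target group, unaccommodated      C: target group, accommodated
      B: non-target group, unaccommodated  D: non-target group, accommodated
    """
    gain_target = mean_c - mean_a      # improvement for students who need the accommodation
    gain_nontarget = mean_d - mean_b   # improvement for students who do not
    return gain_target, gain_nontarget, gain_target - gain_nontarget


# Hypothetical cell means, invented purely for illustration.
gain_target, gain_nontarget, contrast = interaction_contrast(
    mean_a=40.0, mean_b=55.0, mean_c=48.0, mean_d=56.0
)
print(gain_target, gain_nontarget, contrast)
```

Under the interaction hypothesis, the first gain should be clearly positive and the second close to zero, so the contrast should be close to the first gain. If both gains are substantial, the pattern is the one whose interpretation is disputed.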
The use of this interaction hypothesis for judging the validity of scores from accommodated administrations has, however, been called into question. In particular, questions have been raised about whether the finding of score improvements for the students who ostensibly did not need accommodations (from cell B to cell D) should invalidate the accommodation (National Research Council, 2002a, pp. 74-75). For example, if both native English speakers and English language learners benefit from a plain-language accommodation, does that mean that the scores are not valid for English language learners who received this accommodation?
There are also questions about whether the finding of score improvements