
The National Academies Press


Knowing What Students Know: The Science and Design of Educational Assessment (2001)
Board on Testing and Assessment (BOTA)


"4 Contributions of Measurement and Statistical Modeling to Assessment." Knowing What Students Know: The Science and Design of Educational Assessment. Washington, DC: The National Academies Press, 2001.





FIGURE 4–4 Generalizability theory model with two facets: raters and item type.

treated as a random facet, it would be considered a random sample of all possible tasks generated under the same rules to measure the construct, and the results of the g-study would be considered to generalize to that universe of tasks. When facets are treated as fixed, the results are considered to generalize only to the elements of the facet in the study. Using the same example, if the set of tasks were treated as fixed, the results would generalize only to the tasks at hand (see Kreft and De Leeuw, 1998, for a discussion of this and other usages of the terms “random” and “fixed”).

In practice, researchers carry out a g-study to ascertain how different facets affect the reliability (generalizability) of scores. This information can then guide decisions about how to design sound situations for making observations—for example, whether to average across raters, add more tasks, or test on more than one occasion. To illustrate, in the BEAR assessment, a g-study could be carried out to see which type of assessment—embedded tasks or link items—contributed more to reliability. Such a study could also be used to examine whether teachers were as consistent as external raters. Generalizability models offer two powerful practical advantages. First, they allow one to characterize how the conditions under which the observations were made affect the reliability of the evidence. Second, this information is expressed in terms that allow one to project from the current assessment design to other potential designs.
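The projection from a current design to other potential designs can be made concrete with a small numerical sketch. The example below assumes a fully crossed persons x raters x tasks design with both facets treated as random, and uses the standard generalizability coefficient for relative decisions: person variance divided by person variance plus relative error variance. The variance components are hypothetical values chosen for illustration, not estimates from the BEAR assessment or any real g-study, and the class and function names are likewise invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components that a g-study might estimate (hypothetical here)."""
    person: float        # universe-score variance (differences among persons)
    person_rater: float  # person x rater interaction
    person_task: float   # person x task interaction
    residual: float      # person x rater x task interaction, confounded with error


def relative_error_variance(vc: VarianceComponents, n_raters: int, n_tasks: int) -> float:
    """Error variance for relative decisions when scores are averaged
    over n_raters raters and n_tasks tasks (both facets random)."""
    if n_raters < 1 or n_tasks < 1:
        raise ValueError("n_raters and n_tasks must be at least 1")
    return (
        vc.person_rater / n_raters
        + vc.person_task / n_tasks
        + vc.residual / (n_raters * n_tasks)
    )


def generalizability_coefficient(vc: VarianceComponents, n_raters: int, n_tasks: int) -> float:
    """Ratio of universe-score variance to expected observed-score variance."""
    error = relative_error_variance(vc, n_raters, n_tasks)
    return vc.person / (vc.person + error)


# Hypothetical components: tasks contribute more interaction variance than raters.
vc = VarianceComponents(person=0.50, person_rater=0.05, person_task=0.30, residual=0.40)

# Project reliability for several candidate designs.
for n_raters, n_tasks in [(1, 1), (2, 1), (1, 2), (2, 4)]:
    coef = generalizability_coefficient(vc, n_raters, n_tasks)
    print(f"raters={n_raters}, tasks={n_tasks}: coefficient={coef:.3f}")
```

With these assumed components, adding a second task raises the coefficient more than adding a second rater, because the person x task interaction is the larger source of error. A different set of estimated components could reverse that conclusion, which is why the g-study comes first.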
