Second, classroom teachers are not expected to build formal Bayes nets in their classrooms from scratch. This is so even though the intuitive, often subconscious, reasoning teachers carry out every day in their informal assessments and conversations with students shares key principles with formal networks. Explicitly disentangling the complex evidentiary relationships that characterize the classroom simply is not necessary. Nevertheless, a greater understanding of how one would go about doing this, should it be required, would undoubtedly improve everyday reasoning about assessment by policy makers, the public at large, and teachers. One can predict with confidence that even the most ambitious uses of Bayes nets in assessments would not require teachers to work with the nuts and bolts of statistical distributions, evidence models, and Lauritzen-Spiegelhalter updating. Aside from research uses, one way these technical elements come into play is by being built into instructional tools. The computer in a microwave oven is an analogy, and some existing intelligent tutoring systems are an example. Neither students learning to troubleshoot the F-15 hydraulics nor their trainers know or care that a Bayes net helps parse their actions and trigger suggestions (see the HYDRIVE example presented in Annex 4–1). The difficult work is embodied in the device. More open systems than these will allow teachers or instructional designers to build tasks around recurring relationships between students’ understandings and their problem solving in a domain, and to link these tasks to programs that handle the technical details of probability-based reasoning.
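The kind of probability-based updating such a program handles behind the scenes can be illustrated with a toy sketch. The "proficiency" variable, the task observations, and all of the numbers below are hypothetical illustrations, not drawn from HYDRIVE or the report; a real system would update many interrelated variables at once, which is what algorithms such as Lauritzen-Spiegelhalter make tractable.

```python
# Toy sketch of Bayesian updating for a single binary student-model
# variable ("proficient" vs. "not proficient"). All probabilities here
# are hypothetical, chosen only to illustrate the mechanics.

def update(prior, p_obs_given_h, p_obs_given_not_h):
    """Posterior P(H | observation) by Bayes' rule for a binary hypothesis."""
    numer = prior * p_obs_given_h
    denom = numer + (1.0 - prior) * p_obs_given_not_h
    return numer / denom

# Prior belief before seeing any work: 50-50.
belief = 0.5

# Hypothetical evidence model: a proficient student solves a task
# correctly 80% of the time; a non-proficient student, 30% of the time.
for task_correct in [True, True, False]:
    if task_correct:
        belief = update(belief, 0.8, 0.3)
    else:
        belief = update(belief, 0.2, 0.7)  # complements of the above
    print(f"belief in proficiency: {belief:.3f}")
```

Each observed task nudges the belief up or down; after two successes and one failure the belief settles around 0.67. A teacher or student using such a tool would see only the resulting suggestions, never this arithmetic.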

The most important lesson learned thus far, however, is the need for coordination across specialties in the design of complex assessments. An assessment that simultaneously pushes the frontiers of psychology, technology, statistics, and a substantive domain cannot succeed unless all of these areas are incorporated into a coherent design from the outset. If one tries to develop an ambitious student model, create a complex simulation environment, and write challenging task scenarios—all before working through the relationships among the elements of the assessment triangle needed to make sense of the data—one will surely fail. The familiar practice of writing test items and handing them off to psychometricians to model the results cannot be sustained in complex assessments.


In the preceding account, measurement models were discussed in order of increasing complexity with regard to how aspects of learning are modeled.


This section draws heavily on the commissioned paper by Brian Junker. For the paper, go to <>. [March 2, 2001].

Copyright © National Academy of Sciences. All rights reserved.