communication with a sponsor or a customer. To be useful, the findings from such formative studies must be timely; this may require that scientific rigor and precision be traded for speed. Accordingly, rapid prototyping and simulations are often used to provide representations of the elements to be examined in formative evaluations.

Once a system has been developed, a summative evaluation can be used to measure the capability of the system to fulfill its intended function, to compare its performance with that of alternative systems, and to assess its acceptance by intended users. For example, the training effectiveness of a virtual environment (VE) training system might be compared with that of a conventional training simulator. Quantitative measures of training performance and training transfer, together with pooled expert ratings and judgmental information about the system's user-friendliness, would all have a role in such a summative evaluation. Taken together, formative and summative evaluations provide a critique of a system over its entire life cycle. Finally, evaluation studies can be used to estimate the cost-effectiveness of an SE system in performing a particular application function. The results of these studies can then be used to inform policy decisions about investment and production.

The specific type of evaluation to be conducted will depend, of course, on the characteristics of the system to be evaluated as well as the purpose of the evaluation. One dimension along which evaluations vary, mentioned in the preceding paragraph, concerns the extent to which the purpose of the evaluation is to guide system development or to determine various characteristics of a system that has already been developed (e.g., related to overall performance, cost-effectiveness). A second dimension concerns the amount and type of empirical work involved. The empirical component of the evaluation can be restricted to pure observation (of how the system performs, how the user behaves, how the market reacts); it can involve surveys in which system users and other individuals affected by the system are asked questions; or it can involve highly structured and controlled scientific experiments. Under certain conditions, evaluations can be conducted without any empirical work at all and be based solely on theoretical analyses, for example, when appropriate models are available for describing all relevant components and performance can be reliably estimated solely by human or machine computation. A third dimension concerns the extent to which the item being evaluated constitutes a whole system or just one of the components in the system of interest. As one would expect, in most cases, the evaluation of a system component is much simpler than the evaluation of the whole system (particularly when the whole system involves a human operator). A fourth dimension concerns the extent to which the evaluation is analytic, in the sense of providing information on how the performance of the whole system is related to

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.