Methodological Issues and Approaches
The purpose of this chapter is to provide general methodological guidelines for the development, instantiation, and validation of models of human behavior. We begin with a section describing the need to tailor models that incorporate such representations to specific user needs. The core of the chapter is a proposed methodological framework for the development of human behavior representations.
THE NEED FOR SITUATION-SPECIFIC MODELING
At present, we are a long way from having either a general-purpose cognitive model or a general-purpose organizational unit model that can be incorporated directly into any simulation and prove useful. However, the field has developed to the point that simulations incorporating known models and results of cognition, coordination, and behavior will greatly improve present efforts by the military, if—and only if—the models are developed and precisely tailored to the demands of a given task and situation, for example, the tasks of a tank driver or a fixed-wing pilot. It is also important to note that clear measures of performance of military tasks are needed. Currently, many measures are poorly defined or lacking altogether.
Given the present state of the field at the individual level, it is probably most useful to view a human operator as the controller of a large number of programmable components, such as sensory, perceptual, motor, memory, and decision processes. The key idea is that these components are highly adaptable and may be tuned to interact properly in order to handle the demands of each specific task in a particular environment and situation. Thus, the system may be seen as a
framework or architecture within which numerous choices and adaptations must be made when a given application is required. A number of such architectures have been developed and provide examples of how one might proceed, although the field is still in its infancy, and it is too early to recommend a commitment to any one architectural framework (see Chapter 3).
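The component view sketched above can be made concrete in a few lines of code. The sketch below is purely illustrative: the component names, parameters, and values are invented and do not correspond to any existing architecture. It shows only the central idea of a controller tuning reusable components to a specific task and situation.

```python
# Purely illustrative: an operator as a controller of tunable components.
# Component names, parameters, and values are invented, not drawn from
# any fielded architecture.

class Component:
    """A processing stage (sensory, perceptual, motor, ...) with
    task-specific parameters."""
    def __init__(self, name, **params):
        self.name = name
        self.params = params

class OperatorModel:
    """Controller that tunes its components to the demands of a
    particular task, environment, and situation."""
    def __init__(self):
        self.components = {}

    def add(self, component):
        self.components[component.name] = component

    def tune(self, name, **params):
        # Adaptation: adjust a component for the current situation.
        self.components[name].params.update(params)

# A tank-driver instantiation might slow perception for poor visibility:
driver = OperatorModel()
driver.add(Component("perception", latency_ms=250))
driver.add(Component("motor", latency_ms=300))
driver.tune("perception", latency_ms=400)  # degraded visibility
```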
Given the present state of the field at the unit level, it is probably most useful to view a human as a node in a set of overlaid networks that connect humans to each other in various ways, connect humans to tasks and resources, and so forth. One key idea is that these networks (1) contain information; (2) are adaptable; and (3) can be changed by orders, technology, or actions taken by individuals. Which linkages in the network are operable and which nodes (humans, technology, tasks) are involved will need to be specified in accordance with the specific military application. Some unit-level models can be thought of as architectures in which the user, at least in principle, can describe an application by specifying the nodes and linkages. Examples include the virtual design team (Levitt et al., 1994) and ORGAHEAD (Carley and Svoboda, 1996; Carley, forthcoming).
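As a minimal sketch of this overlaid-network view (all node names and layer names below are invented for illustration), a unit can be represented as a set of typed edge layers that orders, technology, or individual actions may edit:

```python
# Illustrative sketch of a unit as a set of overlaid networks linking
# humans to humans, tasks, and resources. Names are invented.

from collections import defaultdict

class UnitNetworks:
    def __init__(self):
        # One adjacency set per (layer, node) pair.
        self.layers = defaultdict(set)

    def link(self, layer, a, b):
        self.layers[(layer, a)].add(b)

    def neighbors(self, layer, a):
        return self.layers[(layer, a)]

unit = UnitNetworks()
unit.link("reports_to", "gunner", "commander")       # formal hierarchy
unit.link("assigned_to", "gunner", "acquire_target") # human-task layer
unit.link("has_resource", "gunner", "thermal_sight") # human-resource layer

# An order or a technology change is simply an edit to a layer:
unit.link("communicates_with", "gunner", "driver")
```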
The panel cannot overemphasize how critical it is to develop situation-specific models within whatever general architecture is adopted. The situations and tasks faced by humans in military domains are highly complex and very specific. Any effective model of human cognition and behavior must be tailored to the demands of the particular case. In effect, the tailoring of the model substitutes for the history of training and knowledge by the individual (or unit), a history that incorporates both personal training and military doctrine.
At the unit level, several computational frameworks for representing teams or groups are emerging. These frameworks at worst supply a few primitives for constructing or breaking apart groups and aggregating behavior and at best facilitate the representation of formal structure, such as the hierarchy, the resource allocation structure, the communication structure, and unit-level procedures inherited by all team members. These frameworks provide only a general language for constructing models of how human groups perform tasks and what coordination and communication are necessary for pursuing those tasks. Representing actual units requires filling in these frameworks with details for a specific team, group, or unit and for a particular task.
A METHODOLOGY FOR DEVELOPING HUMAN BEHAVIOR REPRESENTATIONS
The panel suggests that the Defense Modeling and Simulation Office (DMSO) encourage developers to employ a systematic methodology in developing human behavior representations. This methodology should include the following steps:
Developers should employ interdisciplinary teams.
They should review alternatives and adopt a general architecture that is most likely to be useful for the dominant demands of the specific situation of interest.
They should review available unit-level frameworks and support the development of a comprehensive framework for representing the command, control, and communications (C3) structure. (The cognitive framework adopted should dictate the way C3 procedures are represented.)
They should review available documentation and seek to understand the domain and its doctrine, procedures, and constraints in depth. They should prepare formal task analyses that describe the activities and tasks, as well as the information requirements and human skill requirements, that must be represented in the model. They should prepare unit-level task analyses that describe resource allocation, communication protocols, skills, and so forth for each subunit.
They should use behavioral research results from the literature, procedural model analysis, ad hoc experimentation, social network analysis, unit-level task analysis, field research, and, as a last resort, expert judgment to prepare estimates of the parameters and variables to be included in the model that are unconstrained by the domain or procedural requirements.
They should systematically test, verify, and validate the behavior and performance of the model at each stage of development. We also encourage government military representatives to work with researchers to define the incremental increase in model performance as a function of the effort required to produce that performance.
The sections that follow elaborate on the four most important of these methodological recommendations.
Employ Interdisciplinary Teams
For models of the individual combatant, development teams should include cognitive psychologists and computer scientists who are knowledgeable about the contemporary literature and modeling techniques, together with specialists in the military doctrine and procedures of the domain to be modeled. For team-, battalion-, and force-level models, as well as for models of command and control, teams composed of sociologists, organizational scientists, social psychologists, computer scientists, and military scientists are needed. Teams of this composition ensure that the resulting models make effective use of the relevant knowledge and many (partial) solutions that have emerged in cognitive psychology, artificial intelligence, and human factors for analyzing and representing individual human behavior in a computational format. Similarly, drawing on sociology, organizational science, and distributed artificial intelligence ensures that the relevant knowledge and solutions for analyzing and representing unit-level behavior will be employed.
Understand the Domain in Depth, and Document the Required Activities and Tasks
The first and most critical information required to construct a model of human behavior for military simulations is information about the tasks to be performed by the simulated (and real) humans: the procedures, strategies, decision rules, and command and control structure involved. For example, under what conditions does a combat air patrol pilot engage an approaching enemy? What tactics are followed? How is a tank platoon deployed into defensive positions? As in the Soar-intelligent forces (IFOR) work (see Chapter 2), military experts must supply information about the desired skilled behavior the model is to produce. The form in which this information is collected should be guided by the computational structure that will encode the tasks.
The first source of such information is military doctrine—the "fundamental principles by which military forces guide their actions in support of national objectives" (U.S. Department of the Army, 1993b). Behavioral representations need to take account of doctrine (U.S. doctrine for own forces, non-U.S. doctrine for opposing forces). On the one hand, doctrinal consistency is important. On the other hand, real forces deviate from doctrine, whether because of a lack of training or knowledge of the doctrine or for good reason, say, to confound an enemy's expectations. Moreover, since doctrine is defined at a relatively high level, there is much room for behavior to vary even while remaining consistent with doctrine. The degree of doctrinal conformity that is appropriate and the way it is captured in a given model will depend on the goals of the simulation.
Conformity to doctrine is a good place to start in developing a human behavior representation because doctrine is written down and agreed upon by organizational management. However, reliance on doctrine is not enough. First, it does not provide the task-level detail required to create a human behavior representation. Second, just as there are both official organization charts and informal units, there are both doctrine and the ways jobs really get done. There is no substitute for detailed observation and task analysis of real forces conducting real exercises.
The Army has a large-scale project to develop computer-generated representations of tactical combat behavior, such as moving, shooting, and communicating. These representations are called combat instruction sets. According to the developers (IBM/Army Integrated Development Team, 1993), each combat instruction set should be:
Described in terms of a detailed syntax and structure layout.
Explicit in its reflection of U.S. and opposing force tactical doctrines.
Explicit in the way the combat instruction set will interface with the semiautomated forces simulation software.
Traceable back to doctrine.
Information used to develop the Army combat instruction sets comes from written doctrine and from subject matter experts at the various U.S. Army Training and Doctrine Command schools who develop the performance conditions and standards for mission training plans. The effort includes battalion, company, platoon, squad, and platform/system-level behavior. At the higher levels, the mission, enemy, troops, terrain, and time available (METT-T) evaluation process is used to guide the decision making process. The combat instruction sets, like the doctrine itself, should provide another useful input to the task definition process.
At the individual level, although the required information is not in the domain of psychology or of artificial intelligence, the process for obtaining and representing the information is. This process, called task analysis and knowledge engineering, is difficult and labor-intensive, but it is well developed and can be performed routinely by well-trained personnel.
Similarly, at the unit level, although the required information is not in the domain of sociology or organizational science, the process for obtaining and representing the information is. This process includes unit-level task analysis, social network analysis, process analysis, and content analysis. The procedures involved are difficult and labor-intensive, often requiring field research or survey efforts, but they can be performed routinely by well-trained researchers.
At the individual level, task analysis has traditionally been applied to identify and elaborate the tasks that must be performed by users when they interact with systems. Kirwan and Ainsworth (1992:1) define task analysis as:
… a methodology which is supported by a number of specific techniques to help the analyst collect information, organize it, and then use it to make judgments or design decisions. The application of task analysis methods provides the user with a blueprint of human involvement in a system, building a detailed picture of that system from the human perspective. Such structured information can then be used to ensure that there is compatibility between system goals and human capabilities and organization so that the system goals will be achieved.
This definition of task analysis is conditioned by the purpose of designing systems. In this case, the human factors specialist is addressing the question of how best to design the system to support the tasks of the human operator. Both Kirwan and Ainsworth (1992) and Beevis et al. (1994) describe in detail a host of methods for performing task analysis as part of the system design process that can be equally well applied to the development of human behavior representations for military simulations.
If the human's cognitive behavior is being described, cognitive task analysis approaches that rely heavily on sophisticated methods of knowledge acquisition are employed. Many of these approaches are discussed by Essens et al. (1995). Specifically, Essens et al. report on 32 elicitation techniques, most of which rely either on interviewing experts and asking them to make judgments and categorize material, or on reviewing and analyzing documents.
Descriptions of the physical and cognitive tasks to be performed by humans in a simulation are important for guiding the realism of behavior representations. However, developing these descriptions is time-consuming and for the most part must be done manually by highly trained individuals. Although some parts of the task analysis process can be accomplished with computer programs, it appears unlikely that the knowledge acquisition stage will be automated in the near future. Consequently, sponsors will have to establish timing and funding priorities for analyzing the various aspects of human behavior that could add value to military engagement simulations.
At the unit or organizational level, task analysis involves specifying the task and the command and control structure in terms of assets, resources, knowledge, access, timing, and so forth. The basic idea is that the task and the command and control structure affect unit-level performance (see Chapter 10). Task analysis at the unit level does not involve looking at the motor actions an individual must perform or the cognitive processing in which an individual must engage. Rather, it involves laying out the set of tasks the unit as a whole must perform to achieve some goal, the order in which those tasks must be accomplished, what resources are needed, and which individuals or subunits have those resources.
A great deal of research in sociology, organizational theory, and management science has been and is being done on how to do task analysis at the unit level. For tasks, the focus has been on developing and extending project analysis techniques, such as program evaluation and review technique (PERT) charts and dependency graphs. For the command and control structure, early work focused on general features such as centralization, hierarchy, and span of control. Recently, however, network techniques have been used to measure and distinguish the formal reporting structure from the communication structure. These various approaches have led to a series of survey instruments and analysis tools. There are a variety of unresolved issues, including how to measure differences in the structures and how to represent change.
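As an illustration of the dependency-graph side of unit-level task analysis, the sketch below (task names invented) uses Python's standard graphlib to derive an ordering in which every task follows its prerequisites, much as a PERT chart would:

```python
# Illustrative unit-level task analysis as a dependency graph.
# Each task maps to the set of tasks that must be completed first.
# Task names are invented for illustration.

from graphlib import TopologicalSorter  # Python 3.9+

tasks = {
    "move_to_sector": set(),
    "occupy_position": {"move_to_sector"},
    "acquire_target": {"occupy_position"},
    "engage": {"occupy_position", "acquire_target"},
}

# An order in which the unit can execute the tasks: every task
# appears after all of its prerequisites.
order = list(TopologicalSorter(tasks).static_order())
```

The same structure could be annotated with resources and subunit assignments to capture which individuals or subunits hold the resources each task needs.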
Instantiate the Model
A model of human behavior must be made complete and accurate with specific data. Ideally, the model with its parameters specified will already be incorporated into an architectural framework, along with the more general properties of human information processing mechanisms. Parameters for selected sensory and motor processes can and should be obtained from the literature. However, many human behavior representations are likely to include high-level decision making, planning, and information-seeking components. For these components, work is still being done to define suitable underlying structures, and general models at this level will require further research. In many cases, however, the cognitive activities of interest either conform to doctrine or are highly proceduralized. In these cases, detailed task analyses provide data that permit at least a first-order approximation of the behavior of interest.
Sometimes small-scale analytical studies or field observations can provide detailed data suitable for filling in certain aspects of a model, such as the time to carry out a sequence of actions that includes positioning, aiming, and firing a rifle or targeting and launching a missile. Some of these aspects could readily be measured, whereas others could be approximated without the need for new data collection by using approaches based on prediction methods employed for time and motion studies in the domain of industrial engineering (Antis et al., 1973; Konz, 1995), Fitts' law (Fitts and Posner, 1967), or GOMS1 (John and Kieras, 1996; Card et al., 1983). These results could then be combined with estimates of perceptual and decision making times to yield reasonable estimates of human reaction times for incorporation into military simulations.
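As a small illustration of how such approximations can be produced without new data collection, the sketch below applies Fitts' law in the form MT = a + b·log2(2D/W). The constants a and b here are illustrative placeholders; real values must be fitted to empirical data for the movement of interest.

```python
import math

def fitts_movement_time(a, b, distance, width):
    """Fitts' law: MT = a + b * log2(2D / W), where D is the movement
    distance and W the target width. The constants a and b are
    empirically fitted; the values used below are placeholders."""
    return a + b * math.log2(2 * distance / width)

# E.g., a 30 cm reach to a 5 cm target (constants are NOT fitted values):
mt = fitts_movement_time(a=0.1, b=0.15, distance=0.30, width=0.05)
```

Estimates of this kind can then be combined with perceptual and decision-making time estimates, as the text describes, to yield overall reaction times.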
Inevitably, there will be some data and parameter requirements for which neither the literature nor modeling and analysis will be sufficient and for which it would be too expensive to conduct even an ad hoc study. In those cases, the developer should rely on expert judgment. However, in the course of its study, the panel found that expert judgment is often viewed as the primary source of the necessary data; we emphasize that it should be the alternative of last resort because of the biases and lack of clarity and precision associated with such judgments.
Much of the modeling of human cognition that will be necessary for use in human behavior representations—particularly those aspects of cognition involving higher-level planning, information seeking, and decision making—has not yet been done and will require new research and development. At the same time, these new efforts can build productively on many recent developments in the psychological and sociological sciences, some of which are discussed in the next chapter.
Verify, Validate, and Accredit the Model
Before a model can be used with confidence, it must be verified, validated, and accredited. Verification refers here to the process of checking for errors in the programming, validation to determining how well the model represents reality, and accreditation to official certification that a model or simulation is acceptable for specific purposes. According to Bennett (1995), because models and simulations are based on only partial representations of the real world and are modified as data describing real events become available, it is necessary to conduct verification and validation on an ongoing basis. As a result, it is not possible to ensure that a model is ever completely and finally verified and validated; both activities must be revisited as the model and the available data evolve.
Verification may be accomplished by several methods. One is to develop tracings of intermediate results of the program and check them for errors using either hand calculations or manual examination of the computations and results. Verification may also be accomplished through modular programming, structured walkthroughs, and correctness proofs (Kleijnen and van Groenendaal, 1992).
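A minimal sketch of verification by tracing (the routine and the numbers are invented for illustration): intermediate results are recorded and each is cross-checked against an independent hand calculation.

```python
# Invented example: trace intermediate results of a (trivial) simulation
# routine and cross-check each against a hand-calculated expectation.

def simulated_rounds_fired(rate_per_min, minutes):
    return rate_per_min * minutes

trace = []
for minutes in (1, 2, 5):
    result = simulated_rounds_fired(8, minutes)
    trace.append((minutes, result))  # tracing of intermediate results
    assert result == 8 * minutes     # hand-calculated check
```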
Validation is a more complex matter. Indeed, depending on the characteristics of the model, its size, and its intended use, adequate demonstration of validity may not be possible. According to DMSO, validation is defined as "the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended users of the model" (U.S. Department of Defense, 1996). The degree of precision needed for a model is guided by the types and levels of variables it represents and its intended use. For example, some large models have too many parameters for the entire model to be tested; in these cases, an intelligent testing strategy is needed. Sensitivity analysis may be used to provide guidance on how much validity is needed, as well as to examine the contributions of particular models and their associated costs. Carley (1996b) describes several types of models, including emulation and intellective models. Emulation models are built to provide specific advice, so they need to include valid representations of everything that is critical to the situation at hand. Such models are characterized by a large number of parameters, several modules, and detailed user interfaces. Intellective models are built to show proof of concept or to illustrate the impact of a basic explanatory mechanism. Simpler and smaller than emulation models, they lack detail and should not be used to make specific predictions.
Validation can be accomplished by several methods, including grounding, calibration, and statistical comparisons. Grounding involves establishing the face validity or reasonableness of the model by showing that simplifications do not detract from credibility. Grounding can be enhanced by demonstrating that other researchers have made similar assumptions in their models or by applying some form of ethnographic analysis. Grounding is appropriate for all models, and it is often the only level of validation needed for intellective models.
Calibration and statistical comparisons both involve the requirement for real-world data. Real-life input data (based on historical records) are fed into the simulation model, the model is run, and the results are compared with the real-world output. Calibration is used to tune a model to fit detailed real data. This is often an interactive process in which the model is altered so that its predictions come to fit the real data. Calibration of a model occurs at two levels: at one level, the model's predictions are compared with real data; at another, the processes and parameters within the model are compared with data about the processes and
parameters that produce the behavior of concern. All of these procedures are relevant to the validation of emulation models.
Statistical or graphical comparisons between a model's results and those in the real world may be used to examine the model's predictive power. A key requirement for this analysis is the availability of real data obtained under comparable conditions. If a model is to be used to make absolute predictions, it is important not only that the mean of the model's output and the mean of the real-world data be identical, but also that the model's outputs and the real-world observations be correlated. However, if the model is to be used to make relative predictions, the requirements are less stringent: the means of the model and the real world do not have to be equal, but the outputs should be positively correlated (Kleijnen and van Groenendaal, 1992).
Since a model's validity is determined by its assumptions, it is important to provide these assumptions in the model's documentation. Unfortunately, in many cases assumptions are not made explicit. According to Fossett et al. (1991), a model's documentation should provide an analyst not involved in the model's development with sufficient information to assess, with some level of confidence, whether the model is appropriate for the intended use specified by its developers.
It is important to point out that validation is a labor-intensive process that often requires a team of researchers and several years to accomplish. It is recommended that model developers be aided in this work by trained investigators not involved in developing the models. In the military context, the most highly validated models are physiological models and a few specific weapons models. Few individual combatant or unit-level models in the military context have been validated using statistical comparisons for prediction; in fact, many have only been grounded. Validation, clearly a critical issue, is necessary if simulations are to be used as the basis for training or policy making.
Large models cannot be validated by simply examining exhaustively the predictions of the model under all parameter settings and contrasting that behavior with experimental data. Basic research is therefore needed on how to design intelligent artificial agents for validating such models. Many of the more complex models can be validated only by examining the trends they predict. Additional research is needed on statistical techniques for locating patterns and examining trends. There is also a need for standardized validation techniques that go beyond those currently used. The development of such techniques may in part involve developing sample databases against which to validate models at each level. Sensitivity analysis may be used to distinguish between parameters of a model that influence results and those that are indirectly or loosely coupled to outcomes. Finally, it may be useful to set up a review board for ensuring that standardized validation procedures are applied to new models and that new versions of old models are docked against old versions (to ensure that the new versions still generate the same correct behavior as the old ones).
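A one-at-a-time sensitivity analysis of the kind described above can be sketched as follows (the toy model and its parameters are invented; large models would require more sophisticated experimental designs):

```python
# Illustrative one-at-a-time sensitivity analysis: perturb each parameter
# and measure how strongly the model output responds. The model here is a
# toy stand-in for a real simulation.

def toy_model(params):
    # Invented stand-in for a simulation outcome.
    return 3.0 * params["training_level"] + 0.1 * params["fatigue"] + 5.0

baseline = {"training_level": 1.0, "fatigue": 1.0}

def sensitivity(params, name, delta=0.01):
    perturbed = dict(params)
    perturbed[name] += delta
    return (toy_model(perturbed) - toy_model(params)) / delta

effects = {name: sensitivity(baseline, name) for name in baseline}
# Here "training_level" strongly influences the outcome, while "fatigue"
# is only loosely coupled to it.
```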