A VALIDATION EXPERIMENT WITH TRIM2 298

MAJOR CONCLUSIONS

The panel's experiment was successful in demonstrating that microsimulation model validation is feasible. Methods currently exist that enable analysts to measure the degree of variability attributable to the use of alternative components. Such information helps assess overall model uncertainty and identify which components to examine further in order to improve the model. Thus, sensitivity analysis methods, especially when augmented with comparison values in an external validation, provide a great deal of data with which to direct model development efforts as well as to measure model uncertainty.

These methods have demonstrated that there is a great deal of uncertainty due to changes in these three modules in TRIM2. Therefore, the choice of which model version to use makes a great difference. On the other hand, it is not clear that any one of the 16 versions has an advantage over the others. (If there is a winner, it is the current version of TRIM2.) Certainly, for individual responses, certain versions fared better. However, given that the experiment is only one replication, we hesitate to conclude that this confirms any real modeling advantage.

Because the experiment did not attempt to measure the intramodel variance of any of the outputs of the various versions of TRIM2, we cannot judge the relative sizes of the various sources of uncertainty in relation to that variance. Therefore, it is difficult to assign a priority to the development of variance estimates vis-à-vis the use of sensitivity analysis. We believe that both are sufficiently important to warrant investigation.

It should be stressed that the present experiment was purely illustrative. The benefits to be obtained by a continued process of validation are rarely evidenced through study of a single situation.
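The kind of sensitivity analysis described above can be sketched in miniature. The code below is a hypothetical illustration, not the actual TRIM2 experiment: the module names, the assumption that the 16 versions arise from four binary module choices, the `run_model` stub, and all numbers are invented for demonstration. It shows how one can enumerate every version and measure both the overall spread of an output and the variability attributable to switching each module alone.

```python
from itertools import product
from statistics import mean, pstdev

# Hypothetical stand-ins for TRIM2 module alternatives; the real modules
# (aging, MONTHS, undercount adjustment, ...) are far more complex.
MODULE_ALTERNATIVES = {
    "aging":      ["old", "new"],
    "months":     ["old", "new"],
    "undercount": ["unadjusted", "adjusted"],
    "interest":   ["unadjusted", "adjusted"],
}

def run_model(version):
    """Stub simulation: returns a caseload estimate (thousands) for one
    combination of module choices.  Illustrative numbers only."""
    base = 3700.0
    effects = {"aging": 120.0, "months": -60.0,
               "undercount": 45.0, "interest": 15.0}
    return base + sum(effects[m] for m, choice in version.items()
                      if choice in ("new", "adjusted"))

# Enumerate all 2**4 = 16 versions and measure output variability.
names = list(MODULE_ALTERNATIVES)
results = {}
for combo in product(*MODULE_ALTERNATIVES.values()):
    results[combo] = run_model(dict(zip(names, combo)))

outputs = list(results.values())
print(f"versions run:       {len(outputs)}")
print(f"range of estimates: {max(outputs) - min(outputs):.1f}")
print(f"std. deviation:     {pstdev(outputs):.1f}")

# Variability attributable to one module: mean absolute change in the
# output when that module alone is switched, all else held fixed.
for i, m in enumerate(names):
    deltas = [abs(results[c] - results[c[:i] + (alt,) + c[i+1:]])
              for c in results
              for alt in MODULE_ALTERNATIVES[m] if alt != c[i]]
    print(f"mean |effect| of {m}: {mean(deltas):.1f}")
```

Decomposing the spread module by module, as in the final loop, is what lets an analyst decide which component most deserves further development effort.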
There is an important question about the degree to which different studies of this sort, applied to the same model in different modeling situations, would represent replications in any sense. However, even if the studies are not replications, use of these methods will provide evidence of general trends in model performance. Their use will generate a great deal of information about the situations in which a model performs well and can be trusted to provide accurate information.

While a convincing case for the feasibility of sensitivity analyses and external validation has been made here, the experiment was not cheap. (See Table 8 for a breakdown of the costs of the experiment.) The Urban Institute estimated that loaded staff costs to conduct the experiment were about $60,000 for 1,400 person-hours of effort, or roughly 35 person-weeks. These estimates are certainly open to some question, because it was difficult for the Urban Institute to separate activities that were needed for the experiment from its own day-to-day work. In addition, the time taken to specify the experiment and analyze the data was not taken into consideration, and there are no estimates of computer costs. However, the overall impression from the cost and time
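The cost figures quoted above are internally consistent, as a quick back-of-the-envelope check confirms (the 40-hour work week is our assumption):

```python
# Check of the cost figures quoted in the text.
loaded_cost = 60_000   # dollars (text's round figure; Table 8 totals $60,100)
hours = 1_400          # person-hours (Table 8 totals 1,397)

hourly_rate = loaded_cost / hours
person_weeks = hours / 40  # assuming a 40-hour work week

print(f"implied loaded rate: ${hourly_rate:.2f}/hour")
print(f"effort: {person_weeks:.0f} person-weeks")
```

The implied loaded rate of roughly $43 per hour matches the "roughly 35 person-weeks" figure in the text.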
estimates is not open to question. The way in which TRIM2 (and most other microsimulation models) is currently configured makes a sensitivity analysis very costly. These costs were dramatically affected by the interest in trying out different forms of aging. The overall cost would have been substantially reduced, possibly by 20 percent, had an easier module been selected for the experiment. On the other hand, there were other factors that were not investigated because the costs of working with them would have been even higher. It is obvious that, for model validation to become a routine part of the model development and policy analysis process, the structure of the next generation of models must facilitate the type of module substitution that is necessary for sensitivity analyses.

TABLE 8 Direct Time and Cost of Urban Institute Staff on Experiment

Activity | Percent of Total Hours | Loaded Cost | Percent of Total Cost
Realigning the 1983 baseline simulations on file with interest and dividend adjustment | 5 | $2,900 | 4.8
Aging the 1983 CPS to 1987 | 18 | 11,800 | 19.6
Running old MONTHS module, comparing results to MONTHS, etc. | 9 | 5,300 | 8.8
Simulations of 1987 law on 1983 files | 13 | 7,400 | 12.3
Undercount adjustment | 8 | 5,000 | 8.3
New 1983 and 1987 baseline runs on files with undercount adjustment | 3 | 1,600 | 2.7
Work on the files with undercount adjustment: rerunning aging, old MONTHS and MONTHS, and simulations of 1987 law on 1983 files | 7 | 4,200 | 7.0
Tables on the QC data | 8 | 4,400 | 7.3
Tables on the CPS/TRIM2 data | 18 | 10,700 | 17.8
Responding to information requests (e.g., memo on aging procedures) | 6 | 3,500 | 5.8
Miscellaneous | 6 | 3,300 | 5.5
Total | 100.0 (1,397 hours) | $60,100 | 100.0

NOTE: These figures exclude the time that Urban Institute staff spent aiding in the specification of the experiment and overall management of the project.

SOURCE: Giannarelli (1990d).
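The module substitution the panel calls for can be made cheap by design. The sketch below is not the TRIM2 architecture; it is a minimal, hypothetical illustration (module names and numbers invented) of one way a next-generation model could expose its processing stages behind a common interface, so that an alternative implementation, such as a different aging routine, can be swapped in for a sensitivity run without restructuring the model.

```python
from typing import Callable, Dict, List

# A "module" is any function mapping a microdata file (here, a list of
# household records) to a transformed file.  Names below are hypothetical.
Module = Callable[[List[dict]], List[dict]]

def old_aging(records: List[dict]) -> List[dict]:
    # Illustrative: age incomes from 1983 to 1987 by a flat 20 percent.
    return [{**r, "year": 1987, "income": r["income"] * 1.20} for r in records]

def new_aging(records: List[dict]) -> List[dict]:
    # Alternative routine with a different (invented) growth assumption.
    return [{**r, "year": 1987, "income": r["income"] * 1.25} for r in records]

class Pipeline:
    """Chain of named, substitutable modules."""

    def __init__(self, stages: Dict[str, Module]):
        self.stages = dict(stages)

    def substitute(self, name: str, module: Module) -> "Pipeline":
        # Return a new pipeline with one stage replaced; the baseline
        # configuration is left untouched.
        alt = dict(self.stages)
        alt[name] = module
        return Pipeline(alt)

    def run(self, records: List[dict]) -> List[dict]:
        for module in self.stages.values():
            records = module(records)
        return records

baseline = Pipeline({"aging": old_aging})
variant = baseline.substitute("aging", new_aging)

data = [{"year": 1983, "income": 10_000.0}]
print(baseline.run(data)[0]["income"])  # 12000.0
print(variant.run(data)[0]["income"])   # 12500.0
```

With this structure, a 16-version sensitivity experiment reduces to 16 calls to `substitute` and `run`, rather than the costly ad hoc rewiring the experiment required.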