9 Validation
Pages 231-264

From page 231...
... Yet agencies have typically underinvested in model validation, and microsimulation models have not been exempt from this pattern of underinvestment. Indeed, it is arguable that microsimulation models have received even less attention in terms of validation than have other models, not so much due to deliberate oversight as to the daunting nature of the task.
From page 232...
... In this chapter we first note the kinds of corroborative activities, such as assessing model outputs against the analyst's view of the world or against outputs from other similar models, that, to date, have constituted the bulk of the effort devoted to microsimulation model validation. Then, we briefly review three basic types of validation studies—external validation, sensitivity analysis, and variance estimation—and note some issues specific to their use in validating microsimulation models.
From page 233...
... This study, our "validation experiment," started out as a sensitivity analysis of three components of the TRIM2 model: the procedures used for aging the database, the procedures used for generating monthly income and employment variables from the annual values on the March CPS, and the use of the standard TRIM2 March CPS database versus the use of a database corrected for undercoverage of the population. In addition, we turned the experiment into an external validation study, by having the TRIM2 model use its 1983 database to simulate AFDC program costs and caseloads according to the rules in effect for these programs in 1987.
From page 234...
... ; and comparing model output with output from other similar models or from the same model run by another agency. The last corroboration activity is an important mechanism for identifying and correcting problems with model estimates during the course of policy debates.
From page 235...
... This is but one example of the need for rigorous validation, not just corroboration, of microsimulation models.
TECHNIQUES OF MODEL VALIDATION
External Validation
External validation of a model is a comparison of the estimates provided by the model against "the truth," that is, against values furnished by administrative records or other sources that are considered to represent a standard for comparison.
From page 236...
... Another factor complicating external validation studies of microsimulation models concerns the appropriate time period for comparison, which may not always be clear. Because most models produce estimates of direct effects, the comparison period needs to be after the program changes are fully implemented but before any feedback effects could be expected to show up.
From page 237...
... This component of TRIM2 was not specifically designed to accommodate a sensitivity analysis, a situation not, of course, peculiar to TRIM2 or, more generally, to microsimulation models. In addition, the proprietary nature of some microsimulation models restricts the free exchange of model components.
From page 238...
... However, a sensitivity analysis cannot by itself identify which components are working well, because there are no comparison values. On the other hand, external validation cannot usually identify specific weaknesses in individual model components because, when the entire model is tested against the comparison values, sufficient information on the components is difficult to generate.
From page 239...
... . We are convinced at this stage that the bootstrap or another of the currently available sample reuse techniques can be used to estimate sampling variances for outputs of microsimulation models.
From page 240...
... Roughly speaking, one could use a bootstrap to measure the variance and a sensitivity analysis to weakly measure bias. Thus, one could use sensitivity analysis alone or together with the bootstrap to evaluate the contribution of errors in the primary database or in other sources, such as undercoverage of the target population, inappropriate imputation methodologies to overcome nonresponse, and misreporting of key variables.
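The bootstrap idea described here is straightforward to express in code: resample the input database with replacement, rerun the model on each resampled database, and take the variance of the resulting outputs. The sketch below is illustrative only; `simulate_caseload` is a hypothetical stand-in for a full microsimulation run, and a real application would also need to respect the survey's weights and sample design.

```python
import numpy as np

def simulate_caseload(records):
    """Toy stand-in for a full microsimulation run: returns an aggregate
    output (a simulated program caseload) computed from household records."""
    # Hypothetical eligibility rule: monthly income below a cutoff.
    return int(np.sum(records[:, 0] < 1000.0))

def bootstrap_variance(records, n_reps=200, seed=12345):
    """Estimate the sampling variance of a model output by resampling the
    input database with replacement and rerunning the simulation."""
    rng = np.random.default_rng(seed)
    n = records.shape[0]
    outputs = np.empty(n_reps)
    for r in range(n_reps):
        idx = rng.integers(0, n, size=n)   # resample households with replacement
        outputs[r] = simulate_caseload(records[idx])
    return outputs.var(ddof=1), outputs.mean()

# Example: 5,000 synthetic household records with one income column.
rng = np.random.default_rng(0)
records = rng.lognormal(mean=7.0, sigma=0.8, size=(5000, 1))
var_hat, mean_hat = bootstrap_variance(records)
print(f"bootstrap mean caseload {mean_hat:.1f}, variance {var_hat:.1f}")
```

The number of replications and the form of resampling are choices the analyst must make; the 200 replications above are merely a placeholder.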
From page 241...
... REVIEW OF VALIDATION STUDIES As part of our examination of the state of microsimulation model validation, we searched for and collected validation studies for two purposes: to obtain information on the performance of the models currently in use and to gather examples of the methods that others have used for microsimulation validation. Although the literature review that we commissioned was not fully comprehensive, we are reasonably certain that we have not missed any major validation studies of microsimulation models.5 We found only 13 validation studies of microsimulation models.
From page 242...
... The second column of the table indicates the model(s) covered; the third and fourth columns indicate whether the study included a sensitivity analysis or external validation of the model; the fifth column indicates whether the study stated that the results of the analysis subsequently led directly to changes in the model; and the sixth column indicates the number of replications that were involved in each study.
From page 243...
... [Page 243 contains a table summarizing the validation studies reviewed; its machine-read text is not legible.]
From page 244...
... For comparison values in the first-phase study, Doyle and Trippe used the administrative data on the food stamp caseload, while recognizing that these data are also subject to error. To account for the sampling error present in the administrative estimates, they used confidence intervals about the administrative estimates for comparison, rather than only the estimates themselves.
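Doyle and Trippe's device of comparing model output with an interval around the administrative estimate, rather than with the point estimate alone, can be illustrated with a small sketch. The figures and the normal-approximation interval below are assumptions for illustration, not values from their study.

```python
def within_admin_interval(model_estimate, admin_estimate, admin_se, z=1.96):
    """Judge a model output consistent with an administrative estimate if it
    falls inside a normal-approximation interval around that estimate."""
    lower = admin_estimate - z * admin_se
    upper = admin_estimate + z * admin_se
    return lower <= model_estimate <= upper, (lower, upper)

# Hypothetical figures: a simulated food stamp caseload of 19.6 million against
# an administrative estimate of 20.1 million with a standard error of 0.4 million.
ok, (lo, hi) = within_admin_interval(19.6e6, 20.1e6, 0.4e6)
print(f"95% interval ({lo / 1e6:.2f}M, {hi / 1e6:.2f}M); model consistent: {ok}")
```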
From page 245...
... Because of budget constraints, Haveman and Lacker could not carry out a sensitivity analysis; instead, they did a qualitative assessment of the likely source of the differences. They admit that this approach does not permit a clear answer as to which model's projections are
From page 246...
... The kind of careful scrutiny of DYNASIM and PRISM that Haveman and Lacker gave in their paper, in conjunction with an external validation, would greatly expedite determination of the reasons for any discrepancies between the models and the truth. In addition, work of this sort provides modelers with obvious examples of alternative methodologies that can be used in a sensitivity analysis.
From page 247...
... Another application of regression used by Kormendi and Meguire was in the context of a sensitivity analysis. They attempted to attribute the changes from 1979 to 1985 to either administrative or economic-demographic changes.
From page 248...
... to identify those modules that, when altered, have an appreciable effect on model outputs; (3) to obtain an admittedly limited measure of model validity against comparison values from administrative records (limited because only one time period was examined)
From page 249...
... The second aging alternative, unemployment aging coupled with demographic aging, additionally invoked the routine to adjust labor force activity on the demographically aged file to meet targets from the March 1988 CPS for unemployment during the week of the survey and the preceding calendar year ("unemp"). The third aging alternative, full aging, additionally invoked the routine to adjust income amounts for price changes and economic growth between 1983 and 1987 ("full").
From page 250...
... Throughout the remainder of this section, we treat the 1983 and 1987 IQCS data as the "truth," that is, the comparison values. We have remarked elsewhere about the dangers in this assumption and have noted methods that might be used to deal with the problem of sampling error in the comparison values.
From page 251...
... In general, the results in Table 9-3 show that there are important differences in most of the estimates when alternate modules are used. These observations are not meant to imply any weakness of TRIM2 relative to other models, microsimulation or otherwise, that have the goal of providing estimates of characteristics.
From page 252...
... . First, in conjunction with the analysis of variance, looking at the comparison values indicates which main effects are bringing the overall mean for the 16 model versions closer to or further away from the comparison value; this analysis provides an indication of which alternatives are more promising.
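As a rough illustration of this kind of analysis, the sketch below enumerates a factorial design of module alternatives, runs each of the 16 resulting model versions, and reports the mean output at each level of one factor against a comparison value. The factor names, levels, and all numbers are assumptions chosen only to mirror the structure described in the text, not results from the experiment.

```python
from itertools import product
import numpy as np

# Assumed factorial structure: 4 aging alternatives x 2 monthly-allocation
# routines x 2 databases = 16 model versions.
AGING = ["none", "demo", "unemp", "full"]
MONTHLY = ["standard", "alternative"]
DATABASE = ["march_cps", "corrected"]

def run_version(aging, monthly, database):
    """Placeholder for a full model run; returns a simulated caseload."""
    base = 3.70e6                      # hypothetical baseline caseload
    base += {"none": 0, "demo": 0.05e6, "unemp": 0.12e6, "full": 0.20e6}[aging]
    base += 0.08e6 if monthly == "alternative" else 0.0
    base += 0.15e6 if database == "corrected" else 0.0
    return base

comparison_value = 3.99e6              # hypothetical administrative figure

# Run all 16 versions, then look at how each aging alternative moves the
# mean output toward or away from the comparison value.
results = {v: run_version(*v) for v in product(AGING, MONTHLY, DATABASE)}
overall_mean = np.mean(list(results.values()))
print(f"overall mean {overall_mean / 1e6:.2f}M vs comparison {comparison_value / 1e6:.2f}M")

for level in AGING:
    mean_level = np.mean([y for (a, m, d), y in results.items() if a == level])
    gap = abs(mean_level - comparison_value)
    print(f"aging={level:5s}: mean {mean_level / 1e6:.2f}M, gap {gap / 1e6:.2f}M")
```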
From page 253...
... (This test can also be used for more than two sets of estimates, if desired.) This statistic, along with the associated degrees of freedom, is presented in Table 9-4 for all 16 model versions, for a variety of model outputs.
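The excerpt does not reproduce the formula for the statistic reported in Table 9-4. One standard statistic of this general kind, usable for two or more sets of estimates and reported with its degrees of freedom, is the Pearson chi-square test of homogeneity; whether this matches the statistic actually used is an assumption, and the counts below are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical caseload counts (thousands of units) in four categories for two
# model versions; the layout, not the numbers, is the point of the example.
counts = np.array([
    [1210, 830, 450, 270],   # model version A
    [1175, 865, 430, 300],   # model version B
])
chi2, p_value, dof, _ = chi2_contingency(counts)
print(f"chi-square = {chi2:.1f} on {dof} degrees of freedom, p = {p_value:.3f}")
```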
From page 254...
... The last column of Table 9-4 displays the results from using the 1983 IQCS, which assumes that the characteristics of the caseload in 1987 remain unchanged from those in 1983. Under some circumstances, the comparison values for the beginning of a period provide an interesting challenge to the model versions.
From page 255...
... Also, this comparison provides an estimate of how much variability is natural to the problem, which can be compared with the variability left unexplained by the model versions.
Limitations of the Experiment
Our experiment was designed both to illustrate the types of methods that can
From page 256...
... It is important to point out that, although microsimulation modelers are aware of quality problems with the March CPS data, they do not regularly investigate changes in quality from year to year. Hence, we have an example of how validation can lead to more pointed validation and to identification of problems for further investigation and possible correction.
From page 257...
... Of course, we could not have conducted an external validation of such an alternative policy because it was never enacted. However, we could have conducted sensitivity analyses of both policies and obtained information relevant to the question of whether, in fact, microsimulation model estimates of differences between two policies are less variable than estimates for a particular policy, because of sources of error affecting both policies to about the same extent.
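The conjecture that difference estimates are less variable than level estimates follows from the identity Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B): error sources common to both policy simulations induce positive covariance and largely cancel in the difference. The simulation below illustrates the effect; the error magnitudes are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_reps = 10_000

# A shared error (e.g., database undercoverage) hits both policy simulations;
# each policy also has its own smaller, independent noise.
shared = rng.normal(0.0, 0.30, n_reps)                     # common error component
policy_a = 4.0 + shared + rng.normal(0.0, 0.10, n_reps)    # level estimate, policy A
policy_b = 3.6 + shared + rng.normal(0.0, 0.10, n_reps)    # level estimate, policy B

print(f"variance of level estimate A : {policy_a.var():.3f}")
print(f"variance of difference A - B : {(policy_a - policy_b).var():.3f}")
```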
From page 258...
... Thus, sensitivity analysis methods, especially when augmented with comparison values in an external validation, provide a great deal of data with which to direct efforts at model development as well as to measure model uncertainty. Our experiment demonstrated that there is considerable uncertainty due to changes in the three modules we studied in TRIM2.
From page 259...
... It is obvious that, for model validation to become a routine part of the model development and policy analysis process, the structure of the next generation of models must facilitate the type of module substitution that is used in sensitivity analysis. In summary, our experiment gave mixed signals on the effectiveness of TRIM2.
From page 260...
... The costs of validation are also evident from our experiment, in terms of both time and resources. Indeed, the kinds of external validation studies, sensitivity analyses, and variance estimation procedures that we outline for microsimulation models may well appear to involve a dismayingly high expenditure of staff and budget resources, particularly in light of the limited resources that have been allocated to these activities in the past.
From page 261...
... In addition to the validation efforts performed by the modeling contractor, we believe it is essential for policy analysis agencies to commission independent validation studies that include external validation as well as sensitivity analysis. In principle, independent evaluation is preferable to evaluation performed by the developer and user of a model (just as academic journals appoint independent reviewers for articles submitted for publication)
From page 262...
... . The validation contractor would be expected to carry out external validation studies of selected model estimates and also to conduct extensive sensitivity analyses in order to identify areas for needed model improvement or revision.
From page 263...
... The goal of these longer-range efforts should be to identify priority areas for model improvement.
Research on Model Validation Methods
In addition to letting the kinds of validation contracts that we describe above, it would be useful for policy analysis agencies to support research specifically designed to develop improved methods for microsimulation model validation.
From page 264...
... We recommend that policy analysis agencies support the development of quality profiles for the major microsimulation models that they use. The profiles should list and describe sources of uncertainty and identify priorities for validation work.

