A VALIDATION EXPERIMENT WITH TRIM2

LIMITATIONS OF THE PRESENT STUDY

The present experiment was designed to illustrate the types of methods that can be used to identify model weaknesses and to provide some indication of the current performance of TRIM2. With respect to the second of these goals, the experiment is limited in a variety of ways.

First, it studied TRIM2 during only one time period. With respect to estimating error properties, even in a replicable situation, one replication is very limiting. Moreover, forecasting situations should not be considered replicates without further investigation. Different time periods will typically present different challenges to a model. In particular, any characteristics of either the March 1984 or the March 1988 CPS data, or of the time period under study, that are peculiar to those data sets or to that time period reduce the opportunity for generalizing from our results.

Indeed, an analysis conducted as part of the experiment showed that, with respect to some outputs, the simulations of 1987 law using the 1983 baseline file outperformed simulations of 1987 law using the 1987 baseline file. This finding triggered a more extensive analysis of the quality of the March 1988 and 1984 CPS data (see Giannarelli, 1990c). The investigation documented that March CPS files typically include some states that have insufficient simulated units eligible for AFDC compared with administrative counts of participants, a phenomenon that complicates the calibration effort. It turned out that the March 1988 CPS had an unusually large number of such states (eight), which made calibration in that year less successful than usual. The number of simulated eligibles dropped by large percentages in some states from the previous year (e.g., by 29% in Connecticut, 22% in Michigan, and 32% in New Mexico). While sampling variability in the CPS appears to explain some of these changes, Michigan's remains unexplained.
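The state-level check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study or in TRIM2: the function names `shortfall_states` and `percent_drop`, the state labels, and all counts are hypothetical and invented for the example. The sketch flags states whose simulated eligible units fall below the administrative participant count and computes the year-to-year percentage drop in simulated eligibles.

```python
from typing import Dict, List


def shortfall_states(simulated_eligibles: Dict[str, float],
                     admin_participants: Dict[str, float]) -> List[str]:
    """Return states whose simulated eligible units fall short of the
    administrative participant count (hypothetical helper)."""
    return sorted(
        state for state, eligibles in simulated_eligibles.items()
        if eligibles < admin_participants[state]
    )


def percent_drop(previous: float, current: float) -> float:
    """Percentage decline from the previous year's count to the current one."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return 100.0 * (previous - current) / previous


# Hypothetical counts (thousands of units), invented for illustration only;
# they are not taken from the CPS or from administrative records.
simulated = {"A": 40.0, "B": 71.0, "C": 55.0}
administrative = {"A": 38.0, "B": 80.0, "C": 60.0}
previous_year = {"A": 41.0, "B": 100.0, "C": 56.0}

# States where calibration to administrative totals would be problematic.
flagged = shortfall_states(simulated, administrative)

# Year-to-year percentage drop in simulated eligibles, by state.
drops = {s: round(percent_drop(previous_year[s], simulated[s]), 1)
         for s in simulated}
```

Running a check of this kind on each new baseline file would be one way to notice year-to-year changes in data quality before calibration, rather than after a validation study reveals them.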
It is important to point out that, while microsimulation modelers are aware of quality problems with the CPS data, changes in quality from year to year are not regularly investigated. Hence, we have an example of how validation can lead to more pointed validation and identification of problems for further investigation and possible correction.

The time period 1983 to 1987 was also special because of the large change in the employment rate and the differential impact of the change on different subpopulations. The changes in welfare program regulations between 1983 and 1987 were relatively minor, which limits our findings to periods of time when there are only a few minor changes in the law.

The limitation imposed by having only one time replication is somewhat offset by having outputs for several characteristics, which focus on fairly distinct portions of the model, and by having outputs for states, which serve as minimodels (for which an extensive analysis was not done). However, there are circumstances that would cause a model to perform poorly for all states for a single time period that would not be indicative of the model's overall efficacy