A VALIDATION EXPERIMENT WITH TRIM2 295
ignores the structure implicit in the 16 model versions. One way to avoid this is to use a robust analysis of variance on the discrepancies, standardized to have roughly the same units, where the analysis of variance would be a 2×2×4 design with 13 replications per cell, one for each of the 13 response variables. Another problem with the above analysis is that large discrepancies cannot be distinguished from moderately large ones. This raises the general issue of the modeling goal and the use of omnibus loss functions. A question that must be addressed (possibly repeatedly, every few years) is whether it is desirable to have a model that predicts everything equally well or whether some responses are more crucial than others. The answer, of course, determines which responses play a role in the analysis and the degree to which they are weighted. In addition, a metric on the errors must be chosen that compares errors of various magnitudes, so that one can state, for each response, how much more disturbing a 10 percent error is than, say, a 5 percent error. If a useful loss function can be identified, the implied weights could be used in a weighted version of most of the methods discussed here.
Analysis of Categorical Data
One of the major advantages of microsimulation models is that they provide information on the distributional impacts of changes in social welfare programs, information generally unavailable from other forms of modeling. Up to now, we have analyzed the categorical data only in their dichotomized form, which was used to facilitate the analysis of change. There are certainly situations where a single category, or a collection of related categories, is of primary interest, and in those cases it is appropriate to dichotomize and analyze the percentage of cases in that category (or categories). At other times, however, the full distribution is of interest.
We therefore continued the external validation of TRIM2 by examining, for the undichotomized frequency-table outputs and for estimates of level, how closely the output frequencies from the various versions of TRIM2 corresponded to those from the 1987 quality control data. Table 7 presents values of the χ² test of independence for a 2×r contingency table, where r is the number of categories in the response; one row contains the frequencies from one model version and the second row contains those from the 1987 quality control data. Notationally, the test statistic is

$$Q = \sum_{i=1}^{2} \sum_{j=1}^{r} \frac{\left(n_{ij} - n_{i.}\,n_{.j}/N\right)^{2}}{n_{i.}\,n_{.j}/N},$$

where $n_{ij}$ is the number of persons in category j estimated by run i; $n_{i.}$ is the sum over categories, that is, the total number of people "produced" by run i; $n_{.j}$ is the sum over runs, that is, the total number of people in category j; and $N$ is the total number of people "produced" by the two runs (where, in this case, one of the two "runs" is the set of quality control comparison values). Under the hypothesis that both rows share the same distribution across categories, Q has approximately a chi-square distribution with r−1 degrees of freedom. Values of Q are provided in Table 7 for all 16 model versions and for the 1983 quality control data, for a variety of model outputs.

TABLE 7 χ² Goodness-of-Fit Statistics for Distributions, from TRIM2 Validation Experiment

Runs 1 to 8

Variable                                      D.F.     1     2     3     4     5     6     7     8
Total no. in unit                                4    26    37    28    28    27    36    28    27
No. of adults                                    2    10    27    39    45    11    27    41    45
No. of children                                  3     2     5     6     6     2     4     5     5
Age of youngest child                            4    11    14    12    12    11    16    13    12
Gross income of unit                             8   568   508   461   396   506   456   423   362
Earnings of adults                               8    15    19    18     8    14    26    23    17
Type of AFDC unit                                2    15     3    13    15    17     3    12    14
Race of head                                     3     4     9     2     2     6    10     2     2
Sex of head                                      1    28    19     1     0    31    21     1     0
Age of head                                      7    30    47    45    48    29    49    45    46
Relationship of unit head to household head      4   1.2   0.9   0.9   0.9   1.2   0.9   0.9   0.9
Marital status of head                           1     1     1    15    20     1     2    16    19
Size of benefit                                  7    85   114   101   106    87   116   108   109

Runs 9 to 16 and 1983 quality control data (IQCS83)

Variable                                      D.F.     9    10    11    12    13    14    15    16 IQCS83
Total no. in unit                                4    48    19    40    40    49    49    42    41     73
No. of adults                                    2    33     8    45    48    35    39    48    51    438
No. of children                                  3     4     6     6     6     6     5     6     6     41
Age of youngest child                            4    12    10     5     5    10     7     6     6     77
Gross income of unit                             8   436   508   362   310   419   412   338   287    259
Earnings of adults                               8    16    38     8    13    20    24    20    29    426
Type of AFDC unit                                2     9     0     5     6    12     7     5     6     75
Race of head                                     3     7    16     6     8     5     5     5     6    493
Sex of head                                      1    14     3     0     0    20    19     0     0     48
Age of head                                      7   115   101    40    40    32    38    39    40    272
Relationship of unit head to household head      4   1.3   1.1   0.9   0.9   1.2   0.9   1.0   1.0      0
Marital status of head                           1     1     4    20    23     0     1    21    24     57
Size of benefit                                  7    50    85   120   119    99   127   121   119   2015

NOTE: D.F. indicates degrees of freedom. The critical values of χ² at the 1 percent significance level (the 99th percentiles) are listed below; a value in the table that exceeds the critical value for its degrees of freedom indicates that the model version differs from the 1987 IQCS by more than one would expect by chance.
D.F.=1, χ²=6.635
D.F.=2, χ²=9.210
D.F.=3, χ²=11.345
D.F.=4, χ²=13.277
D.F.=5, χ²=15.086
D.F.=6, χ²=16.812
D.F.=7, χ²=18.475
D.F.=8, χ²=20.090

In examining Table 7, as in the analysis of change, we see that no particular model version has any noticeable advantage over the others. Clearly, all model versions perform well in projecting the distributions of the relationship of unit head to household head and of the number of children, and all versions except run 10 perform well for race of head of unit. On the other hand, all model versions perform poorly for gross income of unit. Some specific findings remain to be confirmed. One is that adjustment appears to help match the distribution of the comparison values for age of the youngest child. Another is that full aging appears to be useful for sex of head, but not for marital status. This type of detailed analysis is clearly only suggestive and can be overdone, since the study is limited. However, it is clear that the variability attributable to the choice of response variable dominates the variability due to model versions.
The last column of Table 7 displays the results from using the 1983 quality control data, which, as mentioned above, is not a completely fair comparison given the size of the policy change examined. Nevertheless, the 1983 quality control data do not compete well with the 16 versions of TRIM2 in the analysis shown in Table 7. However, they do outperform every TRIM2 version in estimating gross income of unit and relationship of unit head to household head.
This test can also be used with more than two rows to investigate the similarity of several model versions, as part of an analysis of the sensitivity of the distributions to model structure. For a categorical response with four categories, one could form the 16×4 contingency table and evaluate Q, which would then have (16−1)(4−1) = 45 degrees of freedom. However, this analysis would ignore the special structure of the 16 model versions. There is currently no entirely satisfactory way to handle what is essentially an analysis of variance of frequency distributions. One can use dichotomization, as done above. Another way of partially circumventing the problem is to analyze particular subsets of the 16 model versions separately by using the test of independence, and to look for similarities and differences among the separate analyses.
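As a rough illustration of the test of independence described in this section, the following Python sketch computes Q for a contingency table with any number of rows (model runs, or the quality control comparison values) and columns (response categories), together with its degrees of freedom, (rows − 1)(columns − 1). With two rows this is the 2×r comparison used for Table 7; with 16 rows it is the 16×4 analysis mentioned above. The function names and the example counts are hypothetical and are not taken from the TRIM2 study; the critical values are the upper 1 percent points of χ² for 1 to 8 degrees of freedom. This is a sketch of the standard Pearson statistic, assuming every category has a nonzero column total.

```python
from typing import Sequence

# Upper 1 percent critical values of chi-square (99th percentiles), 1 to 8 D.F.
CRITICAL_99 = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277,
               5: 15.086, 6: 16.812, 7: 18.475, 8: 20.090}


def chi_square_independence(table: Sequence[Sequence[float]]) -> tuple[float, int]:
    """Return the Pearson statistic Q and its degrees of freedom for an R x C table.

    Rows are runs (model versions or comparison data); columns are categories.
    Hypothetical helper, written for illustration only.
    """
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # N
    if any(total == 0 for total in col_totals):
        # An empty category gives a zero expected count; merge or drop it first.
        raise ValueError("every category must have a nonzero total")
    q = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # n_i. * n_.j / N
            q += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return q, df


def exceeds_critical(q: float, df: int) -> bool:
    """True if Q exceeds the 1 percent critical value for its D.F. (1 to 8 only)."""
    if df not in CRITICAL_99:
        raise ValueError(f"no tabulated critical value for {df} degrees of freedom")
    return q > CRITICAL_99[df]


if __name__ == "__main__":
    # Hypothetical 2 x 2 table: one model run versus comparison values.
    q, df = chi_square_independence([[10, 0], [0, 10]])
    print(q, df)  # -> 20.0 1
    print(exceeds_critical(q, df))  # -> True
```

Because the statistic relies on a large-sample chi-square approximation, it can be unreliable when some expected counts are small; in that case sparse categories are often merged before Q is computed.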