VARIANCE ESTIMATION OF MICROSIMULATION MODELS THROUGH SAMPLE REUSE

size of two. A half-sample is defined as a sample in which one of each of the mirror-image pairs is removed from the sample. Thus, any half-sample can be identified by an inclusion vector of +1's and −1's of length H, where a +1 in the ith position indicates that the left image of the ith pair is included, and a −1 indicates that the right image is included. There are $2^H$ half-samples. One possible method for computing the variance is to compute the statistic for each of the half-samples and then calculate the variance of these $2^H$ statistics in the usual way. While this can clearly be done, it is computationally very expensive. McCarthy (1969) showed that for linear statistics one need only recalculate the statistic for a balanced set of half-samples. A balanced set of half-sample replicates is a set in which the inner product of any two distinct inclusion vectors is 0. Roughly, a balanced set avoids the possibility of any mirror image being in the computations "too often" or "not often enough." Efron (1982) also mentions the possibility of complementary balanced half-samples, which extends the utility of McCarthy's idea to quadratic statistics.

Another technique that might be applicable in the sample survey context is presented by Efron (1982: Ch. 8). Again consider a sample in which each element has a natural mirror image, denoted $x_{11}, x_{12}, x_{21}, x_{22}, \ldots, x_{h1}, x_{h2}, \ldots, x_{H1}, x_{H2}$. These elements could either be strata or strata halves. Let $\hat{\theta}_{h1}$ denote the value of the estimator computed for the data set in which $x_{h2}$ is replaced with its mirror image, $x_{h1}$. In other words, the weight for $x_{h1}$ is doubled, and $x_{h2}$ is removed from the computation. Let $\hat{\theta}_{h2}$ denote the same computation as $\hat{\theta}_{h1}$, except that the roles of $x_{h1}$ and $x_{h2}$ are switched. Then Efron presents the following variance estimate:

$$\widehat{\mathrm{Var}}(\hat{\theta}) = \frac{1}{4} \sum_{h=1}^{H} \left( \hat{\theta}_{h1} - \hat{\theta}_{h2} \right)^2 .$$

This variance estimate is relatively cheap computationally, requiring only 2H recomputations of the statistic, and can also be used on quadratic statistics.
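The two procedures just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the data values, the stratum weights, and all function names are hypothetical; the statistic is a simple stratified mean (a linear statistic); the number of pairs H is assumed to be a power of two so that a Sylvester-type Hadamard matrix can supply the balanced inclusion vectors; and the mirror-image estimate is taken to be one quarter of the summed squared differences between the two reweighted estimates for each pair.

```python
import numpy as np

def sylvester_hadamard(order):
    """Return an order x order matrix of +1/-1 whose rows are mutually orthogonal.
    Assumption: order is a power of two (Sylvester construction)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    mat = np.array([[1]])
    while mat.shape[0] < order:
        mat = np.block([[mat, mat], [mat, -mat]])
    return mat

def multipliers_from_inclusion(inclusion):
    """Map an inclusion vector to weight multipliers for (left, right) images.
    +1 doubles the left image and drops the right, -1 does the reverse,
    and 0 leaves the pair untouched."""
    inclusion = np.asarray(inclusion, dtype=float)
    return np.column_stack([1.0 + inclusion, 1.0 - inclusion])

def stratified_mean(pairs, stratum_weights, multipliers):
    """Hypothetical linear statistic: weighted sum of within-pair means."""
    return float(np.sum(stratum_weights * np.sum(multipliers * pairs, axis=1) / 2.0))

def balanced_half_sample_variance(pairs, stratum_weights, statistic):
    """Mean squared deviation of balanced half-sample estimates from the full-sample estimate."""
    n_pairs = pairs.shape[0]
    full = statistic(pairs, stratum_weights, np.ones((n_pairs, 2)))
    design = sylvester_hadamard(n_pairs)  # each row is one inclusion vector
    replicates = np.array([
        statistic(pairs, stratum_weights, multipliers_from_inclusion(row))
        for row in design
    ])
    return float(np.mean((replicates - full) ** 2))

def mirror_image_variance(pairs, stratum_weights, statistic):
    """One quarter of the sum over pairs of (theta_h1 - theta_h2)^2."""
    n_pairs = pairs.shape[0]
    total = 0.0
    for h in range(n_pairs):
        inclusion = np.zeros(n_pairs)
        inclusion[h] = 1.0  # only pair h is perturbed
        theta_h1 = statistic(pairs, stratum_weights, multipliers_from_inclusion(inclusion))
        theta_h2 = statistic(pairs, stratum_weights, multipliers_from_inclusion(-inclusion))
        total += (theta_h1 - theta_h2) ** 2
    return total / 4.0

# Hypothetical data: H = 4 mirror-image pairs with made-up stratum weights.
pairs = np.array([[3.0, 5.0], [10.0, 6.0], [7.0, 7.5], [1.0, 4.0]])
stratum_weights = np.array([0.1, 0.2, 0.3, 0.4])

brr = balanced_half_sample_variance(pairs, stratum_weights, stratified_mean)
mirror = mirror_image_variance(pairs, stratum_weights, stratified_mean)
print(brr, mirror)
```

For a linear statistic such as this one, both routines should agree with the usual closed-form stratified variance estimate; the balanced set needs only H recomputations here, rather than all $2^H$ half-samples. For a nonlinear statistic the two estimates will generally differ.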
MODELS AND VARIABILITY

Before continuing on to a discussion of how one might apply sample reuse techniques to the estimation of the variability of the output of a microsimulation model, it is important to discuss briefly what is meant, philosophically, by "model" and "variance." In a complicated modeling effort, including microsimulation modeling, there is often no precise definition of the model. That is, there is no clear delineation between the parameters or modules that define the model and are not variable, and the parameters or modules that are variable and whose associated variances affect the results of the model. Another way of viewing this question is to ask to what extent the variance estimate is conditional on certain parameters or modules assumed to be fixed at
their current value. The data from the sample survey providing the core of the input data are certainly variable, and this affects the variability of the estimates derived from a microsimulation model. Likewise, control totals, which can derive from other sample surveys, can have variability that contributes to the variability of the results. But consider whether a microsimulation model contains an aging module or the type of aging module it contains. The decision of which type of aging module to include introduces a type of variability into the results, although this is considered by many analysts a proper subject for a sensitivity analysis, rather than a component of the variance. This argument suggests that the choice of which aging module to use is, in some generalized sense, a contribution to bias rather than to variance. Here, the model is often defined as having a particular component, and no variability derives from that choice since it is always made with this model. It is clear that the definition of model is crucial to a decision of what constitutes a contribution to variance. One can also view the variance estimate that derives from this approach as being conditional on the choice of aging routine. If one chooses to use the term model in a more general sense, assuming that the choice of the aging module was to some degree arbitrary, it would be important to include the variability that derived from that choice in any estimate of variance.² However, such inclusion is difficult to do, since it involves both identifying all the possible relevant approaches to aging and eliciting subjective probabilities of the correctness of these aging modules.³ One can either perform a sensitivity analysis of the effect of the choice of aging module or more formally assess the variability of the model's output by including the contribution from the choice of the aging module.
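The difference between a variance that is conditional on one aging module and a variance that includes the choice of module can be illustrated with the law of total variance. The sketch below is only an illustration under assumptions: the three aging modules, their output estimates, their conditional variances, and the subjective probabilities are all hypothetical numbers, and the function name is invented for this example.

```python
import numpy as np

def decompose_variance(estimates, conditional_variances, probabilities):
    """Split total variance into a within-module and a between-module part.

    estimates[k]             : model output when aging module k is used
    conditional_variances[k] : variance of that output given module k
    probabilities[k]         : subjective probability that module k is correct
    """
    est = np.asarray(estimates, dtype=float)
    cond_var = np.asarray(conditional_variances, dtype=float)
    prob = np.asarray(probabilities, dtype=float)
    if not np.isclose(prob.sum(), 1.0):
        raise ValueError("subjective probabilities must sum to one")
    overall = float(np.sum(prob * est))
    within = float(np.sum(prob * cond_var))               # E[Var(output | module)]
    between = float(np.sum(prob * (est - overall) ** 2))  # Var(E[output | module])
    return overall, within, between

# Hypothetical values for three candidate aging modules.
estimates = [10.0, 12.0, 11.0]
conditional_variances = [1.0, 1.5, 0.5]
probabilities = [0.5, 0.3, 0.2]

overall, within, between = decompose_variance(
    estimates, conditional_variances, probabilities
)
print("conditional on first module:", conditional_variances[0])
print("including module choice:    ", within + between)
```

A variance estimate that treats the aging module as fixed reports only one of the conditional variances; the between-module term is the part that a sensitivity analysis examines informally and that the more general definition of "model" would add to the variance.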
Another issue involves the use of demographic and macroeconomic projections in a microsimulation model. One approach is to assume that the projection used is fixed and to interpret the resulting variance estimate as conditional on the use of that particular projection. Another approach is to first derive a multivariate density for the relevant parameters of the projection models. For demographic projections, that would be a joint density for the fertility rates, mortality rates, and rates of immigration over the time period of interest, a difficult but not impossible task. More analysts appear to be comfortable with this approach than with attributing a variability to the choice of aging module, partly due to the relative ease, in this case, of defining the sample space of

² Along the same lines, an inherent source of variability is the modeling process itself; a choice of a linear or nonlinear model, or probit versus logit, is rarely clear; rather, it is a choice that may be based on intuition, tradition, ignorance, or careful thought.

³ Rubin, in a series of papers on multiple imputation (see Little and Rubin, 1987), has provided a rigorous formulation for the variability due to alternative imputation models, which is closely related to the issue discussed here.