Read "Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers" at NAP.edu

Suggested Citation: "NONPARAMETRIC VARIANCE ESTIMATION." National Research Council. 1991. Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers. Washington, DC: The National Academies Press. doi: 10.17226/1853.

VARIANCE ESTIMATION OF MICROSIMULATION MODELS THROUGH SAMPLE REUSE

economic, and other shifts in the population, referred to as static aging, or take each individual record probabilistically through year-to-year life changes, referred to as dynamic aging. Both methods attempt to create a sample population that represents the expected properties of populations in future years. Finally, after the aging procedure is completed, the entire file is often reweighted to bring certain totals of the data file close to control totals collected from a variety of sources, such as the number of eligible people who apply for funds from a program. This reweighting is one way of correcting for the misrepresentation of various groups in the original data set. Controlling to "accepted" marginal information is done both formally and somewhat informally, depending on how far the observed margins are from the "accepted" margins and on which marginal information is involved.

NONPARAMETRIC VARIANCE ESTIMATION

Until 25 years ago, variance estimation was limited in application to simple estimators and, depending on the estimator, sometimes only to data obeying a distribution from a limited family of distributions. The theory giving rise to the estimation of the variance of a mean is simple and requires little knowledge of the underlying distribution. Similarly, the derivation leading to the estimation of the variance of a regression coefficient is relatively straightforward. However, the derivation of the variance of a correlation coefficient is much more difficult; the answer is known for only a few bivariate distributions, including the bivariate normal, for which the needed integration is rather involved.
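For reference, the two simple cases mentioned above have well-known closed forms. For the mean of independent and identically distributed observations, and for the sample correlation coefficient r under bivariate normality (a large-sample approximation, with ρ denoting the population correlation), they are:

$$\widehat{\operatorname{Var}}(\bar{x}) = \frac{s^2}{n}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,$$

$$\operatorname{Var}(r) \approx \frac{(1-\rho^2)^2}{n}.$$

The contrast is the point: the first needs nothing beyond independence and a finite variance, while the second depends on the bivariate normal model and, even then, is only an asymptotic approximation.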
Thus, it is not surprising that the variance of a model involving a complicated series of computations (imputations and statistical matchings; many regression and regression-type models, including logit models to estimate participation; controlling margins to accepted control totals; complicated aging techniques; and several other possible computational features) cannot be estimated using standard methods. Calculus can only get one so far. Recently, however, several methods have been developed that make it possible to estimate what previously could not be estimated. These methods include balanced half-sample replication, the jackknife, the infinitesimal jackknife, and the bootstrap; they are also referred to as sample reuse methods.

Nonparametric variance estimators trace their beginning to Quenouille (1949), who introduced the jackknife. Quenouille was interested in the bias reduction properties of the jackknife; Tukey (1958) was the first to suggest using the jackknife for variance estimation. To introduce some notation, let $\hat\theta = \hat\theta(x_1,\ldots,x_n)$ denote an estimator that is a function of n data points, $x_1,\ldots,x_n$. Denote by $\hat\theta_{(i)}$ the estimator computed for the sample with the ith observation removed, that is, $\hat\theta_{(i)} = \hat\theta(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n)$. These are sometimes referred to as leave-one-out estimates. One can calculate n different estimates by removing each of the n sample points, in turn, from the calculation. The mean of these n leave-one-out estimates is denoted $\hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{(i)}$.

Tukey noticed that the jackknife estimate, $\hat\theta_J = n\hat\theta - (n-1)\hat\theta_{(\cdot)}$, could be expressed as the average of n terms, the ith term being equal to $\tilde\theta_i = n\hat\theta - (n-1)\hat\theta_{(i)}$. This was called the ith pseudovalue. In the special case where $\hat\theta = \bar{x}$, the mean of the leave-one-out means, $\hat\theta_{(\cdot)}$, is also $\bar{x}$, and the ith pseudovalue is simply $x_i$, an original data element. Using the implied analogy, it was hoped that the pseudovalues in general would be nearly independent and identically distributed (as are the original data values $x_i$), so that the variance of $\hat\theta_J$ would be well estimated by

$$\hat{v}_J = \frac{1}{n(n-1)}\sum_{i=1}^{n}\bigl(\tilde\theta_i - \hat\theta_J\bigr)^2 = \frac{n-1}{n}\sum_{i=1}^{n}\bigl(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\bigr)^2,$$

which in the case of the sample mean is the usual variance estimate, $s^2/n$. The analogy has not turned out to be as complete as had been hoped, because the ratio of $\hat\theta_J$ (minus its expectation) to the square root of the estimated variance does not usually have a t distribution with $n-1$ degrees of freedom, even asymptotically. However, the variance estimator $\hat v_J$ has been shown by Miller (1964, 1974a, 1974b) and others to be trustworthy for a variety of estimators and distributional families. The estimators that have been studied include means, variances, generalized means, and U-statistics. Essentially, the jackknife works for estimators that are nearly linear and not too discrete. For example, the jackknife has been shown to be less effective in estimating the variance of robust estimators, which are fairly nonlinear, and it has been shown to be inappropriate for estimating the variance of the median, which is too discrete an estimator: the leave-one-out estimates do not take on enough distinct values. However, the jackknife has been shown to be effective for unbalanced situations, such as regression, as well as for censored data.
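The leave-one-out computation, the pseudovalues, and the variance estimate $\hat v_J$ can be sketched in Python as follows. This is a minimal illustration assuming independent observations and an estimator supplied as an ordinary function of a list; the name `jackknife` is hypothetical and not taken from any particular package.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

Estimator = Callable[[Sequence[float]], float]


def jackknife(data: Sequence[float], estimator: Estimator) -> Tuple[float, float, List[float]]:
    """Return (theta_J, v_J, pseudovalues) for `estimator` applied to `data`."""
    n = len(data)
    if n < 2:
        raise ValueError("the jackknife needs at least two observations")
    theta_hat = estimator(data)
    # Leave-one-out estimates theta_(i): drop observation i and recompute.
    loo = [estimator([x for j, x in enumerate(data) if j != i]) for i in range(n)]
    # Pseudovalues: n * theta_hat - (n - 1) * theta_(i).
    pseudo = [n * theta_hat - (n - 1) * t for t in loo]
    # Jackknife estimate: the average of the pseudovalues.
    theta_j = math.fsum(pseudo) / n
    # Variance estimate v_J: sample variance of the pseudovalues, divided by n.
    v_j = math.fsum((p - theta_j) ** 2 for p in pseudo) / (n * (n - 1))
    return theta_j, v_j, pseudo


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    theta_j, v_j, pseudo = jackknife(sample, statistics.mean)
    print(theta_j, v_j)                                # approximately 5.0 and 4/7
    print(statistics.variance(sample) / len(sample))   # s^2 / n, for comparison
```

For the sample mean the pseudovalues reproduce the data (up to rounding) and $\hat v_J$ equals $s^2/n$, which makes a convenient check. Passing `statistics.median` instead illustrates the discreteness problem: the leave-one-out medians take only a few distinct values (two when n is even).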
Finally, there is the grouped jackknife, suggested for computational ease, which at each step leaves some random or representative subset of the sample out of the computation, for example, using leave-k-out estimates for various values of k.

Jaeckel (1972) helped provide a theoretical basis for the jackknife by linking it to influence functions. (For a reference on influence functions, see Hampel [1974].) An influence function (or curve) indicates how sensitive a statistic is to changes in the underlying distribution, denoted F, that generates the data to which the estimator is applied. It can be shown that the integral of the square of the influence function can be used to estimate the asymptotic variance of an estimator. The jackknife variance estimate can be expressed as a sum of squares of an empirical influence function, since it is a sum of squares of terms involving $\hat\theta_{(i)} - \hat\theta_{(\cdot)}$, which measure the change in the estimator due to a small change in the data set: the exclusion of one data point (a change to the empirical distribution function). Therefore it is not surprising that the jackknife is effective at estimating variances.

Jaeckel developed the following estimator. Consider $\hat\theta(x_1,\ldots,x_n; w_1,\ldots,w_n)$ as a function of the n data points $x_1,\ldots,x_n$ and of n weights $w_1,\ldots,w_n$, which are nonnegative (but do not necessarily sum to 1). If all of the $w_i$ are equal to $1/n$, then $\hat\theta$ is the usual estimate, which can be written $\hat\theta = \theta(F_n)$, where $F_n$ is the empirical distribution function. Jaeckel realized that omitting an observation is the same as giving it a weight of 0. This fact raises the possibility of investigating the effect on an estimator of other changes to the weights, including the gradual removal of an observation from the sample. Jaeckel defined $\hat\theta_{(i)}(\epsilon)$ to be the estimator computed with the weight of the ith observation reduced by $\epsilon$. He then expanded $\hat\theta_{(i)}(\epsilon)$ in a Taylor series in $\epsilon$. An estimate of the variance of $\hat\theta$ is a simple function of the sum, over i, of the squares of the limits as $\epsilon$ approaches 0 of the first-order terms of these Taylor series. These are also empirical influence functions. If $\epsilon$ is set equal to $1/n$, the above estimator is equal to the jackknife estimate of variance. The limiting ($\epsilon \to 0$) variance estimate, called the infinitesimal jackknife, is rarely proposed as an alternative to the jackknife for variance estimation, probably because of the strong smoothness assumptions needed for the above limits, especially in small samples. However, the idea of considering a statistic as a function of the weights assigned to the observations likely gave rise to the current variance estimate of choice, the bootstrap. The reasoning behind Efron's (1979) bootstrap is as follows.
The variance is a function of the average squared distance between $\hat\theta = \theta(F_n)$ and the true value $\theta(F)$, the estimator evaluated using the true distribution F (which can be considered the limit of $\theta(F_n)$, where $F_n$ represents a sequence of empirical distribution functions with limit F). Under the assumption that the $x_i$ are independent and identically distributed, the empirical distribution function $F_n$ is a sufficient statistic for the data set, so one can concentrate on the average squared distance between $\theta(F_n)$ and $\theta(F)$. Unfortunately, F is unknown, so one cannot use simulation to estimate this quantity directly. However, as the sample size grows, $F_n$ grows closer to F. It is possible that a good estimate of the distance between $\theta(F_n)$ and $\theta(F)$, even for small samples, is the distance between $\theta(F_n^*)$ and $\theta(F_n)$, where $F_n^*$ represents the empirical distribution of a sample drawn from $F_n$. (Another way to say this is that the observed relative frequencies of different estimates obtained by evaluating the estimator on pseudosamples drawn from the observed sample, treated as a population, are used to approximate the unknown relative frequencies obtained by sampling from the entire population and evaluating the estimator for those samples.) This idea has been shown to work for a wide variety of problems.¹

Computationally, the bootstrap estimate is developed as follows. One collects a sample of size n, $x_1,\ldots,x_n$. Assume first that the x's are independent and identically distributed. Then perform the following two steps K times, where K is, if possible, a large number:

1. sample with replacement from $x_1,\ldots,x_n$, collecting a pseudosample of size n, denoted $x_{1k}^*,\ldots,x_{nk}^*$;
2. for the kth pseudosample, compute $\hat\theta_k^* = \hat\theta(x_{1k}^*,\ldots,x_{nk}^*)$.

Then estimate the variance of $\hat\theta$ with the observed variance of the $\hat\theta_k^*$, that is, compute

$$\hat{v}_B = \frac{1}{K-1}\sum_{k=1}^{K}\bigl(\hat\theta_k^* - \bar\theta^*\bigr)^2,$$

where $\bar\theta^* = \frac{1}{K}\sum_{k=1}^{K}\hat\theta_k^*$ is the average of the $\hat\theta_k^*$ over k. The bootstrap sample for any given replication will very likely omit some of the original data, include other data once, and include still other data twice, three times, or more. Some analysts find this aspect of the bootstrap troubling. For this reason, it is probably best to consider the bootstrapping process as a method for computing the average squared difference between $\theta(F_n^*)$ and $\theta(F_n)$ or, what is essentially the same thing, the variance of $\theta(F_n^*)$ assuming $F_n$ to be the underlying distribution. Thus the bootstrapping process is just a convenient way to compute, approximately, an expectation with respect to a discrete distribution. Such a process is undeniably computationally expensive; these ideas would not have been feasible until about 10 to 15 years ago, when computation became quicker and cheaper. Once the fundamental bootstrapping idea is understood, it appears to be a valuable, straightforward technique that could be applied to a variety of problems. The bootstrap, unlike the jackknife, does work for the median and many other nonlinear estimators, and for many estimators the bootstrap is more efficient than the jackknife.
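The two-step resampling loop and the observed variance of the replicates can be sketched as follows, assuming independent and identically distributed observations. `bootstrap_variance` is a hypothetical helper name; `random.Random.choices` draws the pseudosample with replacement, and the final step uses the divisor $K-1$ (the ordinary sample variance of the K replicates).

```python
import random
import statistics
from typing import Callable, Optional, Sequence

Estimator = Callable[[Sequence[float]], float]


def bootstrap_variance(data: Sequence[float], estimator: Estimator,
                       K: int = 2000, rng: Optional[random.Random] = None) -> float:
    """Bootstrap estimate of the variance of estimator(data), iid data assumed."""
    if K < 2:
        raise ValueError("need at least two bootstrap replications")
    rng = rng if rng is not None else random.Random()
    n = len(data)
    replicates = []
    for _ in range(K):
        # Step 1: pseudosample of size n, drawn with replacement from the data.
        pseudosample = rng.choices(data, k=n)
        # Step 2: evaluate the estimator on the k-th pseudosample.
        replicates.append(estimator(pseudosample))
    # Observed variance of the K replicates (divisor K - 1).
    return statistics.variance(replicates)


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    rng = random.Random(12345)
    print(bootstrap_variance(sample, statistics.mean, rng=rng))
    print(bootstrap_variance(sample, statistics.median, rng=rng))
```

For the mean, the exact bootstrap variance (the limit as K grows) is $\sum_i (x_i-\bar x)^2/n^2$, which is 0.5 for this sample and coincides with the infinitesimal jackknife value for this linear statistic; the Monte Carlo result should lie near it. Swapping in `statistics.median` shows that the same loop applies unchanged to an estimator for which the jackknife is unreliable.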
¹Another way to say this is that one wants an estimate of $\operatorname{Var}_{F^n}\{\hat\theta(x_1,\ldots,x_n)\}$, where $F^n$ is the distribution function for a sample of n x's. It can be approximated by $\operatorname{Var}_{F_n^n}\{\hat\theta(x_1^*,\ldots,x_n^*)\}$, where $F_n^n$ is the distribution function that generates the x*'s obtained by sampling with replacement from $x_1,\ldots,x_n$. This is accomplished by substituting $F_n$ for F in the expression above. Therefore, the bootstrap uses $F_n$ as an estimate of F in the definition of the variance of $\hat\theta$, which is asymptotically justified since $F_n$ is a consistent estimator of F.

However, there are a number of possible complications, two of which occur in regression problems. (See Efron [1986] for a more complete discussion of bootstrapping regression problems.) First, consider the estimation of the variance of a regression coefficient (for which one does not need the bootstrap, since the answer is known). The naive approach to the use of the bootstrap is to treat the rows of data $\{y_i, x_{1i}, x_{2i},\ldots, x_{pi}\}$ as independent and identically distributed random vectors from a multivariate distribution, resample with replacement from this data set as indicated above, and calculate the regression coefficient for each pseudosample. The bootstrap variance estimate is then simply the sample variance of these regression coefficients. However, this multivariate assumption is often not typical of regression problems. One often wants to consider the independent variables as fixed, not as representative of some larger population. In that case, there is no longer a "balanced" situation, since the $x_i$'s for each case are not identically distributed; they are not even random variables. For example, one may have purposely chosen the independent variables to be spread out for purposes of better estimation. Therefore, one needs to identify independent and identically distributed random variables within the problem, so that a derived empirical distribution function estimates some F that makes sense. The random component of the typical regression model is the error term. Thus, what is usually done is to regress y on the x's, determine the n residuals, and then consider the empirical distribution function of the residuals. One then repeatedly (1) creates pseudosamples of residuals by resampling with replacement from the original sample of n residuals, (2) reattaches them to the fitted values, and then (3) computes the regression coefficients. This is done K times.
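The contrast between the naive approach (resampling whole rows of data) and resampling residuals with the x's held fixed can be sketched for a least-squares line with a single predictor. This is a simplified illustration under stated assumptions: one independent variable, an intercept, independent errors, and no standardization of the residuals. The function names are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for the model y = a + b * x."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def pairs_bootstrap_slopes(xs: Sequence[float], ys: Sequence[float], K: int = 2000,
                           rng: Optional[random.Random] = None) -> List[float]:
    """Naive approach: resample whole (x, y) rows as if they were iid vectors."""
    rng = rng if rng is not None else random.Random()
    n = len(xs)
    slopes: List[float] = []
    while len(slopes) < K:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # slope undefined when all resampled x's coincide
            continue
        slopes.append(ols_line(bx, [ys[i] for i in idx])[1])
    return slopes


def residual_bootstrap_slopes(xs: Sequence[float], ys: Sequence[float], K: int = 2000,
                              rng: Optional[random.Random] = None) -> List[float]:
    """Fixed-x approach: resample residuals and reattach them to the fitted values."""
    rng = rng if rng is not None else random.Random()
    a, b = ols_line(xs, ys)
    fitted = [a + b * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    n = len(xs)
    slopes: List[float] = []
    for _ in range(K):
        # (1) pseudosample of residuals, drawn with replacement
        e_star = rng.choices(residuals, k=n)
        # (2) reattach them to the fitted values; the x's stay fixed
        y_star = [f + e for f, e in zip(fitted, e_star)]
        # (3) recompute the regression coefficients and keep the slope
        slopes.append(ols_line(xs, y_star)[1])
    return slopes
```

Either list of slopes is then summarized by its sample variance, for example `statistics.variance(residual_bootstrap_slopes(xs, ys))`. In this linear case the residual bootstrap has a closed-form limit, $\hat\sigma^2/S_{xx}$ with $\hat\sigma^2 = \sum_i e_i^2/n$ and $S_{xx} = \sum_i (x_i-\bar x)^2$, which is a useful check and illustrates why the bootstrap is not needed for a regression coefficient.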
One can then estimate the variance of any individual regression coefficient by computing the variance of that coefficient over the bootstrap replications. A critic might point out that the residuals are hardly independent or identically distributed. It is true that the residuals are weakly dependent, but this dependence can probably be safely ignored. More importantly, the residuals often have greatly different variances. If this heterogeneity is extreme, it should be addressed by standardizing the residuals using the so-called hat matrix before resampling, and then reinstating the heterogeneity according to where each pseudoresidual is reattached.

The second example is a further complication of this problem. Suppose that the errors are autocorrelated. Then the time-series model that is appropriate for the errors, as well as the regression model itself, needs to be estimated. The residuals from the time-series model, representing (approximately) independent, identically distributed normal random variates, are resampled and reattached to the time-series model to compute residuals from the regression. These residuals are then reattached to the regression model. The process is repeated several times, and the variances are computed as above. The fundamental idea is that the resampling can only be performed on (approximately) independent and identically distributed units. (Actually, the most general statement is slightly weaker than this one, involving exchangeability.) The assumptions one makes to arrive at independent and identically distributed units can result in a variety of "bootstraps." Thus, this "automatic" procedure for variance estimation is not necessarily that automatic; a lot of modeling and decision making is involved.

Efron (1979) showed that the infinitesimal jackknife variance estimate can be expressed as the bootstrap variance estimate for a linear version of the statistic of interest. Therefore, if a statistic is linear, the infinitesimal jackknife and the bootstrap estimates are essentially the same. And if a statistic is not linear and one makes a close linear approximation to it, the bootstrap variance estimate of the linearized statistic is essentially the same as the infinitesimal jackknife estimate of the variance of the original statistic.

The bootstrap has been in use for more than 10 years. A number of theoretical results are known (see Bickel and Freedman, 1981; Singh, 1981), and the areas of application are growing. One area in which research is ongoing is survey sampling. One difficulty of applying bootstrap techniques to sample surveys is that the sample size allocated to strata is often small, which calls into question the hope that the difference between $\theta(F_n)$ and $\theta(F)$ can in some sense be approximated by the difference between $\theta(F_n^*)$ and $\theta(F_n)$. In addition, there is a scaling problem, unimportant for even moderate sample sizes but very important when stratum sample sizes are less than five, which has been addressed by Rao and Wu (1988). For some simple sample designs, their procedure involves using the usual bootstrap algorithm with replicate samples of size one less than the original sample size. However, for most commonly used sample designs, the suggested method is more involved. Also, complications brought about by the sample design, such as cluster sampling, make it unclear what precise form of resampling should be used.
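The simple-design case just mentioned, replicate samples one unit smaller than the original within each stratum, can be sketched for a stratified mean. This is a minimal sketch under explicit assumptions: stratified sampling with known stratum weights $W_h$, with-replacement sampling or negligible sampling fractions (no finite population correction), at least two sample units per stratum, and a linear statistic. It is not a full implementation of Rao and Wu (1988), whose general method rescales the replicates and covers nonlinear statistics and more complex designs; all function names are hypothetical.

```python
import random
import statistics
from typing import Dict, Mapping, Optional, Sequence

Strata = Mapping[str, Sequence[float]]


def stratified_mean(strata: Strata, weights: Mapping[str, float]) -> float:
    """Weighted combination sum_h W_h * ybar_h of the stratum sample means."""
    return sum(weights[h] * statistics.fmean(ys) for h, ys in strata.items())


def textbook_variance(strata: Strata, weights: Mapping[str, float]) -> float:
    """Usual estimator sum_h W_h^2 * s_h^2 / n_h (no finite population correction)."""
    return sum(weights[h] ** 2 * statistics.variance(ys) / len(ys)
               for h, ys in strata.items())


def bootstrap_variance_nh_minus_1(strata: Strata, weights: Mapping[str, float],
                                  K: int = 2000,
                                  rng: Optional[random.Random] = None) -> float:
    """Stratified bootstrap drawing n_h - 1 units with replacement in each stratum."""
    if any(len(ys) < 2 for ys in strata.values()):
        raise ValueError("each stratum needs at least two sample units")
    rng = rng if rng is not None else random.Random()
    replicates = []
    for _ in range(K):
        # Independent resampling within each stratum, one unit short of n_h.
        pseudo: Dict[str, Sequence[float]] = {
            h: rng.choices(ys, k=len(ys) - 1) for h, ys in strata.items()
        }
        replicates.append(stratified_mean(pseudo, weights))
    return statistics.variance(replicates)
```

Resampling $n_h$ units instead of $n_h - 1$ would shrink each stratum's contribution by the factor $(n_h-1)/n_h$, which is one half when $n_h = 2$; this is the scaling problem described above, and it is why the replicate size matters most when strata are small.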
Aspects of this problem have also been investigated by Rao and Wu (1988). For a two-stage cluster sample, they recommend a two-stage resampling procedure: (1) select a simple random sample of n clusters with replacement from the n sample clusters; (2) if the ith sample cluster is chosen, draw a simple random sample of $m_i$ elements with replacement from the $m_i$ elements in that cluster, subsampling independently each time the same cluster is chosen more than once. Rao and Wu then apply the rescaling mentioned above.

A technique related to the jackknife and the bootstrap, which is specifically designed for use in sample surveys, especially when each stratum sample size is equal to two, is the method of balanced half-sample replication. This idea, developed by McCarthy (1969), assumes that the sample can be expressed in the form of pairs $(x_{h1}, x_{h2})$, $h = 1,\ldots,H$, where each observation (or group of observations) has a mirror image, that is, every observation is linked to another that comes from the same distribution. This situation obtains in sample surveys in which each stratum has a sample
