Model Validation

6 Variance Estimation of Microsimulation Models Through Sample Reuse

Michael L. Cohen

Michael L. Cohen is assistant professor in the School of Public Affairs at the University of Maryland; he served as a consultant to the Panel to Evaluate Microsimulation Models for Social Welfare Programs.

Microsimulation models play an important role in informing policy makers by providing timely information about the likely changes in allocations resulting from modifications to family assistance programs, tax regulations, etc., and also by identifying the characteristics of the people who are affected and in what ways they will be affected by those modifications. Currently, the results of microsimulation models are presented without any estimate of variability, aside from the variability that results from use of different policy scenarios. This lack of information about the variability in the models limits the utility of the output. For example, it is reasonable to expect a policy maker to act differently if informed that a change in program regulations for Aid to Families with Dependent Children (AFDC) would cost between $1.1 and $1.2 billion than if informed that the change would cost between −$2 and +$4 billion. The likely immediate result of the second range would be either the search for alternative estimates with less variability, if they were available, or the increased importance of policy objectives other than total cost. A possible long-range result might be the allocation of more resources to reduce the variability of future cost estimates. The crucial point is that an estimate without an accompanying assessment of its precision is difficult to use when it is one of several ingredients in a decision.

The lack of a variance estimate for microsimulation models is not surprising given the complexity of these models. However, recent advances in nonparametric variance estimation, including the development of Efron’s bootstrap, and recent increases in computational power, have made possible the estimation of the variance of microsimulation model projections. Expected advances both in computational power and in further understanding of the application of the bootstrap and other techniques to variance estimation for complex models will facilitate the calculation of these estimates. Before discussing how the bootstrap and other variance estimation techniques might be applied to microsimulation models, this chapter presents introductions to both computations of microsimulation models and nonparametric variance estimation.

COMPUTATIONS OF MICROSIMULATION MODELS

Microsimulation models use as primary input major surveys, such as the Current Population Survey (CPS), or samples from administrative records, such as samples from federal individual income tax filings. For example, assume that Congress is considering a major change in AFDC. An obvious question is how much this change will cost. An estimate is derived by first determining how many of the individuals in the CPS sample would be eligible for AFDC, how many of those would choose to participate, and how much assistance the eligible participants would receive. Next, to estimate the cost of the proposed changes to AFDC, the same steps would be taken with eligibility and benefit calculations performed for the proposed program. Finally, since the CPS is a national sample, the differences for each sampled family would be weighted to estimate the difference nationally, as well as the effects on smaller demographic groups.
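To make the bookkeeping concrete, the following minimal sketch (in Python) mirrors the weighted cost-difference calculation just described. The record fields and the eligibility and benefit rules are hypothetical placeholders, not taken from any actual microsimulation model, and the participation decision is collapsed into the eligibility test for brevity.

from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical, highly simplified CPS-style family record.
    weight: float          # survey weight: number of families this record represents
    monthly_income: float  # income allocated to a month
    children: int

def annual_benefit(rec, income_limit, base_grant, per_child):
    """Hypothetical eligibility and benefit rule for one program variant."""
    eligible = rec.children > 0 and rec.monthly_income < income_limit
    if not eligible:
        return 0.0
    return 12 * (base_grant + per_child * rec.children)

def cost_difference(records, current_rules, proposed_rules):
    """Weighted national estimate of the cost change (proposed minus current program)."""
    return sum(
        rec.weight * (annual_benefit(rec, **proposed_rules) - annual_benefit(rec, **current_rules))
        for rec in records
    )

# Illustrative use with made-up records and rules
sample = [Record(1500.0, 600.0, 2), Record(2000.0, 900.0, 1), Record(1800.0, 2500.0, 3)]
current = dict(income_limit=800.0, base_grant=200.0, per_child=75.0)
proposed = dict(income_limit=1000.0, base_grant=220.0, per_child=90.0)
print(cost_difference(sample, current, proposed))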

However, this simple idea is complicated by several factors. First, the covariate data in the available survey sample are often not rich enough to determine eligibility, participation, or amount of assistance received. Therefore, the information for each record is often augmented, either by imputation, by exact matching to administrative records, or through the use of a statistical match. (For a definition and discussion of statistical matching, see Rodgers [1984]; see also Cohen [Ch. 2 in this volume].) Second, the data are in many cases collected annually, while the programs of interest are administered monthly, so it is necessary to allocate income and other data to the households monthly. Third, most policy makers are interested in the effects due to a program change for several years into the future, not just the upcoming year. This is especially the case for attempts to use microsimulation modeling to estimate the costs of health insurance programs and the costs of modifications to social security and pension plans. In addition, by the time a data set is made available and the various adjustments to the original data set have been completed, the data are often 2 or 3 years old. These two factors have contributed to the development of aging procedures that either reweight the data to take into consideration external estimates of the expected demographic, economic, and other shifts in the population, referred to as static aging, or take each individual record probabilistically through year-to-year life changes, referred to as dynamic aging. Both methods are attempts to create a sample population that represents the expected properties of populations in future years. Finally, after the aging procedure is completed, the entire file is often reweighted to bring certain totals of the data file close to control totals collected from a variety of sources, such as the number of eligible people who apply for funds from a program. This reweighting is one way of correcting for the misrepresentation of various groups in the original data set. The controlling to “accepted” marginal information is done both formally and somewhat informally, depending on how far the observed margins are from the “accepted” margins and depending on which marginal information is involved.

NONPARAMETRIC VARIANCE ESTIMATION

Until 25 years ago, variance estimation was limited in application to simple estimators and, depending on the estimator, sometimes only to data obeying a distribution from a limited family of distributions. The theory giving rise to the estimation of the variance of a mean is simple and requires little knowledge of the underlying distribution. Similarly, the derivation leading to the estimation of the variance of a regression coefficient is relatively straightforward. However, the derivation of the estimation of the variance of a correlation coefficient is much more difficult; the answer is known for only a few bivariate distributions, including bivariate normality, for which the needed integration is rather involved. Thus, it is not surprising that estimating the variance of a model that involves a complicated series of computations—including imputations and statistical matchings; many regression and regression-type models, including use of logit models to estimate participation; controlling margins to accepted control totals; use of complicated aging techniques; and several other possible computational features—is not possible using standard methods. Calculus can only get one so far.

Recently, however, several methods have been developed that provide the opportunity to estimate what was previously not possible. These methods include balanced half-sample replication, the jackknife, the infinitesimal jackknife, and the bootstrap; they are also referred to as sample reuse methods.

Nonparametric variance estimators trace their beginning to Quenouille (1949), who introduced the jackknife. Quenouille was interested in the bias reduction properties of the jackknife. Tukey (1958) was the first to suggest using the jackknife for variance estimation. To introduce some notation, let $\hat{\theta}$ denote an estimator that is a function of n data points, $x_1, \ldots, x_n$. Denote by $\hat{\theta}_{(i)}$ the estimator computed for the sample with the ith observation removed, that is, $\hat{\theta}_{(i)} = \hat{\theta}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$. These are sometimes referred to as leave-one-out estimates. One can calculate n different estimates by removing each of the n sample points, in turn, from the calculation.

The mean of these n leave-one-out estimates is denoted $\hat{\theta}_{(\cdot)}$. Tukey noticed that the jackknife estimate, $\hat{\theta}_J = n\hat{\theta} - (n-1)\hat{\theta}_{(\cdot)}$, could be expressed as the average of n terms, the ith term being equal to $\tilde{\theta}_i = n\hat{\theta} - (n-1)\hat{\theta}_{(i)}$. This was called the ith pseudovalue. In the special case where $\hat{\theta}$ is the sample mean $\bar{x}$, the leave-one-out mean $\hat{\theta}_{(\cdot)}$ is also $\bar{x}$, and the ith pseudovalue is simply $x_i$, an original data element. Using the implied analogy, it was hoped that the pseudovalues in general would be nearly independent and identically distributed (as are the original data values $x_i$), so that the variance of $\hat{\theta}_J$ would be well estimated by

$$\widehat{\mathrm{Var}}_J = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \tilde{\theta}_i - \hat{\theta}_J \right)^2,$$

which in the case of the sample mean is the usual variance estimate. The analogy has not turned out to be as complete as had been hoped, because the ratio of $\hat{\theta}_J$ (minus its expectation) to the square root of the estimated variance does not usually have a t distribution with n−1 degrees of freedom, even asymptotically. However, the variance estimator has been shown by Miller (1964, 1974a, 1974b) and others to be trustworthy for a variety of estimators and distributional families. The estimators that have been studied include means, variances, generalized means, and U-statistics. Essentially, the jackknife works for those estimators that are nearly linear and are not too discrete. For example, the jackknife has been shown to be less effective in estimating the variance of robust estimators, which are fairly nonlinear, and the jackknife has been shown to be inappropriate for estimating the variance of the median, which is too discrete an estimator, with the result that the leave-one-out estimates do not take on enough distinct values. However, the jackknife has been shown to be effective for unbalanced situations, such as regression, as well as for censored data. Finally, there is the grouped jackknife, suggested for computational ease, which at each step leaves some random or representative subset of the sample out of the computation, for example, using leave-k-out estimates for various values of k.
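The leave-one-out recipe above is short enough to state directly in code. The sketch below (Python) is a generic illustration, not code from the chapter; estimator can be any function of the full sample.

import numpy as np

def jackknife_variance(data, estimator):
    """Jackknife variance estimate computed from leave-one-out pseudovalues."""
    data = np.asarray(data)
    n = len(data)
    theta_hat = estimator(data)
    # Leave-one-out estimates theta_(i)
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    # Pseudovalues: n * theta_hat - (n - 1) * theta_(i)
    pseudo = n * theta_hat - (n - 1) * loo
    # Treat the pseudovalues as (approximately) i.i.d. and estimate the variance of their mean
    return pseudo.var(ddof=1) / n

# For the sample mean the pseudovalues are the data themselves,
# so this reproduces the usual estimate s^2 / n.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(jackknife_variance(x, np.mean), x.var(ddof=1) / len(x))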

Jaeckel (1972) helped provide a theoretical basis for the jackknife by linking it to influence functions. (For a reference to influence functions, see Hampel [1974].) An influence function (or curve) indicates how sensitive a statistic is to changes in the underlying distribution, denoted F, generating the data on which the estimator is used. It can be shown that the integral of the square of the influence function can be used to estimate the asymptotic variance of an estimator. The jackknife can be expressed as the sum of the square of an empirical influence function, since it can be expressed as the sum of squares of terms involving $\hat{\theta}_{(i)} - \hat{\theta}$, which measure the change in the estimator due to a small change in the data set—the exclusion of one data point (a change to the empirical distribution function). Therefore it is not surprising that the jackknife is effective at estimating variances.

Jaeckel developed the following estimator. Consider $\hat{\theta}$ as a function of the n data points $x_1, \ldots, x_n$ and of the n weights $w_1, \ldots, w_n$, which are nonnegative (but do not necessarily sum to 1). If all of the $w_i$ are equal to $1/n$, then $\hat{\theta}$ is the usual estimate, which can be written $\hat{\theta}(F_n)$, where $F_n$ is the empirical distribution function. Jaeckel realized that omitting an observation is the same as giving it a weight of 0. This fact raises the possibility of investigating the effect on an estimator of other changes to the weights, including the gradual removal of an observation from the sample. Jaeckel defined $\hat{\theta}_{\epsilon,i}$ to be the estimate obtained when the weight of the ith observation is reduced from $1/n$ to $1/n - \epsilon$, with the other weights left unchanged. He then expanded $\hat{\theta}_{\epsilon,i}$ in a Taylor series in $\epsilon$. An estimate of the variance of $\hat{\theta}$ is a simple function of the sum of squares of the limits, as $\epsilon$ approaches 0, of the first-order terms of these Taylor series, one for each i. These limits are also empirical influence functions. If $\epsilon$ is instead set equal to $1/n$, the above estimator is equal to the jackknife estimate of variance. The resulting variance estimate, called the infinitesimal jackknife, is rarely proposed as an alternative to the jackknife for variance estimation, probably because of the strong assumptions about smoothness needed by the above limits, especially for small samples. However, the idea of considering a statistic as a function of the weights assigned to the observations likely gave rise to the current variance estimate of choice, the bootstrap.
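As a sketch of this construction, in the notation of the paragraph above (the $1/n^2$ scaling shown is the standard form of the infinitesimal jackknife and is supplied here as an assumption rather than taken from the chapter):

$$U_i = \lim_{\epsilon \to 0} \frac{\hat{\theta} - \hat{\theta}_{\epsilon,i}}{\epsilon}, \qquad i = 1, \ldots, n,$$

$$\widehat{\mathrm{Var}}_{IJ}(\hat{\theta}) = \frac{1}{n^{2}} \sum_{i=1}^{n} U_i^{2},$$

where the $U_i$ are the empirical influence values referred to in the text.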

The reasoning behind Efron’s (1979) bootstrap is as follows. The variance of $\hat{\theta}$ is a function of the average squared distance between $\hat{\theta}(F_n)$ and the true value $\hat{\theta}(F)$—the estimator evaluated using the true distribution F (which can be considered the limit of $\hat{\theta}(F_n)$, where $F_n$ represents a series of empirical distribution functions with limit F). By using the assumption that the $x_i$ are independent and identically distributed, the empirical distribution function $F_n$ is a sufficient statistic for the data set, and so one can concentrate on the average squared distance between $\hat{\theta}(F_n)$ and $\hat{\theta}(F)$. Unfortunately, F is unknown, so one cannot use simulation to estimate the above quantity. However, as the sample size grows, $F_n$ will grow closer to F. It is possible that a good estimate of the distance between $\hat{\theta}(F_n)$ and $\hat{\theta}(F)$, even for small samples, is the distance between $\hat{\theta}(F_n^*)$ and $\hat{\theta}(F_n)$, where $F_n^*$ represents the empirical distribution derived from sampling from $F_n$. (Another way to say this is that the observed relative frequencies of different estimates obtained by evaluating the estimator on pseudosamples drawn from the observed population are used to approximate the unknown relative frequencies obtained by sampling from the entire population and evaluating the estimator for those samples.) This idea has been shown to work for a wide variety of problems.1

Computationally, the bootstrap estimate is developed as follows. One collects a sample of size n, $x_1, \ldots, x_n$. Assume first that the x’s are independent and identically distributed. Then perform the following two steps K times, where K is, if possible, a large number: sample with replacement from $x_1, \ldots, x_n$, collecting a pseudosample of size n, denoted $x_1^{*k}, \ldots, x_n^{*k}$ for the kth pseudosample; then compute $\hat{\theta}^{*k} = \hat{\theta}(x_1^{*k}, \ldots, x_n^{*k})$. Then estimate the variance of $\hat{\theta}$ with the observed variance of the $\hat{\theta}^{*k}$; that is, compute

$$\widehat{\mathrm{Var}}_B = \frac{1}{K-1} \sum_{k=1}^{K} \left( \hat{\theta}^{*k} - \bar{\theta}^{*} \right)^2,$$

where $\bar{\theta}^{*}$ is the average of the $\hat{\theta}^{*k}$ over k.

The bootstrap sample for any given replication will very likely not include some of the original data, will have other data included once, and may have other data included twice, three times, or more. Some analysts find this aspect of the bootstrap troubling. For this reason, it is probably best to consider the bootstrapping process as a method for computing the average squared difference between $\hat{\theta}(F_n^*)$ and $\hat{\theta}(F_n)$ or, what is essentially the same thing, the variance of $\hat{\theta}$ assuming $F_n$ to be the underlying distribution. Thus the bootstrapping process is just a convenient way to compute, approximately, an expectation with respect to a discrete distribution. Such a process is undeniably computationally expensive. It is clear that these ideas would not have been feasible until about 10 to 15 years ago, when computation became quicker and cheaper. Once the fundamental bootstrapping idea is understood, it seems like a valuable, straightforward technique that could be applied to a variety of problems.

1 Another way to say this is that one wants an estimate of $\mathrm{Var}_F[\hat{\theta}(F_n)]$, where $F_n$ is the distribution function for a sample of n x’s. It can be approximated by $\mathrm{Var}_{F_n}[\hat{\theta}(F_n^*)]$, where $F_n^*$ is the distribution function that generates x*’s obtained by sampling with replacement from $x_1, \ldots, x_n$. This is accomplished by substituting $F_n$ for F in the expression above. Therefore, the bootstrap uses $F_n$ as an estimate of F in the definition of the variance of $\hat{\theta}$, which is asymptotically justified since $F_n$ is a consistent estimator for F.
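The two-step recipe above translates directly into code. The following minimal sketch (Python, not from the chapter) bootstraps the variance of an arbitrary estimator applied to i.i.d. data; the number of replications K is whatever the analyst can afford.

import numpy as np

def bootstrap_variance(data, estimator, K=1000, seed=0):
    """Basic nonparametric bootstrap variance estimate for i.i.d. data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Resample with replacement K times and recompute the estimator each time
    theta_star = np.array([
        estimator(rng.choice(data, size=n, replace=True)) for _ in range(K)
    ])
    # Observed variance of the bootstrap replicates
    return theta_star.var(ddof=1)

# Unlike the jackknife, this also behaves reasonably for the median.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
print(bootstrap_variance(x, np.median))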

The bootstrap, as opposed to the jackknife, does work for the median and many other nonlinear estimators, and for many estimators the bootstrap is more efficient than the jackknife. However, there are a number of possible complications, two of which occur in regression problems. (See Efron [1986] for a more complete discussion of bootstrapping regression problems.)

First, consider the estimation of the variance of a regression coefficient (for which one does not need the bootstrap, since the answer is known). The naive approach to the use of the bootstrap is to treat the rows of data $\{y_i, x_{1i}, x_{2i}, \ldots, x_{pi}\}$ as independent and identically distributed random vectors from a multivariate distribution, resample with replacement from this data set as indicated above, and calculate the regression coefficient for each pseudosample. Then the bootstrap variance estimate is simply the sample variance of these regression coefficients. However, this multivariate assumption is often not typical of regression problems. One often wants to consider the independent variables as fixed, not as representative of some larger population. In that case, there is no longer a “balanced” situation, since the $x_i$’s for each case are not identically distributed; they are not even random variables. For example, one may have purposely chosen the independent variables to be spread out for purposes of better estimation. Therefore, one needs to identify independent and identically distributed random variables within the problem so that a derived empirical distribution function estimates some F that makes sense. The random component of the typical regression model is the error term. Thus, what is usually done is to regress y on the x’s, determine the n residuals, and then consider the empirical distribution function of the residuals. One then repeatedly (1) creates pseudosamples of residuals by resampling with replacement from the original sample of n residuals, (2) reattaches them to the fitted values, and then (3) computes the regression coefficients. This is done K times. One can then estimate the variance of any individual regression coefficient by computing the variance of that coefficient over the bootstrap replications. A critic might point out that the residuals are hardly independent or identically distributed. It is true that the residuals are weakly dependent, but this dependence is probably safely ignored. More importantly, the residuals often have greatly different variances. If this heterogeneity is extreme, it should be addressed by standardizing the residuals, using the so-called hat matrix, before resampling and then reinstating the heterogeneity, depending on where each pseudoresidual is reattached.

The second example is a further complication of this problem. Suppose that the errors are autocorrelated. Then the time-series model that is appropriate for the errors, as well as the regression model itself, needs to be estimated. The residuals from the time-series model, representing (approximately) independent, identically distributed normal random variates, are resampled and reattached to the time-series model to compute residuals from the regression. These residuals are then reattached to the regression model. The process is repeated several times, and the variances are computed as above. The fundamental idea is that the resampling can only be performed on (approximately) independent and identically distributed units. (Actually, the most general statement is slightly weaker than this one, involving exchangeability.)
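As an illustration of the fixed-x (residual resampling) bootstrap just described, the sketch below (Python, not from the chapter) resamples least-squares residuals, reattaches them to the fitted values, and refits. The hat-matrix standardization mentioned above is omitted for brevity.

import numpy as np

def residual_bootstrap_cov(X, y, K=1000, seed=0):
    """Fixed-x bootstrap covariance matrix for OLS coefficients via residual resampling.

    X is the (n, p) design matrix (include a column of ones for an intercept);
    y is the (n,) response vector.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    resid = y - fitted
    betas = np.empty((K, X.shape[1]))
    for k in range(K):
        # (1) resample residuals with replacement, (2) reattach them to the fitted
        # values, and (3) recompute the regression coefficients
        y_star = fitted + rng.choice(resid, size=n, replace=True)
        betas[k], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return np.cov(betas, rowvar=False)

# Illustrative use: bootstrap standard errors of the intercept and slope
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=len(x))
print(np.sqrt(np.diag(residual_bootstrap_cov(X, y))))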

The assumptions one makes to arrive at independent and identically distributed units can result in a variety of “bootstraps.” Thus, this “automatic” procedure for variance estimation is not necessarily that automatic; a lot of modeling and decision making is involved.

Efron (1979) showed that the infinitesimal jackknife variance estimate could be expressed as the bootstrap variance estimate for a linear version of the statistic of interest. Therefore, if a statistic is linear, the infinitesimal jackknife and the bootstrap estimates are essentially the same. And if a statistic is not linear, and one makes a close linear approximation to the statistic, the bootstrap variance estimate of the linearized statistic is essentially the same as the infinitesimal jackknife estimate of the variance of the original statistic.

The bootstrap has been in use for more than 10 years. A number of theoretical results are known (see Bickel and Freedman, 1981; Singh, 1981), and the areas of application are growing. One area in which research is ongoing is survey sampling. One difficulty of applying bootstrap techniques to sample surveys is that the sample size allocated to strata is often small, which brings into question the hope that in some sense the difference between $\hat{\theta}(F_n)$ and $\hat{\theta}(F)$ can be approximated by the difference between $\hat{\theta}(F_n^*)$ and $\hat{\theta}(F_n)$. In addition, there is a scaling problem, which is not important for even moderate sample sizes but is very important when stratum sample sizes are less than five; this problem has been addressed by Rao and Wu (1988). For some simple sample designs, their procedure involves using the usual bootstrap algorithm with replicate samples of size one less than the original sample size. However, for most commonly used sample designs, the method suggested is more involved. Also, complications brought about by the sample design, such as cluster sampling, make the precise form of resampling that should be used unclear. Aspects of this problem have also been investigated by Rao and Wu (1988). For a two-stage cluster sample, they recommend a two-stage resampling procedure: (1) select a simple random sample of n clusters with replacement from the n sample clusters; (2) draw a simple random sample (independently subsampling for the same cluster chosen more than once) of $m_i$ elements with replacement from the $m_i$ elements in the ith sample cluster if the latter is chosen. Rao and Wu then make use of the rescaling mentioned above.
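The sketch below (Python, not from the chapter) implements only the two-stage with-replacement resampling plan described above and applies it to the overall element mean; the Rao and Wu rescaling is omitted, so this illustrates the resampling step rather than their full method.

import numpy as np

def two_stage_resample(clusters, rng):
    """One bootstrap replicate of a two-stage cluster sample (with replacement at both stages)."""
    n = len(clusters)
    # Stage 1: simple random sample of n clusters, with replacement, from the n sample clusters
    chosen = rng.integers(0, n, size=n)
    replicate = []
    for idx in chosen:
        elements = clusters[idx]
        # Stage 2: with-replacement subsample of m_i elements from the chosen cluster,
        # drawn independently each time the same cluster is chosen
        replicate.append(rng.choice(elements, size=len(elements), replace=True))
    return replicate

def bootstrap_cluster_mean_variance(clusters, K=1000, seed=0):
    """Bootstrap variance of the overall element mean (no rescaling applied)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(K)
    for k in range(K):
        stats[k] = np.concatenate(two_stage_resample(clusters, rng)).mean()
    return stats.var(ddof=1)

# Illustrative use with five made-up clusters of unequal size
rng = np.random.default_rng(3)
clusters = [rng.normal(loc=mu, size=rng.integers(4, 9)) for mu in (0.0, 0.5, 1.0, 1.5, 2.0)]
print(bootstrap_cluster_mean_variance(clusters))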

A technique related to the jackknife and the bootstrap, which is specifically designed for use in sample surveys, especially when each stratum sample size is equal to two, is the method of balanced half-sample replication. This idea, developed by McCarthy (1969), assumes that the sample can be expressed in a form in which each observation (or group of observations) has a mirror image, that is, every observation is linked to another that comes from the same distribution. This situation obtains in sample surveys in which each stratum has a sample size of two. A half-sample is defined as a sample in which one of each of the mirror-image pairs is removed from the sample. Thus, any half-sample can be identified by an inclusion vector of +1’s and −1’s of length H, where a +1 in the ith position indicates that the left image of the ith pair is included, and a −1 indicates that the right image is included. There are $2^H$ half-samples. One possible method for computing the variance is to compute the statistic for each of the half-samples and then calculate the variance of these $2^H$ statistics in the usual way. While clearly this can be done, it is computationally very expensive. McCarthy (1969) showed that for linear statistics one need only recalculate the statistic for a balanced set of half-samples. A balanced set of half-sample replicates is a set in which the inner product of any two inclusion vectors is 0. Roughly, a balanced set avoids the possibility of any mirror image being in the computations “too often” or “not often enough.” Efron (1982) also mentions the possibility of complementary balanced half-samples, which extends the utility of McCarthy’s idea to quadratic statistics.

Another technique that might be applicable in the sample survey context is presented by Efron (1982: Ch. 8). Again consider a sample in which each element has a natural mirror image, denoted $x_{11}, x_{12}, x_{21}, x_{22}, \ldots, x_{h1}, x_{h2}, \ldots, x_{H1}, x_{H2}$. These elements could either be strata or strata halves. Let $\hat{\theta}_h$ denote the value of the estimator computed for the data set where $x_{h2}$ is replaced with its mirror image, $x_{h1}$. In other words, the weight for $x_{h1}$ is doubled, and $x_{h2}$ is removed from the computation. Let $\hat{\theta}_h'$ denote the same computation as $\hat{\theta}_h$, except that the roles of $x_{h1}$ and $x_{h2}$ are switched. Then Efron presents the following variance estimate:

$$\widehat{\mathrm{Var}}(\hat{\theta}) = \frac{1}{4} \sum_{h=1}^{H} \left( \hat{\theta}_h - \hat{\theta}_h' \right)^2.$$

This variance estimate is relatively cheap computationally and also can be used on quadratic statistics.
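A sketch of this mirror-image computation in code (Python, not from the chapter), using the paired-strata form written out above:

import numpy as np

def paired_strata_variance(pairs, estimator):
    """Mirror-image (paired strata) variance estimate.

    pairs is an (H, 2) array; row h holds the two sampled values x_h1 and x_h2 for
    stratum h. estimator is a function of the full 1-D data vector of length 2H.
    """
    pairs = np.asarray(pairs, float)
    total = 0.0
    for h in range(len(pairs)):
        left = pairs.copy()
        left[h, 1] = left[h, 0]    # double the weight of x_h1, drop x_h2
        right = pairs.copy()
        right[h, 0] = right[h, 1]  # double the weight of x_h2, drop x_h1
        total += (estimator(left.ravel()) - estimator(right.ravel())) ** 2
    return total / 4.0

# For the overall mean this reduces to sum_h (x_h1 - x_h2)^2 / (2H)^2.
rng = np.random.default_rng(4)
pairs = rng.normal(size=(50, 2))
print(paired_strata_variance(pairs, np.mean),
      np.sum((pairs[:, 0] - pairs[:, 1]) ** 2) / (2 * len(pairs)) ** 2)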

MODELS AND VARIABILITY

Before continuing on to a discussion of how one might apply sample reuse techniques to the estimation of the variability of the output of a microsimulation model, it is important to briefly discuss philosophically what is meant by “model” and “variance.” In a complicated modeling effort, including microsimulation modeling, there is often no precise definition of the model. That is, there is no clear delineation between what parameters or modules define the model and are not variable, as opposed to what parameters or modules are variable and whose associated variances affect the results of the model. Another way of viewing this question is to ask to what extent the variance estimate is conditional on certain parameters or modules assumed to be fixed at

TABLE 3 Values of Δr for SSA Projections of Total U.S. Population Size for Base Years 1950 to 1986 (percent)

[Table body omitted: Δr by projection duration (1 to 30 years) for each base year. Average Δr by base year: 1950, −0.52; 1955, −0.04; 1965, 0.22; 1973, −0.05; 1977, −0.15; 1979, −0.08; 1981, −0.03; 1982, −0.05; 1983, −0.07; 1984, −0.05; 1985, −0.04; 1986, −0.05.]

NOTE: Δr is the difference between the projected and the actual average population growth rate, r (see text).

SOURCE: Projections taken from Myers and Rasor (1952), Greville (1957), Bayo (1966), Bayo and McCay (1974), Bayo, Shiman, and Sobus (1978), Faber and Wilkin (1981), Wilkin (1983), and Wade (1984, 1985, 1986, 1987, 1988). Population estimates taken from data provided by Alice Wade, Social Security Administration, Office of the Actuary.

FIGURE 5 Average errors in projected growth rates of U.S. population by Bureau of the Census and Social Security Administration projections. NOTE: See text for definition of Δr. SOURCE: Data from Tables 2 and 3.

A problem with these figures is that the degree of confidence with which they are held is unknown, and there is often considerable controversy surrounding their choice. Indeed, the Census Bureau and the SSA have often been criticized for assuming mortality declines that are too slow or fertility fluctuations that are too conservative (Bouvier, 1990; Crimmins, 1983, 1984; Olshansky, 1988; Ahlburg and Vaupel, 1990). Should users expect the true population outcome to fall within the high and low ranges 25 percent of the time? 75 percent? 99.9 percent? Moreover, because the choice of values for the input variables is somewhat subjective, it is impossible to assign each a specific probability. It would be helpful, however, to have a rough idea of the confidence level thought of by the forecaster.11 For example, in the most recent SSA projections, male life expectancy in 2080 ranges from 75.3 to 82.3 years. Are these values outer bounds of what is possible or rather plausible alternative scenarios, with more extreme ones not out of the question? Keyfitz (1981:590–591) sums up the difficulty as follows:

Without some probability statement, high and low estimates are useless to indicate in what degree one can rely on the medium figure, or when one ought to use the low or the high. Nor do we derive any help from the notion that each of the projections corresponds to a different set of assumptions and that it is up to the user to consider the three sets of assumptions, decide which is the most realistic, and choose that one. If he actually goes to the trouble and has the skill to reflect on the alternative sets of assumptions and decide which is most realistic, then he might as well make the calculations in addition—that is a relatively easy matter once the assumptions are specified. If on the other hand, as more commonly happens, the user looks at the results and takes whichever of the three projections seems to him most likely, then the demographer has done nothing for him at all—the user who is required to choose on the basis of which of the results looks best might as well choose among a set of random numbers.

11 Even if precise probability distributions were assigned to each input, one would still question how likely a particular combination of those inputs is. The Census Bureau’s high and low scenarios assume perfect correlation of high fertility, low mortality, and high immigration. In reality these inputs are not perfectly correlated.

The second way of considering the variability in future populations is through the development of complicated statistical models that account for variability in past vital rates (Heyde and Cohen, 1985; Cohen, 1986; Sykes, 1969; Lee, 1974; Lee and Carter, 1990; Saboia, 1974, 1977; Voss et al., 1981a; Alho, 1984, 1985, 1990; Alho and Spencer, 1985, 1990a, 1990b). This approach typically extrapolates the rates of the past into the future and accounts for how variable those rates have been. Similarly, Keyfitz (1989) uses the variability of historical vital rates but works with simulations in place of statistical models. Alho (1985) has developed an approach along these lines that takes into account expert opinion as well. This “mixed” model takes a weighted average of expert predictions and information concerning past trends and their variability to develop confidence intervals about the future population. In one such model he determines that the Census Bureau’s high and low projections of births are well within a two-thirds confidence interval—that is, the actual number of births would be likely to fall between the high and the low estimates less than two-thirds of the time.

A third approach is simply to evaluate the performance of past projections. With a sufficient number of projections, it is possible to view the distribution of errors by size and sign. Smith (1987) shows that the distribution of percent errors is approximately a normal bell-shaped function, implying that most errors cluster quite close to a mean value and become less frequent as one moves away from this mean. Values of Δr likewise appear to be approximately normally distributed. Many authors have taken advantage of this fact to make predictions about the percentage of future errors that will fall within a given range (Stoto and Schrier, 1982; Voss et al., 1981b; Smith, 1987; Stoto, 1983; Keyfitz, 1981; Long, 1987). It is generally true that an error will fall within a single standard deviation of the mean two-thirds of the time and will fall within two standard deviations of the mean 95 percent of the time.
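This third, empirical approach lends itself to a short calculation. The sketch below (Python, with made-up error values purely for illustration) computes the root mean square error of past Δr values and the implied two-thirds and 95 percent ranges under the approximate-normality assumption discussed above.

import numpy as np

def error_based_intervals(past_delta_r, projected_growth_rate):
    """Approximate confidence ranges for a projected growth rate from past errors.

    past_delta_r: past projection errors in the growth rate (percentage points)
    projected_growth_rate: the current projected average growth rate (percent)
    """
    errors = np.asarray(past_delta_r, float)
    rmse = np.sqrt(np.mean(errors ** 2))  # root mean square error, used as the standard deviation
    return {
        "rmse": rmse,
        "two_thirds_range": (projected_growth_rate - rmse, projected_growth_rate + rmse),
        "ninety_five_percent_range": (projected_growth_rate - 2 * rmse, projected_growth_rate + 2 * rmse),
    }

# Made-up illustrative past errors and projected rate
print(error_based_intervals([-0.5, -0.1, 0.2, -0.05, -0.15, -0.08, 0.1], projected_growth_rate=0.7))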

In this way it is sufficient to calculate the standard deviation, typically by using the root mean square error, as defined earlier, to describe the confidence in a projection.

This approach has been taken by several authors in considering projections of state or county populations. A few findings are noteworthy. First, projections tend to be much more accurate (lower Δr values) in areas where growth is moderate than where it is exceptionally slow or rapid (Smith, 1984). Accuracy is considerably better for large states than for small ones (Smith, 1987). Keyfitz (1981) observed that the root mean square error for states is more than double that of the whole country, most likely because interstate migration is much more variable than international migration. After making statistical adjustments for the overall biases in each projection year, Stoto (1983) puts the standard deviation of Δr for the Census Bureau’s projections through 1970 at 0.52 percentage point. Because there are so few years on which this estimate is based, he believes that a better estimate of the standard deviation can be obtained from U.N. projections for developed regions. Using these, the standard deviation is calculated as 0.28. Thus, on the basis of past projection performance, projected future growth rates can be expected to be within plus or minus 0.3 percentage point of the actual growth rates two-thirds of the time.12 Stoto points out that this range of growth rates matches fairly closely the Census Bureau’s (1977) high and low projections. One can be 95 percent confident that the true growth rate will fall within plus or minus 0.6 percentage point of the projected rate. After examining the United Nations projections for developed and developing regions, Keyfitz (1981) concluded that the high and low projections are rather close to a two-thirds confidence interval.13

12 One might object that the United Nations projections are not the same as those of the Census Bureau and indeed do not even project the same populations. However, if the primary source of variability in r values stems from the inherent volatility of population growth rather than from peculiarities of the projecting agency, use of data from other developed countries may be quite instructive.

13 The general congruence of two-thirds confidence intervals with the Census Bureau’s high-low projections apparently holds only with regard to total population size. Alho and Spencer (1985) point out that, when the population is disaggregated into age groups, different patterns can be seen. They demonstrate that for surviving ages (in which persons are already born at the time of the projection) the high-low estimates are close to 90 percent confidence levels. However, part of the error they consider incorporates uncertainty in the baseline population, so confidence regarding the mortality assumptions may in fact be higher than 90 percent. They find that the high-low interval for births is considerably narrower than a two-thirds confidence interval.

The approach taken by Stoto and Keyfitz assumes that the distribution of errors in the future will mirror that of the past. Smith and Sincich (1988) provide support for this assumption in their analysis of simple projections of county populations, but Beaumont and Isserman (1987) take issue with their methodology. Beaumont and Isserman’s (1987:1007) evaluation of state population projections leads to the opposite conclusion, and they believe the following:

The results indicate that the error distributions for the state data vary considerably according to historical context, method, and base period. There does not appear to be a predictable pattern as to when the error distribution for one period might serve as an accurate guide to the error distribution in some future period.

It has been noted (see Figure 4) that errors in the Census Bureau’s projections have declined over time, most likely because of the greater stability of fertility rates more recently. Following an examination of United Nations projections made between 1958 and 1968, Keyfitz (1981) also observed a reduction in the magnitude of errors over time. In addition to the reasons given by Long (1987) for this decline, Keyfitz suggests that data on population levels and rates of increase have improved. He argues that this trend can be viewed in any of three ways: (1) presume it will continue, so that future projections can be expected to be better than past ones; (2) take the most recently observed variability as correct; or (3) presume that the future will be as variable as the past, with apparent trends representing only random variations.

A strong case can be made for placing confidence intervals around population projections. Users need to know how likely alternative scenarios of growth may be. Keyfitz (1972) offers as an example a town that is about to embark on building a reservoir. Overestimating the size of the future population incurs a small cost in having built the reservoir too large, but underestimating it entails construction of a costly additional reservoir. In practice the planner takes the largest projection, but this may involve extremely inefficient use of funds. A planner who knows the distribution of possible population sizes could choose the population estimate that minimizes expected losses on a project and plan accordingly. As Keyfitz (1972:360) concludes:

The demographer should be encouraged to [give] subjective probabilities by the thought that society bets on future population whenever it builds a school, a factory, or a road. Real wagers, running to billions of dollars, are implicit in each year’s capital investment. Someone ought to be willing to make imaginary wagers if so doing will more precisely describe the distribution of future population and so improve investment performance even by a minute fraction.

Despite the need for confidence intervals and probability distributions associated with population projections, many problems remain with the methods considered here. (Many methods now exist for estimating the uncertainty surrounding population projections, but seldom has one method been tested against others; see Lee [1974] and Cohen [1986] for exceptions.) As noted in the latest Census Bureau projections: “There is considerable controversy over the means of handling improvements in methods, changing variability in population growth rates, and other complicating factors” (Spencer, 1989:14). The bureau presents bounds using a “reasonable high” and a “reasonable low.”

It would be useful to have a more explicit definition of “reasonable” so that users can determine the degree of confidence to place in the range of estimates.

CONCLUSIONS

In Chapter 3 of Volume I, several recommendations were advanced for policy agencies that supply projections about future events. How do official U.S. population projections measure up in light of these recommendations? The first recommendation (3–9) relates to the need to prepare information about the levels and sources of uncertainty in the projections. Those who make population projections do a good job of reporting on the sources of error (baseline population, fertility, mortality, and migration), but they fail to make explicit the level of uncertainty surrounding each input component. Recommendation 3–11 concerns the need for ongoing error analysis and validation tests. The Census Bureau routinely reports on the accuracy of past projections as a guide to users of its published figures. The bureau has also carried out more comprehensive analyses (see Long [1987], for example). The SSA, to our knowledge, has not conducted any external validations of its projections. A third recommendation (3–14) suggests that in presenting results to decision makers, agencies should report estimates of uncertainty regarding model output. Providing high, middle, and low alternatives is helpful in this regard, but as already noted there is a need to inform users of the level of uncertainty regarding the alternatives. In two other recommendations (3–12 and 3–13), Volume I argues for the need to document the model’s analytical structure and make the models comprehensible and reproducible. The Census Bureau and the SSA do a good job of describing the techniques used to make the projections. The projections can easily be reestimated with different assumptions at some later time. However, there is a subjective component to the choice of inputs that cannot be reproduced in future analyses.

This chapter describes the basic methodology for making cohort component population projections. This methodology, which is used by the Census Bureau and the SSA, requires four inputs: a baseline population and assumptions regarding future fertility, mortality, and migration. We illustrate how several aspects of the future population depend on these assumptions and conclude that projections of the population under age 5 are affected most by the fertility assumptions, whereas the population over age 75 is affected most by the mortality assumptions, at least for the first 75 years beyond the base year.
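As a minimal sketch of the cohort component arithmetic summarized above (Python; single-year age groups, both sexes combined, and purely illustrative rate schedules supplied here as assumptions):

import numpy as np

def project_one_year(pop, survival, fertility, net_migration):
    """Advance an age-distributed population one year (simplified cohort component step)."""
    nxt = np.zeros_like(pop)
    nxt[1:] = pop[:-1] * survival[:-1]               # survivors age forward one year
    nxt[-1] += pop[-1] * survival[-1]                # open-ended top age group
    nxt[0] = (pop * fertility).sum() * survival[0]   # births during the year, surviving to age 0
    return nxt + net_migration

# Illustrative inputs: baseline population, survival, fertility, and net migration by age
ages = np.arange(101)
pop = np.full(101, 1000.0)
survival = np.clip(1.0 - 0.0005 * ages, 0.2, None)
fertility = np.where((ages >= 20) & (ages < 40), 0.06, 0.0)
migration = np.zeros(101)
for _ in range(5):  # project five years beyond the base year
    pop = project_one_year(pop, survival, fertility, migration)
print(pop.sum())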

Using the median projections made by the Census Bureau and the SSA, we compared projections of total population size with actual population outcomes. The success of past performance has been determined largely by the time at which the projections were made. Projections prior to the mid-1950s tended to be too low; projections from the mid-1950s to around 1970 were too high; and projections since 1970 have come quite close to actual outcomes, albeit with a slight tendency to underestimate growth rates. Finally, we examined several methods that have been used to place confidence limits about population projections. The validity of these error bounds depends on whether fluctuations in vital rates are becoming more stable over time. Several studies that assume stability have concluded that the Census Bureau’s high and low projections have typically spanned a two-thirds confidence interval. On this basis we might expect the future population to fall between the high and low projections roughly two-thirds of the time.

REFERENCES

Ahlburg, D.A.
1990 The Ex-Ante Forecasting Performance of an Economic-Demographic Forecasting Model and the Bureau of the Census Projections of Total Live Births. Paper presented at annual meeting of the Population Association of America, Toronto. May 3–5.

Ahlburg, D.A., and Vaupel, J.W.
1990 Alternative Projections of the U.S. Population. Working Paper 90–06–2. Center for Population Analysis and Policy. Minneapolis: University of Minnesota.

Alho, J.M.
1984 Probabilistic forecasts: The case of population projections. Scandinavian Housing and Planning Research 1:99–105.
1985 Interval estimates for future population. Pp. 44–51 in 1985 Proceedings of the Social Statistics Section. Washington, D.C.: American Statistical Association.
1990 Stochastic methods in population forecasting. International Journal of Forecasting 6(4):521–530.

Alho, J.M., and Spencer, B.D.
1985 Uncertain population forecasting. Journal of the American Statistical Association 80(390):306–314.
1990a Effects of targets and aggregation on the propagation of error in mortality forecasts. Mathematical Population Studies 2(3):209–227.
1990b Error models for official mortality forecasts. Journal of the American Statistical Association 85(411):609–616.

Ascher, W.
1978 Forecasting: An Appraisal for Policy-Makers and Planners. Baltimore: Johns Hopkins University Press.

Bayo, F.R.
1966 United States Population Projections for OASDHI Cost Estimates. Actuarial Study No. 62. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health, Education, and Welfare.

Bayo, F.R., and McCay, S.F.
1974 United States Population Projections for OASDHI Cost Estimates. Actuarial Study No. 72. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health, Education, and Welfare.

Bayo, F.R., Shiman, H.W., and Sobus, B.R.
1978 United States Population Projections for OASDHI Cost Estimates. Actuarial Study No. 77. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health, Education, and Welfare.

Beaumont, P.M., and Isserman, A.M.
1987 Comment on tests of forecast accuracy and bias for county population projections. Journal of the American Statistical Association 82(400):1004–1009.

Bouvier, L.F.
1990 U.S. population in the 21st century: Which scenario is reasonable? Population and Environment 11(3):193–202.

Bureau of the Census
1953 Illustrative Projections of the Population of the United States, by Age and Sex: 1955 to 1975. Current Population Reports. Series P-25, No. 78. Washington, D.C.: U.S. Department of Commerce.
1955 Revised Projections of the Population of the United States, by Age and Sex: 1960 to 1975. Current Population Reports. Series P-25, No. 123. Washington, D.C.: U.S. Department of Commerce.
1962 Interim Revised Projections of the Population of the United States, by Age and Sex: 1975 to 1980. Current Population Reports. Series P-25, No. 251. Washington, D.C.: U.S. Department of Commerce.
1966 Revised Projections of the Population of the United States, by Age and Sex to 1985. Current Population Reports. Series P-25, No. 329. Washington, D.C.: U.S. Department of Commerce.
1971 Projections of the Population of the United States, by Age and Sex: 1970 to 2020. Current Population Reports. Series P-25, No. 470. Washington, D.C.: U.S. Department of Commerce.
1977 Projections of the Population of the United States: 1977 to 2050. Current Population Reports. Series P-25, No. 704. Washington, D.C.: U.S. Department of Commerce.
1990 Statistical Abstract of the United States: 1990. 110th Edition. Washington, D.C.: U.S. Department of Commerce.

Cohen, J.E.
1986 Population forecasts and confidence intervals for Sweden: A comparison of model-based and empirical approaches. Demography 23(1):105–126.

Crimmins, E.M.
1983 Implications of recent mortality trends for the size and composition of the population over 65. Review of Public Data Use 11:37–48.
1984 Life expectancy and the older population: Demographic implications of recent and prospective trends in old age mortality. Research on Aging 6(4):490–514.

Faber, J.F., and Wilkin, J.C.
1981 Social Security Area Population Projections 1981. Actuarial Study No. 85. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health and Human Services.

Greenberg, M.
1972 A test of combinations of models for projecting the population of minor civil divisions. Economic Geography 48(2):179–188.

Greville, T.N.E.
1957 Illustrative United States Population Projections. Actuarial Study No. 46. Division of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health, Education, and Welfare.

Hajnal, J.
1955 The prospects for population forecasts. Journal of the American Statistical Association 50(270):309–322.

Heyde, C.C., and Cohen, J.E.
1985 Confidence intervals for demographic projections based on products of random matrices. Theoretical Population Biology 27(2):120–153.

Keyfitz, N.
1972 On future population. Journal of the American Statistical Association 67(338):347–362.
1981 The limits of population forecasting. Population and Development Review 7(4):579–594.
1982 Can knowledge improve forecasts? Population and Development Review 8(4):729–747.
1989 Measuring in Advance the Accuracy of Population Forecasts. Unpublished paper. International Institute for Applied Systems Analysis, Laxenburg, Austria.

Lee, R.D.
1974 Forecasting births in post-transition populations: Stochastic renewal with serially correlated fertility. Journal of the American Statistical Association 69(347):607–617.

Lee, R., and Carter, L.
1990 Modeling and Forecasting the Time Series of U.S. Mortality. Paper presented at annual meeting of the Population Association of America, Toronto. May 3–5.

Long, J.F.
1987 The Accuracy of Population Projection Methods at the Census Bureau. Paper presented at annual meeting of the Population Association of America, Chicago. April 30–May 2.

Myers, R.J., and Rasor, E.A.
1952 Illustrative United States Population Projections, 1952. Actuarial Study No. 33. Division of the Actuary, Federal Security Agency. Washington, D.C.: Social Security Administration.

Olshansky, S.J.
1988 On forecasting mortality. The Milbank Quarterly 66(3):482–530.

Saboia, J.L.M.
1974 Modeling and forecasting populations by time series: The Swedish case. Demography 11(3):483–492.
1977 Auto-regressive integrated moving average (ARIMA) models for birth forecasting. Journal of the American Statistical Association 72(358):264–270.

Siegel, J.S.
1972 Development and accuracy of projections of population and households in the United States. Demography 9(1):51–68.

Smith, S.K.
1984 Population Projections: What Do We Really Know? Bureau of Economic and Business Research. Gainesville: University of Florida.
1987 Tests of forecast accuracy and bias for county population projections. Journal of the American Statistical Association 82(400):991–1003.

Smith, S.K., and Sincich, T.
1988 Stability over time in the distribution of population forecast errors. Demography 25(3):461–474.

Spencer, G.
1984 Projections of the Population of the United States, by Age, Sex, and Race: 1983 to 2080. Current Population Reports. Series P-25, No. 924. Bureau of the Census. Washington, D.C.: U.S. Department of Commerce.
1989 Projections of the Population of the United States, by Age, Sex, and Race: 1988 to 2080. Current Population Reports. Series P-25, No. 1018. Bureau of the Census. Washington, D.C.: U.S. Department of Commerce.

Stoto, M.A.
1983 The accuracy of population projections. Journal of the American Statistical Association 78(381):13–20.

Stoto, M.A., and Schrier, A.P.
1982 The accuracy of state population projections. Pp. 276–281 in 1982 Proceedings of the Social Statistics Section. Washington, D.C.: American Statistical Association.

Sykes, Z.M.
1969 Some stochastic versions of the matrix model for population dynamics. Journal of the American Statistical Association 64:111–130.

United Nations
1989 World Population Prospects, 1988. New York: United Nations, Department of International Economic and Social Affairs.

Voss, P.R., Palit, C.D., Kale, B.D., and Krebs, B.C.
1981a Forecasting State Populations Using ARIMA Time Series Techniques. Applied Population Laboratory, Department of Rural Sociology. Madison: University of Wisconsin.
1981b An analysis of error structure. Pp. 436–441 in 1981 Proceedings of the Social Statistics Section. Washington, D.C.: American Statistical Association.

Wade, A.
1984 Social Security Area Population Projections 1984. Actuarial Study No. 92. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health and Human Services.
1985 Social Security Area Population Projections 1985. Actuarial Study No. 95. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health and Human Services.
1986 Social Security Area Population Projections 1986. Actuarial Study No. 97. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health and Human Services.
1987 Social Security Area Population Projections 1987. Actuarial Study No. 99. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health and Human Services.
1988 Social Security Area Population Projections 1988. Actuarial Study No. 102. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health and Human Services.

Wilkin, J.C.
1983 Social Security Area Population Projections 1983. Actuarial Study No. 88. Office of the Actuary, Social Security Administration. Washington, D.C.: U.S. Department of Health and Human Services.
