APPENDIX C
Census Bureau's Methodology for Model-Based Estimates

The Census Bureau's estimation methodology for producing county estimates of the number and percentage of related children aged 5–17 in poverty (poor school-age children) can be separated into four distinct steps: (1) the production of county estimates of the number of poor school-age children; (2) the production of state estimates of the number of poor school-age children; (3) the modification of the county estimates so that they add to the state estimates; and (4) the use of the estimated number of related children aged 5–17 as a denominator to produce estimates of the percentage of those children in poverty. Steps 1, 2, and 3 are described below; Appendix D describes the development of the denominators used in step 4.

Two time periods must be differentiated in this discussion. The March 1994 CPS supports model-based estimates of the numbers of school-age children who lived in each county in 1994 and were in poverty in 1993 (the reference year for the income questions). Estimates that refer to the March 1994 CPS (or 1994 and surrounding years) are therefore referred to as 1993 estimates, although strictly speaking they involve information for both 1993 and 1994. Similarly, the 1990 decennial census produced estimates of school-age children who lived in each county in 1990 and were in poverty in 1989. The March 1990 CPS (or 1990 and surrounding years) supports estimates that are for the same income reference year (1989) as the census; we refer to these estimates as the 1989 estimates. The 1993 estimates are the current objective of the small-area estimation program; the 1989 estimates are important for evaluation purposes because they can be compared to the census. This appendix considers both the 1993 and 1989 estimates.



Copyright © National Academy of Sciences. All rights reserved.



Small-Area Estimates of School-Age Children in Poverty: Interim Report I: Evaluation of 1993 County Estimates for Title I Allocations

COUNTY-LEVEL ESTIMATION [1]

The county-level model uses regression to produce the estimates, with (3-year average) CPS measures as the dependent variable and administrative data and population estimates as the independent variables. The model is:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + u_i + e_i,$$

where:

$y_i$ = log(3-year weighted average of the number of poor school-age children in county $i$),[2]
$x_{1i}$ = log(number of child exemptions [assumed to be under age 21] reported by families in poverty on tax returns in county $i$),
$x_{2i}$ = log(number of people receiving food stamps in county $i$),
$x_{3i}$ = log(estimated noninstitutionalized population under age 21 in county $i$),[3]
$x_{4i}$ = log(number of child exemptions on tax returns in county $i$),
$x_{5i}$ = log(number of poor school-age children in county $i$ in the previous census),
$u_i$ = model error for county $i$, and
$e_i$ = sampling error for county $i$.

Variables are transformed using logarithms for two reasons. First, it is more plausible that the model is homoscedastic on the log scale (corresponding to a constant coefficient of variation or equal model variances of share in poverty) than on the original scale (equal model variances of number in poverty) over the extremely wide range of county sizes. Second, the transformed variables have a much more symmetric distribution, and the scatterplots of various covariates with the dependent variable are more linear.

Only CPS sample counties that have some poor school-age children in at least one of the 3 years contributing to the 3-year average are used in the regression. For the 1989 model, 1,028 of 3,141 counties were included in the regression; for the 1993 model, 1,184 of 3,143 counties were included (see Coder et al., 1996:Table 3).

[1] The following section draws heavily from the Census Bureau's documentation (Coder et al., 1996).
[2] The estimated number of poor school-age children is the product of the weighted 3-year average CPS county poverty rate for related children aged 5–17 and the weighted 3-year average CPS county number of related children aged 5–17. The weights for this average are the fractions of the 3-year total of CPS interviewed housing units containing children aged 5–17 in each year. For estimates from a given year, stratum-level weights ordinarily used have been removed. These stratum-level weights result from an over- or undersampling of counties to account for certain demographic or other characteristics. As a result, for this analysis, counties receive a weight depending directly on their population size and not on other characteristics.
[3] For the 1989 model, estimates of this variable are from the 1990 census; for the 1993 model, estimates are from the Census Bureau's population estimates program.

As represented above, the variability of $y_i$, after the effects of the predictor variables are accounted for, is due to model error and sampling error. Since the sum of these varies substantially among counties, resulting in heterogeneous variances, a weighted least-squares regression is used.

The weights are developed as follows. A mean square error is computed from the unweighted regression of log(1990 census estimates of the number of poor school-age children in 1989), using the covariates appropriate for an estimate of the dependent variable for 1989 (e.g., $x_{5i}$ would pertain to the 1980 census) and including only counties that have sample households with poor school-age children in the March 1993, 1994, or 1995 CPS. The mean square error or variance of total error for this regression is the sum of sampling variance and variance due to model error, that is, var($u_i$) + var($e_i$). The variance due to model error for this regression can be estimated by subtracting the contribution to mean square error due to the (estimated) sampling variances of log(census poverty estimates) that are derived from published generalized variance function estimates for each county. Since the census sampling variances are relatively small, variance due to model error is about 88 percent of the mean square error in the census regression model. This estimated model variance is assumed to closely approximate the model variance for an (unweighted) regression with the dependent variable of log(3-year average CPS estimates of the number of poor school-age children).
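The subtraction step just described can be illustrated with a minimal Python sketch. It is not the Census Bureau's code: the function name, the use of NumPy, and in particular the choice to represent "the contribution due to sampling variance" by the simple mean of the county sampling variances are assumptions made for illustration only.

```python
import numpy as np

def model_variance_from_census(X, y, sampling_var):
    """Estimate model-error variance as regression MSE minus mean sampling variance.

    X            : (n, p) covariate matrix on the log scale, without intercept
    y            : (n,) log census estimates of poor school-age children
    sampling_var : (n,) estimated sampling variances of y (hypothetical inputs)
    """
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)     # unweighted least squares
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]                        # residual degrees of freedom
    mse = float(resid @ resid) / dof                  # estimates var(u) + var(e)
    # Assumption: the sampling contribution is the average sampling variance.
    model_var = max(mse - float(np.mean(sampling_var)), 0.0)
    return mse, model_var
```

Under these assumptions, the ratio `model_var / mse` plays the role of the roughly 88 percent share of model error reported above for the census regression.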
Therefore, when this estimate of model error is subtracted from the mean square error for the CPS regression, the remainder is an estimate of the total county-level CPS sampling variability. The individual county-level sampling variance (for the log dependent variable) is then estimated by assuming that it is inversely proportional to sample size. To obtain the individual county-level contribution from model error, the model error is assumed to be homogeneous (i.e., the variance of the model error is assumed to be equal for each county). The mean square error for county $i$ is then the sum of the variance due to model error and the estimated sampling variance (which depends on the county sample size). Most of the CPS mean square error (about 90 percent) is derived from sampling variance. The reciprocals of the mean square errors are then used as weights to recompute the regression using weighted least squares, which provides new weights since the mean square error has changed. Only one iteration is performed.

The weights for the 1989 and 1993 CPS regressions differ because of their different data sets and because each year's model uses the counties in the CPS sample for that year. Together, these differences cause the estimated sampling variances to differ. However, the procedure used to develop the weights for the 1989 and 1993 CPS regressions assumes that the CPS regressions have the same model error as the 1989 census regression. Implicit in this assumption are the assumptions that the CPS and census regression models are very similar and that the time from the last census (the 1980 census for the 1989 model and the 1990 census for the 1993 model) is not an important source of differences in mean square error for these models. These assumptions have not been fully validated.

For the counties that do not appear in the 3-year CPS sample, estimates of log(number of poor school-age children) are calculated by substituting the covariates for that county in the estimated regression model and computing the model prediction. For the 1,028 (1989 model) or 1,184 (1993 model) counties for which direct CPS estimates are available, the direct 3-year average CPS estimates and the model predictions are combined, using a weighted average (referred to as empirical Bayes or shrinkage estimation) in which the weight for the model prediction is the ratio of the estimated sampling variance to the sum of the estimated sampling variance and the model error variance for that county. It is important to note that for almost all counties, the great majority of the weight is given to the model predictions; for only 13 counties is the weight for the model prediction less than 0.5.

The numbers of poor related children aged 5–17 in each county estimated from the county-level model are then controlled to the state poverty estimates.

STATE-LEVEL ESTIMATION [4]

For most states, direct estimates of the number of poor school-age children from the March CPS are insufficiently reliable to be used alone. A model-based approach that borrows strength from administrative records (IRS tax files, food stamp files, etc.), the decennial census, and other states is therefore used.
The methodology for development of the 1989 state estimates is described below; similar methods were used for the 1993 estimates, following the specifications that were found to work well for 1989. The regression model for producing state estimates of the proportion of school-age children in poverty has the following form (for details, see Fay, 1996):

$$y_{it} = \Bigl(\sum_j \beta_{tj} x_{itj} + z_{it}\Bigr) + e_{it},$$

where:

$i$ = the state of interest,
$t$ = the year of estimation,
$j$ = the covariate,
$y_{it}$ = direct estimate of the percentage of poor school-age children from the CPS in year $t$,[5]
$z_{it}$ = a random effect that represents differences between the model-based estimates and the direct estimates from the CPS, and
$e_{it}$ = sampling error of the dependent variable for state $i$ in year $t$.

[4] This section draws heavily from the Census Bureau's documentation (Fay, 1996).

The regression coefficients, $\beta_{tj}$, have the subscript $t$ to indicate that they are reestimated for each year, and $\sum_j \beta_{tj} x_{itj}$ represents the portion of poverty that is linearly related to the covariates described below. The $z_{it}$ are assumed to be independent and identically distributed in any given year. The $e_{it}$ are normal disturbances resulting from sampling variance. The quantity $\sum_j \beta_{tj} x_{itj} + z_{it}$ represents the true poverty rate for state $i$ in year $t$, which is the goal of the estimation procedure.

The Census Bureau first performed a cross-sectional (linear) regression of the 1980 census estimates of poverty rates (1979 income) for school-age children on a variety of covariates. A cross-sectional regression was also fit for the 1990 census estimates of poverty rates (1989 income) for school-age children. (These regressions used ordinary least-squares estimation.) The residuals for the 1980 and 1990 census models were observed to be correlated (Fay, 1996), indicating that states that had more poverty than predicted by the cross-sectional model for 1979 also tended to have more poverty than predicted by the cross-sectional model for 1989. This fact can be used to improve the 1989 predictions.

Next, a regression model was built for the CPS estimates of school-age children's poverty rates in 1989.
The covariates that were predictive in the regression models with census estimates of school-age children's poverty rates as the dependent variable were selected for inclusion in this model, along with the residuals from the regression of the 1980 census estimates of children's poverty rate on the same covariates. These covariates were (1) the percentage of child exemptions reported by families in poverty on tax returns, (2) the percentage of the noninstitutionalized population under age 65 that does not file income tax returns, (3) the percentage of the population that receives food stamps, and (4) the residuals from the regression fit on 1980 census poverty rates (discussed above). The CPS model of the poverty rates for school-age children was not used to select covariates because of the large sampling variability in the dependent variable.

During the exploratory phase of model development, various transformations of both the dependent and independent variables were examined. The untransformed versions seemed to fit best, justifying the use of a model that is linear in percentages.

[5] The percentage is calculated through the following ratio: the numerator is the number of poor related children aged 5–17 from the CPS, and the denominator is the estimated total population of noninstitutionalized children aged 5–17 (whether related or not) from the CPS.
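The construction of covariate (4), residuals from a regression on prior-census poverty rates, can be sketched in Python as follows. This is an illustrative sketch only: the function names and array layout are hypothetical, and the source does not describe the Census Bureau's actual implementation.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an ordinary least-squares fit of y on X plus an intercept."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

def add_prior_residual_covariate(X_current, X_prior, y_prior):
    """Append prior-census regression residuals as an extra covariate column.

    X_current : (n_states, 3) current-year covariates (hypothetical layout)
    X_prior   : (n_states, 3) the same covariates for the prior census year
    y_prior   : (n_states,) prior-census poverty rates
    """
    resid = ols_residuals(X_prior, y_prior)
    return np.column_stack([X_current, resid])
```

A state with a positive prior residual had more poverty in the earlier census than its covariates predicted; carrying that residual forward lets the current-year model use the persistence noted in the text.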

In the basic regression model, the $\beta_{tj}$ were estimated by weighted least squares, the weights being the inverse of the sum of the estimated sampling variance and the estimated random effects variance.

Estimation of the sampling variances of the direct CPS estimates of school-age children's poverty rates was done in several steps. The computer program developed to produce variances for complex samples (VPLX), with successive difference replication (related to balanced half-sample replication), was used to provide the original variance estimates for the CPS estimated state poverty rates. To reduce the instability of these variance estimates, they were modeled using a generalized variance function, which is a function of the poverty rate (e.g., $\beta y + \gamma y^2$, where $y$ is the poverty rate) divided by the state's sample size for each year. The years 1989–1993 were used to estimate the generalized variance function.

The estimated variances for the random effects were calculated using maximum likelihood estimation. One complication of this approach is that the mean and the variance of the estimated poverty rates are linked, in the sense that the variance of an estimated proportion ($p$) is proportional to $p(1-p)$. Therefore, an iteration was performed, in which the estimated variance for the sampling errors was updated to reflect new values for the model predictions. The iteration was repeated six times.

Finally, the CPS direct estimates of school-age children's poverty rates were combined with fitted values from the regression, using an empirical Bayes approach similar to that applied in county estimation. These procedures produced CPS estimates of 1989 poverty rates; the same methods were used to produce CPS estimates of 1993 poverty rates.
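The weighted least-squares fit and the empirical Bayes combination described above can be sketched together in Python. The weight on the model prediction follows the rule stated for the county model (sampling variance divided by the sum of sampling variance and model error variance). Everything else is an assumption for illustration: the function names, the use of NumPy, and the treatment of both variance components as known inputs, whereas the source estimates them by maximum likelihood and iteration.

```python
import numpy as np

def wls_fit(X, y, sampling_var, model_var):
    """Weighted least squares with weights 1 / (sampling_var + model_var)."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])          # add intercept column
    w = 1.0 / (np.asarray(sampling_var, dtype=float) + model_var)
    sw = np.sqrt(w)                                     # scale rows by sqrt(weight)
    beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
    return beta, X1 @ beta                              # coefficients, fitted values

def shrink(direct, predicted, sampling_var, model_var):
    """Combine direct estimates and model predictions by a weighted average.

    The weight on the model prediction is
    sampling_var / (sampling_var + model_var), so noisier direct
    estimates are pulled more strongly toward the regression prediction.
    """
    sampling_var = np.asarray(sampling_var, dtype=float)
    w_model = sampling_var / (sampling_var + model_var)
    direct = np.asarray(direct, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return w_model * predicted + (1.0 - w_model) * direct
```

For example, with a sampling variance of 3 and a model error variance of 1, the model prediction receives weight 0.75, so a direct estimate of 10 and a prediction of 20 combine to 17.5.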
The estimated rates were then multiplied by either census counts (for the 1989 model) or population estimates (for the 1993 model) to arrive at estimates of the number of poor school-age children in each state. The state estimates were then benchmarked to sum to the CPS national estimate of the number of related school-age children in poverty. This adjustment was a minor one, involving multiplying the state estimates from the 1989 model by 1.0168 and those from the 1993 model by 1.0091.
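The benchmarking step amounts to multiplying every state estimate by one common ratio. A minimal Python sketch follows; the function name and the example figures are hypothetical. A similar ratio adjustment is one plausible way to control county estimates to a state total (step 3), although the source does not spell out those mechanics.

```python
def benchmark_to_total(estimates, control_total):
    """Scale estimates by a common factor so that they sum to control_total.

    Returns the adjusted estimates and the adjustment factor.
    """
    total = sum(estimates)
    if total <= 0:
        raise ValueError("estimates must have a positive sum")
    factor = control_total / total          # e.g., 1.0168 for the 1989 model
    return [e * factor for e in estimates], factor
```

Because every area is scaled by the same factor, the relative shares of the estimates are unchanged; only their level moves to match the control total.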