Small-Area Estimates of School-Age Children in Poverty - Interim Report 2: Evaluation of Revised 1993 County Estimates for Title I Allocations

2
Census Bureau Estimation Procedure

Reliance on the most recent decennial census to allocate federal funds to counties and other small areas has primarily reflected the absence of alternative data sources with comparable or superior reliability. Mindful of the need for small-area estimates that are more up to date than census estimates, the Census Bureau organized a program, Small Area Income and Poverty Estimates (SAIPE), to study methods for producing postcensal income and poverty estimates for states and counties by using multiple data sources and innovative statistical methods. The Census Bureau launched this program in late 1993 with financial support from a consortium of five federal agencies. Congress made this work more urgent by charging the Census Bureau in late 1994 to produce updated estimates of poor school-age children for counties and school districts every 2 years, beginning in 1996 with 1993 estimates for counties and in 1998 with 1995 estimates for school districts.

The program faces a challenging task. For Title I allocations, there is no single administrative or survey data source that provides all of the information required to develop reliable estimates of the number and proportion of school-age children in families in poverty by county or school district. The March Income Supplement to the CPS can provide reasonably reliable annual estimates of such population characteristics as the number and proportion of poor children at the national level and for some states. However, the CPS cannot provide estimates for the majority of counties because the sample does not include any households in them. And for almost all of the counties with households in the CPS sample (about 1,500 of a total of 3,143 counties in 1993), the estimates have a high degree of sampling variability.[1] Nonetheless, the CPS data can serve as the basis for creating usable estimates for counties through the application of statistical estimation techniques to develop "model-based" or "indirect" estimates.

[1] For a description of the March CPS and differences between income and poverty data from the CPS and the 1990 census long-form sample, see National Research Council (1997:Ch. 2; App. B). The 1990 census sample includes households in all counties and covers 15 million households, 300 times more than the 50,000 households in the CPS; even the 1990 census estimates are highly variable for some small counties (National Research Council, 1997:Table 2-1).

Indirect or model-based estimators use data from several areas, time periods, or data sources (which could include the previous census) to "borrow strength" and improve precision. A model-based approach is useful when there is no single data source for the area and time period in question that can provide direct estimates that are sufficiently reliable for the intended purpose. Previously, the Census Bureau used this strategy to develop estimates of median family income for states (Fay et al., 1993) and, in part, to develop population estimates for states and counties (see Spencer and Lee, 1980).

This chapter describes the model-based approach as used by the Census Bureau to develop revised estimates by county of the number and proportion of school-age children in families in 1994 who were poor in 1993 (referred to as the revised 1993 estimates). The Census Bureau's estimation procedure for counties uses two regression models that predict poor school-age children (a county model, revised from the original model, and a separate state model), along with county population estimates. The steps in the procedure for the revised 1993 estimates include:

1. Developing and applying the Census Bureau's revised county model to produce initial estimates of the number of poor school-age children. The county estimation process involves: obtaining data from administrative records and other sources that are available for all counties to use as predictor variables; specifying and estimating a regression equation that relates the predictor variables to a dependent variable, which is the estimated log number of poor school-age children from 3 years of the March CPS for counties with households in the CPS sample; and using the estimated regression coefficients from the equation and the predictor variables to develop estimates of poor school-age children for all counties. For counties with households in the CPS sample, the predictions from the model are then combined by a "shrinkage" procedure with the CPS estimates for those counties.

2. Developing and applying the Census Bureau's state model to produce estimates of the number of poor school-age children by state. The state estimation process is similar to that for counties, although the state model differs from the county model in several respects.

3. Adjusting the initial estimates of poor school-age children from the county model (step 1) for consistency by state with the estimates from the state model (step 2) to produce final estimates of the numbers of related children aged 5-17 in poverty by county for 1993.

4. Producing county estimates for 1994 of the total number of children aged 5-17 from the Census Bureau's population estimates program.

The Department of Education uses the estimates from step 3 and step 4 to calculate estimated proportions of poor school-age children for counties, which are also needed for the Title I allocation formulas. Estimates for Puerto Rico, which is treated as a county equivalent in the allocation formula, are developed separately (see Chapter 5; see also National Research Council, 1997:App. F).

Steps 1-4 are summarized in the remainder of this chapter (see also Appendices A and B; Coder et al., 1996; Fisher and Siegel, 1997). The last section describes the differences between the revised 1993 estimates that were provided to the panel in October 1997, which are assessed in this report, and the original 1993 estimates that were provided to the panel in January 1997 and assessed in its first interim report (National Research Council, 1997). The changes in the estimates result principally from a change in one of the predictor variables in the county model that was found to improve its performance.[2]

REVISED COUNTY MODEL

County Equation

The county equation uses as predictor variables county estimates from Internal Revenue Service (IRS) records for 1993, food stamp program records for 1993, the 1990 census, and the Census Bureau's population estimates program for 1994.

As the dependent or outcome variable, it uses county estimates of the number of poor school-age children averaged over 3 years of the March CPS (data from the March 1993, 1994, and 1995 CPS, covering income in 1992, 1993, and 1994). The equation takes the following form:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + u_i + e_i \qquad (1)$$

where:

$y_i$ = log(3-year weighted average of poor school-age children in county $i$),[3]
$x_{1i}$ = log(number of child exemptions reported by families in poverty on tax returns in county $i$),
$x_{2i}$ = log(number of people receiving food stamps in county $i$),
$x_{3i}$ = log(estimated population under age 18 in county $i$),
$x_{4i}$ = log(number of child exemptions on tax returns in county $i$),
$x_{5i}$ = log(number of poor school-age children in county $i$ in the previous census),
$u_i$ = model error for county $i$, and
$e_i$ = sampling error of the dependent variable for county $i$.

Here $\alpha$ and $\beta_1, \ldots, \beta_5$ denote the regression coefficients to be estimated.

[2] Subsequent chapters refer to the revised county model as the "log number (under 18) model" to distinguish it not only from the original model, but also from alternative models that were evaluated (see Chapters 3 and 4).

[3] The weighted average of the number of poor school-age children in each county is the product of the weighted 3-year average CPS poverty rate for related children aged 5-17 and the weighted 3-year average CPS number of related children aged 5-17; see National Research Council (1997:Ch. 3) for how the weights are derived.

Dependent Variable

The Census Bureau decided to model the number of poor school-age children, instead of the proportion, because of concern that the county population estimates of school-age children that would form the basis for converting the estimated proportions to estimated numbers were of uncertain quality. Hence, it would be difficult to construct estimates of the precision of the estimated numbers of poor school-age children, which play the most important role in the Title I allocation formula.

The Census Bureau decided to estimate the number of poor school-age children at a particular time and not to estimate the change in the number since the 1990 census because it concluded that the available administrative data were likely to be measured more consistently across areas at a given time than they would be over time, given changes in tax and transfer programs.

The Census Bureau decided to combine 3 years of CPS data for county estimation to improve the precision of the CPS estimates.

Because only a subset of counties have households in the March CPS sample, the relationships between the predictor variables and the dependent variable in the model are estimated solely from this subset of counties. This subset includes proportionately more large counties and proportionately fewer small counties than the distribution of all counties. Because values of 0 cannot be transformed into logarithms, a number of counties whose sampled households contain no poor school-age children are excluded from the estimation. In all, 1,184 of 3,143 counties were included in the 1993 model estimation; the remainder had either no CPS sampled households with poor school-age children (304 counties) or no CPS sampled households at all (1,655 counties).
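The construction of the dependent variable and the exclusion of zero counts can be illustrated with a minimal sketch. It assumes the 3-year weighted average CPS poverty rate and the 3-year weighted average number of related children aged 5-17 are already available for each sampled county; the helper name `log_poor_children` and the three example counties are hypothetical, and the derivation of the weights themselves (National Research Council, 1997:Ch. 3) is not reproduced.

```python
import math


def log_poor_children(avg_poverty_rate, avg_children):
    """Return log(weighted average number of poor children), or None if it is 0.

    Hypothetical helper: the weighted average number of poor school-age
    children is taken as the product of the 3-year weighted average poverty
    rate and the 3-year weighted average number of related children aged 5-17.
    """
    poor = avg_poverty_rate * avg_children
    if poor <= 0:
        return None  # log(0) is undefined, so the county drops out of the fit
    return math.log(poor)


# Made-up counties for illustration: (poverty rate, children aged 5-17)
counties = {"A": (0.20, 5000.0), "B": (0.0, 1200.0), "C": (0.10, 800.0)}

y = {name: log_poor_children(rate, n) for name, (rate, n) in counties.items()}
included = {name: value for name, value in y.items() if value is not None}

# County counts reported in the text for the 1993 model estimation
in_model, no_poor_in_sample, not_in_sample = 1184, 304, 1655
total_counties = in_model + no_poor_in_sample + not_in_sample
```

In this illustration county B has sampled households but no poor school-age children, so it is excluded from the regression in the same way as the 304 counties mentioned above; the three reported counts add up to the 3,143 counties.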

Predictor Variables

The choice of predictor variables was governed by data availability and the assumed relationship of the variables to poverty. The number of child exemptions reported by families in poverty on tax returns and the number of food stamp recipients were included as variables that are indicative of poverty and available on a consistent basis (or reasonably consistent basis, in the case of food stamps) for all counties in the nation.[4] The 1990 census estimate of poor school-age children was used in the 1993 model on the assumption that previous poverty is likely to be indicative of subsequent poverty. The total number of child exemptions on tax returns and the population estimate of the total number of children under 18 were included in order to cover children not reported on tax returns (i.e., in nonfiling families), who are assumed to be poorer on average than other children. (The estimated regression coefficients for the county model predictor variables are given in Table 4-1.)

Form of the Variables

The dependent variable and all of the predictor variables are measured on a logarithmic scale. A reason to use logarithms is the wide variation in the CPS estimate and the values of the predictor variables among counties: transforming the variables to logarithms made their distributions more symmetric and the relationships between some of them and the dependent variable more linear.

[4] Poverty status for families on tax returns is determined by comparing the adjusted gross income on each return to the average poverty threshold for the total number of exemptions on the form. Although there are differences between the CPS and IRS definitions of income and family composition, they are not critical for purposes of developing a predictive model.

Estimation of Model and Sampling Error Variance

The total squared error of the county estimates (the difference between the model estimates and the direct estimates from the CPS) has two sources: model error (u) and sampling error (e), which are the last two terms in the county equation.[5] Model error is the difference between the value of the dependent variable that would have been obtained had all the households in the county been included in the CPS sample and the model estimate based on the predictor variables. Sampling error is the difference between the estimate of the dependent variable from the CPS sample and the value of the dependent variable that would have been obtained had all households in the county been included in the CPS sample. Model error is assumed to be constant across counties (see below). Sampling error is not constant across counties: it is larger for counties that have fewer households included in the CPS sample.

[5] As used in statistics, "error" is the inevitable discrepancy between the truth and an estimate due to variability in measurements and the fact that model relationships are not precise.

Because a procedure to estimate the sampling error variance directly for the March CPS has not yet been developed (see Chapter 6), the variances of the model error and sampling error terms in the county equation are estimated in a multiple-step process that involves several assumptions. First, equation (1) is estimated for 1989, using the 1990 census estimate of poor school-age children as the dependent variable and 1989 IRS and food stamp data, 1990 census population data, and 1980 census poverty data as the predictor variables. A generalized variance function is used to estimate the sampling variance of the census estimates, which is quite small because of the large size of the census long-form sample. The total model error variance is then obtained by subtracting the sum of the estimated sampling variances from the estimated total squared error in the census equation. It is assumed that the total model error variance for the CPS equation for 1993 is the same as that for the 1990 census equation and that it has the same value for each county. The total sampling variance for the CPS equation, which is obtained by subtracting the total model error variance from the estimated total squared error, is then distributed among the counties as an inverse function of their sample size.

The resulting estimates of model error variance and sampling error variance are used to form weights for use in estimating the county model equation by weighted least squares.[6] They are also used to determine the weight to give to the model prediction and to the CPS direct estimate in developing estimates of poor school-age children for counties with sampled households in the CPS.
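The variance bookkeeping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Census Bureau's procedure: the totals are read as sums over counties, the "inverse function" of sample size is assumed to be plain proportionality to 1/k, and all numbers, function names, and the single-predictor fit are hypothetical.

```python
def allocate_sampling_variance(total_sampling_var, sample_sizes):
    """Spread a total sampling variance over counties in proportion to 1/k_i.

    Assumption: simple inverse proportionality to the CPS sample size k_i;
    the text only says "an inverse function" of sample size.
    """
    inverse = [1.0 / k for k in sample_sizes]
    scale = total_sampling_var / sum(inverse)
    return [scale * v for v in inverse]


def wls_weights(sampling_vars, model_var):
    """Weight = reciprocal of (county sampling variance + constant model variance)."""
    return [1.0 / (s + model_var) for s in sampling_vars]


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x (one predictor)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up inputs for three counties
sample_sizes = [5, 20, 80]     # CPS sample households per county
model_var = 0.05               # model error variance carried over from the census fit
total_squared_error = 0.75     # estimated total squared error in the CPS equation

total_model_var = model_var * len(sample_sizes)
total_sampling_var = total_squared_error - total_model_var
sampling_vars = allocate_sampling_variance(total_sampling_var, sample_sizes)
weights = wls_weights(sampling_vars, model_var)

# Toy single-predictor regression on the log scale using those weights
x = [1.0, 2.0, 3.0]
y_obs = [3.0, 5.0, 7.0]
intercept, slope = weighted_line_fit(x, y_obs, weights)
```

Counties with more sampled households receive a smaller share of the sampling variance and therefore a larger regression weight. The actual county model has five predictors; the one-predictor fit is used here only to keep the sketch short.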

Combining the County Equation and CPS Estimates

By calculating the relationships among the predictor variables and the CPS estimates of school-age children in poverty for the subset of counties that have households with poor school-age children in the March CPS sample, it is possible to obtain a good estimate of an equation for predicting the number of poor school-age children in a county, even though the CPS estimates for many small counties have a high level of uncertainty. The prediction equation can then be used to predict the number of school-age children in poverty from the food stamp, IRS, population estimates, and previous census predictor variables for each county, whether or not the county is in the March CPS sample.

For counties that have households with poor school-age children in the March CPS sample, a weighted average of the model prediction and the estimate based on data from the sampled households (the direct estimate) is used to produce an estimate for that county using empirical Bayes ("shrinkage") procedures for combining estimates (see Fay and Herriot, 1979; Ghosh and Rao, 1994; and Platek et al., 1987). The weights that are given to the model prediction and the direct estimate depend on their relative precision (see discussion above of how model error variance and sampling error variance are estimated). For a county with very few sample households in the CPS and hence a high level of sampling variability in the direct estimate, most of the weight will be given to the model prediction and little to the direct estimate. For a county with a larger number of sampled households in the CPS, more weight will be given to the direct estimate and less to the model prediction. In either case, assuming that the weights have been well estimated, the combined estimate will be at least as accurate as the better of the separate estimates (from the model or the CPS).[7] For counties that lack households with poor school-age children in the CPS sample, the prediction from the model is the estimate.

[6] The weights used are the reciprocal of the sum of the estimated sampling variance of the estimate of the log number of poor school-age children in a given county plus the estimated model error variance, assumed to be constant across counties; see Appendix A (see also National Research Council, 1997:App. C).

[7] For almost all counties that have households with poor school-age children in the CPS, most of the weight is given to the model prediction; for only 2 counties is the weight for the model prediction less than 0.5 and for only 13 counties is the weight for the model prediction less than 0.75.

STATE MODEL

State Equation

The state model equation takes the following form (see also Fay, 1996; Fay and Train, 1997):

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + u_i + e_i$$

where:

$y_i$ = proportion of poor school-age children in state $i$ from one year of the CPS,[8]
$x_{1i}$ = proportion of child exemptions reported by families in poverty on tax returns in state $i$,
$x_{2i}$ = proportion of people receiving food stamps in state $i$,
$x_{3i}$ = proportion of people under age 65 who did not file an income tax return in state $i$,[9]
$x_{4i}$ = residual for state $i$ from a regression of the proportion of poor school-age children from the prior decennial census on the other three predictor variables,
$u_i$ = model error for state $i$, and
$e_i$ = sampling error of the dependent variable for state $i$.

Here $\alpha$ and $\beta_1, \ldots, \beta_4$ denote the regression coefficients to be estimated.

[8] The numerator is the estimated number of poor related children aged 5-17 from the CPS, and the denominator is the estimated total population of children aged 5-17 (whether related or not) from the CPS. (The CPS universe excludes people in institutions and in military group quarters.)

[9] This percentage is obtained by subtracting the estimated number of exemptions on income tax returns for people under age 65 from the estimated total population under age 65 derived from demographic analysis; see Appendix B.

Differences from the County Equation

The Census Bureau's state model for estimates of poverty among school-age children is similar to the county model. However, it differs in a number of respects:

Dependent Variable

The state model uses the proportion of school-age children in poverty in each state as the dependent variable: that is, the dependent variable is a poverty ratio rather than the number of poor school-age children, as in the county model.[10] The numerator for the ratio is the CPS estimate of poor school-age children in a state (i.e., the estimate of the number of poor related children aged 5-17); the denominator is the CPS estimate of the total number of children aged 5-17 in the state. A different denominator (total CPS school-age children, rather than the slightly smaller universe of related school-age children) is used for consistency with the population estimates that are available to convert the estimated poverty ratios to estimated numbers of poor school-age children. In addition, the dependent variable in the state model is derived from 1 year of CPS data (the March 1994 CPS for the 1993 model), rather than a 3-year average as in the county model. This decision was made because the sample sizes for states permit estimating the model with reasonable accuracy. It implicitly assumes that it is preferable when possible to have estimates that pertain directly to the income year.

Predictor Variables

The state model uses a somewhat different set of predictor variables than the county model.
(The estimated regression coefficients for the state model predictor variables are given in Table 4-5.) The state model includes a predictor variable that is the residual from a regression of the proportion of poor school-age children from the prior decennial census on the other three predictor variables. During the development of the state model, the Census Bureau determined that there was a correlation between the residuals from estimating the model for 1979 with 1980 census data and the residuals from estimating the model for 1989 with 1990 census data. In other words, states that had more poverty than predicted by the cross-sectional model for 1979 also tended to have more poverty than predicted by the cross-sectional model for 1989. This result was used to improve the model predictions by including the residual from a regression for the prior census as one of the predictor variables.

[10] The predicted variable is termed a ratio because the denominator is not exactly the same as that for the official published poverty rates.

Form of the Variables

The variables in the state model are proportions rather than numbers and are not transformed to a logarithmic scale as is done in the county model.[11] A log-based model was examined, but the Census Bureau decided not to transform the variables because, unlike the situation with the county model, the state-level distributions of the estimated proportions for the predictor variables are reasonably symmetric, and the relationships of the state-level estimated proportions with the dependent variable are approximately linear.

Combining the State Equation and CPS Estimates

All states have sampled households in the CPS; however, the variability associated with estimates from the CPS is large for some states. As is done for the initial county estimates, the predictions from the state model and the CPS estimates are weighted according to their relative precision to produce estimates of the proportion of poor school-age children in each state. To produce estimates of the number of poor school-age children in each state, the estimates of the proportion poor are multiplied by estimates of the total number of noninstitutionalized school-age children. For the 1993 model, these estimates are derived from the Census Bureau's program of population estimates.[12] Finally, the state estimates of the number of poor school-age children are adjusted to sum to the CPS national estimate of related school-age children in poverty: this adjustment is a minor one, involving multiplying the state estimates for 1993 by 1.0091.
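The sequence just described (precision weighting, conversion to numbers, and adjustment to the national total) can be sketched as below. The weighting formula is the standard form for this kind of shrinkage estimator, in which the weight on the model prediction equals the sampling variance divided by the sum of the sampling and model error variances; it is assumed here rather than quoted from the report. The two states, their variances, populations, and the national total are made up.

```python
def shrinkage_estimate(direct, model_pred, sampling_var, model_var):
    """Precision-weighted average of a direct estimate and a model prediction.

    Assumed standard form: the noisier the direct estimate (large sampling
    variance), the more weight the model prediction receives.
    """
    w_model = sampling_var / (sampling_var + model_var)
    return w_model * model_pred + (1.0 - w_model) * direct


def scale_to_total(values, target_total):
    """Multiply every value by one factor so that the values sum to the target."""
    factor = target_total / sum(values)
    return [v * factor for v in values], factor


# Made-up inputs for two states
direct = [0.25, 0.15]              # CPS direct poverty ratios
model_pred = [0.20, 0.18]          # regression predictions
sampling_var = [0.0009, 0.0001]    # larger for the state with the smaller sample
model_var = 0.0003                 # assumed constant across states
population = [400_000, 1_000_000]  # noninstitutionalized children aged 5-17

proportions = [
    shrinkage_estimate(d, m, s, model_var)
    for d, m, s in zip(direct, model_pred, sampling_var)
]
counts = [p * n for p, n in zip(proportions, population)]

national_cps_total = 245_000.0     # hypothetical CPS national estimate
adjusted_counts, factor = scale_to_total(counts, national_cps_total)
```

In this toy case the first state's direct estimate is noisy, so its combined estimate sits closer to the model prediction, while the second state's sits closer to the direct estimate. The single factor returned by `scale_to_total` plays the role of the 1.0091 multiplier reported for 1993.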

[11] The estimates that are transformed into logarithms in the county model are numbers, not proportions. However, evaluation determined that, if the county model were to estimate proportions, a logarithmic transformation of the dependent and predictor variables would be helpful in that case as well.

[12] The estimates of noninstitutionalized school-age children, which include some adjustments for residents of military group quarters and college dormitories, are the closest approximation available to the CPS estimates of school-age children.

RAKING THE COUNTY ESTIMATES TO STATE ESTIMATES

The final step in developing estimates of numbers of poor school-age children by county is to adjust the estimates from the county model for consistency with the estimates from the state model. The estimated logarithmic counts from the county model are first transformed to numbers (with a correction for transformation bias).[13] The estimate for each state from the state model is then divided by the sum of the estimates for each county in that state to form a state raking factor. Each of the county estimates in a state is multiplied by the state raking factor so that the sum of the adjusted county estimates equals the state estimate. For the revised county estimates of poor school-age children in 1993, the average state raking factor was 1.065; two-thirds of the factors were between 0.975 and 1.154.

ESTIMATING PROPORTIONS

The Census Bureau's county model predicts the number of school-age children in families in poverty. Estimates of the proportion of poor school-age children in families, which play an important but secondary role in the Title I allocation formula, are obtained by the Department of Education by dividing the estimated number of poor school-age children from the county model by an updated estimate of the total county population aged 5-17. These estimates are produced from the Census Bureau's population estimates program (see Appendix B).

DIFFERENCES BETWEEN TWO PROCEDURES

The procedure described above to produce the revised 1993 county estimates that were provided to the panel in October 1997 differs in some respects from the procedure that was used to produce the original 1993 estimates. Specifically:

The revised county model includes the population under 18 as a predictor variable; the original county model included the population under 21 as a predictor variable. The purpose of this variable (whether for the population under 18 or under 21) is to estimate, in conjunction with the variable measuring total child exemptions on IRS tax returns, the number of children in families that did not file a tax return.

Evaluation determined that the estimation was not working well for counties with large numbers of people under age 21 in group quarters, primarily college students and military personnel. Specifically, the model was overpredicting the number of school-age children for those counties. Limiting the predictor variable to the population under 18 reduced the bias in the model predictions for counties classified by percent group quarters residents and improved the model predictions in other respects (see Chapter 4).

[13] Transformation bias occurs when a regression model estimates an expected value for the dependent variable that is on a different scale than that for which estimates are needed. In this instance, the county model predicts poor school-age children on the log scale; when the predictions on the log scale are exponentiated back to the original numeric scale, the result is the exponential of the expected value of the dependent variable on the log scale, which is different from the expected value of the dependent variable on the original scale. This difference is referred to as transformation bias, for which a correction is made.

Examination of the pattern of residuals (differences between the model predictions and the direct estimates) for counties with sampled households in the March CPS indicated that the original method for estimating model error variance and sampling error variance (described above) was not working as well as it should. The variability of the standardized residuals increased with the number of CPS sample cases rather than remaining constant, and this pattern was common to a variety of alternative models that were examined. The revised 1993 model includes a slight revision to the procedure for estimating the sampling error variance, which moderated but did not eliminate the anomalous pattern. Further work (see Chapter 6) will be required to reduce the problem further. However, this variability probably has limited effect on the estimates because the main effect of the sampling error variance estimation is on the weight to give to the model prediction versus the CPS direct estimate in forming estimates for counties that have sampled households with poor school-age children in the CPS. Since the direct estimates have small weights for most counties, changing the weights will not have a substantial impact.

The original model was estimated using a method-of-moments procedure; for the revised model, it was decided to use maximum likelihood estimation. There is a small effect on the estimated regression coefficients for the predictor variables from the use of maximum likelihood instead of method of moments. Primarily, the effect is to increase the estimated sampling error variance. Hence, in comparison with the original 1993 estimates, the revised model predictions are given somewhat more weight and the CPS direct estimates are given somewhat less weight when weighted estimates are formed for counties that have sampled households with poor school-age children in the CPS. However, as just noted, relatively few counties have large weights on the direct estimates.

The 1994 population estimates of children aged 5-17 that are used to convert the revised estimated numbers of poor school-age children to estimated proportions differ somewhat from the original 1994 population estimates that were used. These revised estimates incorporate more complete records of births and deaths. They also include a refined raking adjustment: the estimates are derived by an iterative proportional fitting procedure that rakes the 1990 census county estimates for school-age children to independently derived county total population estimates and state estimates of school-age children for 1994. The refinement was to rake separately the 1990 census estimates of school-age children in group quarters and school-age children not in group quarters.
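Iterative proportional fitting of the kind mentioned in the last paragraph can be illustrated with a small two-way table. The layout is an assumption made for the sketch: rows are counties, columns are age groups (school-age and all other ages), row targets stand in for county total population estimates, and column targets stand in for state totals by age group. The function name `ipf` and all figures are hypothetical.

```python
def ipf(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rake a two-way table so its row and column sums match the targets.

    Alternately rescales rows and columns; the targets are assumed to be
    consistent (same grand total) and the starting cells positive.
    """
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(max_iter):
        # Row step: scale each county so its total matches the county target
        for i, row in enumerate(cells):
            row_sum = sum(row)
            if row_sum > 0:
                f = row_targets[i] / row_sum
                cells[i] = [v * f for v in row]
        # Column step: scale each age group so the state total matches
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            if col_sum > 0:
                f = target / col_sum
                for row in cells:
                    row[j] *= f
        # Columns now match exactly; stop once the rows match as well
        if all(abs(sum(row) - t) <= tol * max(t, 1.0)
               for row, t in zip(cells, row_targets)):
            break
    return cells


# Made-up base-year table: rows = counties, columns = (school-age, other ages)
base = [[200.0, 800.0], [300.0, 700.0]]
county_totals = [1100.0, 1050.0]       # hypothetical updated county totals
state_age_totals = [550.0, 1600.0]     # hypothetical updated state totals by age

fitted = ipf(base, county_totals, state_age_totals)
school_age_by_county = [row[0] for row in fitted]
```

The first column of the fitted table gives raked county estimates of school-age children. Under the refinement described above, one would run such a procedure separately for children in group quarters and children not in group quarters.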