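Small-Area Estimates of School-Age Children in Poverty: Interim Report I: Evaluation of 1993 County Estimates for Title I Allocations
Copyright © National Academy of Sciences. All rights reserved.

APPENDIX E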
Future Research

This appendix briefly discusses some of the panel's analyses of the Census Bureau's county-level model, notes questions raised by these analyses, and suggests possible future investigations. Five topics are covered: the specification of the county-level model in comparison with the state-level model; possible problems arising from the high degree of correlation (multicollinearity) among the predictor variables in the county-level model; the effects of constraining the regression coefficients to sum to one; the possible advantages of other models, such as a model of change in poverty over time or a model of poverty rates or ratios; and two issues raised by the logarithmic transformation of the predictor and dependent variables.

MODEL SPECIFICATION

The model used by the Census Bureau to produce updated estimates of poor school-age children for states has a form that appears in the small-area estimation literature, and it is estimated with procedures that have been used previously. Sampling variances are estimated directly, and the model-error component of variance is estimated as part of fitting the model. The dependent variable is the poverty rate, and the predictor variables (covariates) are rates or ratios. The estimated rate is then multiplied by a population estimate (obtained from demographic analyses) to obtain the estimated number of poor school-age children.
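
The state-level model described above, with sampling variances estimated directly and a model-error variance estimated from the data, has the general structure of what the small-area literature calls an area-level, or Fay-Herriot, model. The Python sketch below illustrates that structure on made-up data and is not the Census Bureau's implementation: the covariate, sampling variances, and population figures are hypothetical, and the Prasad-Rao moment estimator used for the model-error variance is a standard textbook choice swapped in for illustration (the Bureau's estimation procedure may differ). The last step multiplies each smoothed rate by a population estimate, as the text describes.

```python
import numpy as np


def area_level_fit(y, X, v):
    """Sketch of an area-level (Fay-Herriot-type) model:

        y_i = x_i' beta + u_i + e_i,  Var(u_i) = s2u (model error),
                                      Var(e_i) = v_i (sampling error, treated as known).

    Returns the smoothed estimates, beta, s2u, and the shrinkage weights.
    """
    m, p = X.shape
    # Prasad-Rao moment estimator of s2u, based on ordinary least squares residuals.
    xtx_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (xtx_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # OLS hat-matrix diagonal
    s2u = max(0.0, (resid @ resid - np.sum(v * (1.0 - leverage))) / (m - p))
    # Weighted least squares for beta, with weights 1 / (s2u + v_i).
    w = 1.0 / (s2u + v)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    # Shrink each direct estimate toward its regression (synthetic) prediction.
    gamma = s2u / (s2u + v)
    smoothed = gamma * y + (1.0 - gamma) * (X @ beta)
    return smoothed, beta, s2u, gamma


# Hypothetical data for 51 "states": one administrative covariate,
# known sampling variances, and direct survey estimates of a poverty rate.
rng = np.random.default_rng(0)
m = 51
x = rng.uniform(0.05, 0.30, m)
X = np.column_stack([np.ones(m), x])
v = rng.uniform(1e-4, 1e-3, m)
y = 0.02 + 0.8 * x + rng.normal(0.0, 0.01, m) + rng.normal(0.0, np.sqrt(v))

rate, beta, s2u, gamma = area_level_fit(y, X, v)
population = rng.integers(100_000, 2_000_000, m)  # hypothetical population estimates
poor_children = rate * population                  # smoothed rate applied to population
print(f"beta = {beta.round(3)}, model-error variance = {s2u:.2e}")
```

Each smoothed rate is a weighted average of the direct survey estimate and the regression prediction; the weight `gamma` on the direct estimate is large where the sampling variance `v` is small relative to the model-error variance.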

The county-level model differs from the state-level model in two notable respects: the county-level model is expressed in terms of logarithms of counts, whereas the state-level model is expressed in terms of rates; and the county-level model uses data from
the March 1993, 1994, and 1995 CPS in constructing the dependent variable, whereas the state-level model uses data only from the March 1994 CPS. The dependent variable of the county-level model is the logarithm of the number of school-age children in poverty, and the covariates are all logarithms of estimated population counts.

The Census Bureau's justification for modeling counts rather than rates at the county level is that the count model permits estimation of standard errors of model predictions: the Bureau has argued that a model for rates would not, because there are no standard errors for the demographic population estimates. This argument applies to the estimated numbers of school-age children in poverty, which are the main consideration in Title I allocations. However, the allocations also depend on county poverty rates for school-age children, which the Department of Education produces by dividing the estimated counts by the demographic population estimates. Hence, modeling counts does not avoid the problems that arise from incomplete knowledge of the variability of the population estimates. Moreover, the Census Bureau has evaluated its population estimates extensively, and that work should make it possible to develop adequate estimates of their standard errors for these purposes.

The county-level model uses a component of variance for counties that is estimated from census data (see Appendix C). This use of census data is potentially valuable, but it also raises questions. It requires the assumption that the variance of model error for 1990 census data on poverty in 1989 is the same as the variance of model error for CPS data on poverty in 1993, an assumption that can be investigated.
For example, it is possible to construct direct estimates of the sampling variance of the CPS poverty estimates for a number of the larger counties in the sample. These directly computed CPS sampling variances can then be compared with those obtained from the county-level model. In addition, a model for the county-level CPS sampling variances could be constructed. Given CPS sampling variances estimated from such a model, procedures analogous to those used for the state-level model could be used to estimate the variance of model error for the county-level model, and that estimate could then be compared with the one derived from the census data. The panel believes such comparisons should be conducted. It would also be possible to develop a model that combines the estimated variance of model error derived from the census with the estimate obtained from a standard small-area analysis.

The Census Bureau controls the county estimates to sum to the state estimates by means of a ratio adjustment procedure. As Table E-1 shows, this procedure produced some rather large adjustments of the county estimates: among the states shown, the ratio of the state estimate to the sum of the uncontrolled county estimates ranges from 0.89 to 1.33. The reasons for these sizable adjustments need investigation. An alternative approach for aligning the state and county estimates would be to include a state component of variance in the county-level model. Such a model can be written as

$$y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + z_s + u_i + e_i \qquad (\text{county } i \text{ in state } s),$$

where $z_s$ is the state component, common to all counties in state $s$; $\mathbf{x}_i$ is the vector of predictor variables for county $i$ (including a constant); and $u_i$ and $e_i$ are the county model error and the sampling error. Although differences in the forms of the state- and county-level models may contribute to the sizable variability in the ratio adjustments, that variability is consistent with the existence of a state (or similar) component of variance. Smoothed estimates constructed with such a two-component county-level model would differ from those constructed under the present procedure. To the panel's knowledge, neither the magnitude of those differences nor the degree to which the state adjustments are consistent with the size of a directly estimated state component of variance has been investigated.

TABLE E-1 Ratios of State Estimates of the Number of School-Age Children in Poverty in 1993 to the Sum of Uncontrolled County Estimates for 1993, Selected States

State            Ratio of State Estimate to Sum of County Estimates
Alaska           1.33
Connecticut      1.24
Michigan         1.22
Massachusetts    1.21
West Virginia    1.16
New Jersey       1.12
Arizona          1.11
New York         1.11
Florida          1.05
California       1.01
Wyoming          1.01
Texas            0.98
Mississippi      0.98
Alabama          0.97
Illinois         0.97
Nebraska         0.94
Idaho            0.89

SOURCE: Calculated by the panel from data that were made available to the panel in January 1997.
Subsequently, the Census Bureau discovered errors in the input data for a few counties that changed the 1993 county-level model estimates somewhat; however, the general patterns shown in the table still hold. (The state estimates were unchanged.)

The state- and county-level models are inconsistent: a state model derived by aggregating the county-level model to states differs from the state-level model that was actually used. The two models need not be consistent, but inconsistent models should be used only if there is a good reason to do so. In this case, the decision to use distinct models may have been driven primarily by the administrative organization of the estimation process, by time pressures that made it necessary to divide the estimation process into distinct stages, and by the use of different data sources. Future research should consider a more integrated approach.

MULTICOLLINEARITY

The variables used in the county-level model have two important statistical properties. First, they are very strongly correlated; Table E-2 presents the correlation matrix of the explanatory variables. Second, all the variables are measured with error. Together, these properties mean that individual regression coefficients have large variances and that the model's predictions may be biased. Predictions for individual counties will not be seriously affected by variation in individual coefficients if the counties used for estimation are representative of all counties. However, the counties used for parameter estimation are not a random sample of all counties, since counties whose CPS sample households include no poor school-age children are not used in estimating the regression model. Therefore, the conditions under which predictions based on error-prone observations are unbiased are not satisfied.

TABLE E-2 Correlation Matrix of Independent Variables Used in 1993 County-Level Model

Variable                      x1      x2      x3      x4      x5
x1 (tax returns, poor, <21)   1.000   0.959   0.948   0.950   0.971
x2 (food stamp recipients)            1.000   0.917   0.908   0.973
x3 (population <21)                           1.000   0.996   0.914
x4 (tax returns, total, <21)                          1.000   0.907
x5 (1990 census, poor, 5–17)                                  1.000

SOURCE: Data from Bureau of the Census.

CONSTRAINING THE REGRESSION COEFFICIENTS TO SUM TO ONE

The estimated county-level model for poverty among school-age children in 1993 is (the standard errors of the regression coefficients are shown in parentheses)

where, at the county level, the dependent variable is

$y_i$ = log(3-year weighted average of the CPS number of poor school-age children in county $i$),

and the independent (predictor) variables are

$x_{1i}$ = log(number of child exemptions [assumed to be under age 21] reported by families in poverty on tax returns in county $i$),
$x_{2i}$ = log(number of people receiving food stamps in county $i$),
$x_{3i}$ = log(estimated noninstitutionalized population under age 21 in county $i$),
$x_{4i}$ = log(number of child exemptions on tax returns in county $i$), and
$x_{5i}$ = log(number of poor school-age children in county $i$ in the previous [1990] census).

The sum of the regression coefficients for the predictor variables in this model is 1.045. Because this sum exceeds 1, the model estimates a higher poverty rate for large counties than for small ones. Table E-3 illustrates this for three hypothetical counties of different sizes, in which every predictor variable increases across the counties in the proportions 1:5:25. As the final row of the table shows, the Census Bureau's county-level model estimates a poverty rate that is about 15 percent higher for the large county than for the small county.

At the request of the panel, the Census Bureau estimated a restricted model that imposed the condition that the coefficients of the predictor variables sum to 1.0. Its dependent variable was constructed from CPS data for the period corresponding to the 1990 census, and the Census Bureau's unconstrained county-level model was estimated from the same data. The estimated coefficients for the two models (constrained and unconstrained) are given in Table E-4.
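
Because the county-level model is linear in logarithms, multiplying every predictor by a factor $k$ adds $(\sum_j \beta_j)\log k$ to the predicted log count but only $\log k$ to $\log x_3$, so the predicted rate $Y/x_3$ is multiplied by $k^{\sum_j \beta_j - 1}$; any constant bias-correction factor cancels in these ratios. The short Python check below uses only the figures in Table E-3 and the coefficient sum of 1.045 quoted in the text. The two sets of ratios agree to within about 0.002, consistent with rounding of the tabulated values.

```python
# Figures copied from Table E-3; coefficient sum quoted in the text.
COEF_SUM = 1.045
scale = [1, 5, 25]                     # predictors grow in proportions 1:5:25
pop_under_21 = [3880, 19400, 97000]    # x3 for the three hypothetical counties
predicted_poor = [285, 1534, 8246]     # Y, the model's predicted count


def implied_rate_ratio(k, coef_sum=COEF_SUM):
    """Factor by which the predicted rate Y/x3 changes when every predictor
    is multiplied by k in a model that is linear in logarithms."""
    # log Y rises by coef_sum * log(k) while log x3 rises by log(k).
    return k ** (coef_sum - 1.0)


table_rates = [y / x for y, x in zip(predicted_poor, pop_under_21)]
for k, rate in zip(scale, table_rates):
    print(f"k={k:>2}  Y/x3={rate:.4f}  "
          f"ratio from table={rate / table_rates[0]:.3f}  "
          f"implied={implied_rate_ratio(k):.3f}")
```

With a coefficient sum of exactly 1, `implied_rate_ratio` returns 1 for every `k`, so the predicted rate would not depend on county size.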

TABLE E-3 1993 County-Level Model Estimates for Three Hypothetical Counties

Variable                                             County 1   County 2   County 3
x1 (tax returns, poor, <21)                               480      2,400     12,000
x2 (food stamp recipients)                                760      3,800     19,000
x3 (population <21)                                     3,880     19,400     97,000
x4 (tax returns, total, <21)                            3,200     16,000     80,000
x5 (1990 census, poor, 5–17)                              320      1,600      8,000
Y (CPS 3-year average, poor school-age children)          285      1,534      8,246
Y / x3                                                 0.0735     0.0791     0.0850

TABLE E-4 Estimated Coefficients, Unrestricted and Restricted, for County-Level Model of Poverty Among School-Age Children in 1989

Variable                      Unrestricted Coefficient   Restricted Coefficient
Intercept                                       -0.841                   -0.439
x1 (tax returns, poor, <21)                      0.505                    0.436
x2 (food stamp recipients)                       0.307                    0.354
x3 (population <21)                              0.713                    0.393
x4 (tax returns, total, <21)                    -0.754                   -0.436
x5 (1980 census, poor, 5–17)                     0.270                    0.253

SOURCE: Data from Bureau of the Census.

The restricted and unrestricted coefficients are similar for three variables: the number of child exemptions reported by families in poverty on tax returns, the 1980 census estimate of poor school-age children in 1979, and food stamp recipients. They differ considerably for the two size variables, the number of child exemptions on tax returns and the population under age 21. Nevertheless, the sum of the coefficients for the unrestricted model is 1.041, close to the value of 1 imposed on the restricted model.

The fitted models were used to estimate the number of school-age children in poverty in 1989 for every county; Table E-5 reports the average relative differences (model estimate minus census estimate, divided by census estimate) for counties grouped by population size. Not surprisingly, all the differences are positive, since poverty rates based on CPS data are 6 percent higher than rates based on census data. Imposing the restriction on the coefficients clearly changes how the estimates behave with respect to county size: under the unrestricted model the average difference grows with county size (from 2.1 to 7.4 percent), whereas under the restricted model it is largest for the smallest counties (12.7 percent) and smallest for the largest (5.9 percent). The sum of the coefficients for the predictor variables is significantly different from 1.0 in the unrestricted county model.

There are several possible reasons that large counties may tend to have higher poverty rates than small counties. One is an urban factor: counties with very large populations always include urban areas, and very small counties are always rural, although the relationship between urbanization and county size is not uniform.
(For example, some small cities make up only part of a populous surrounding county, while the nation's largest city, New York, is divided among five counties.) County size may also be a proxy for other, less obvious variables. Alternatively, county size may enter the model for artifactual reasons: the county model is nonlinear and may have some biases when the population is very small (for example, biases arising from the removal of counties with no poor school-age children from the estimation). Because the role of size in the county model is difficult to interpret and evaluate, it would be desirable, if possible, to identify the relevant characteristics for which size is a proxy and enter them into the model directly.
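
The restricted model compared in Tables E-4 and E-5 imposes one linear constraint: the five predictor coefficients must sum to 1. One standard way to fit such a model by least squares is to substitute the constraint into the regression, writing the coefficient of $x_3$ as 1 minus the sum of the other four; the problem then becomes an ordinary regression of $y_i - x_{3i}$ on the differences $x_{ji} - x_{3i}$. The Python sketch below shows that reparameterization on simulated, strongly correlated log-scale data. It is an illustration under stated assumptions, not the Census Bureau's procedure: the text does not say how the restriction was imposed, the sketch uses ordinary least squares and ignores the model's variance components, and the function names and data are hypothetical.

```python
import numpy as np


def fit_unrestricted(y, Z):
    """Ordinary least squares of y on an intercept and the columns of Z."""
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def fit_sum_to_one(y, Z, ref=2):
    """Least squares subject to: slope coefficients sum to 1.

    Substitutes b_ref = 1 - sum_{j != ref} b_j, which turns the problem into an
    unconstrained regression of (y - z_ref) on the differences (z_j - z_ref).
    ref=2 corresponds to x3 (population under 21); any column gives the same fit.
    Returns [intercept, b_1, ..., b_p].
    """
    n, p = Z.shape
    others = [j for j in range(p) if j != ref]
    A = np.column_stack([np.ones(n), Z[:, others] - Z[:, [ref]]])
    theta, *_ = np.linalg.lstsq(A, y - Z[:, ref], rcond=None)
    slopes = np.empty(p)
    slopes[others] = theta[1:]
    slopes[ref] = 1.0 - theta[1:].sum()
    return np.concatenate([[theta[0]], slopes])


# Hypothetical log-scale data: five predictors that all track county size,
# so they are strongly correlated, as the variables in Table E-2 are.
rng = np.random.default_rng(1)
n = 1000
log_size = rng.uniform(6.0, 13.0, n)
Z = log_size[:, None] + rng.normal(0.0, 0.3, (n, 5))
true_slopes = np.array([0.5, 0.3, 0.6, -0.6, 0.24])  # hypothetical; sums to 1.04
y = -0.8 + Z @ true_slopes + rng.normal(0.0, 0.2, n)

unrestricted = fit_unrestricted(y, Z)
restricted = fit_sum_to_one(y, Z)
print("slope sums:", unrestricted[1:].sum().round(3), restricted[1:].sum().round(3))
```

Because the substitution is exact, the constrained estimates do not depend on which coefficient is substituted out; the `ref` argument changes only the intermediate regression.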

TABLE E-5 Percentage Differences Between 1990 Census and Model Estimates of Poor School-Age Children in 1989, for Unrestricted and Restricted County-Level Models, for Counties Grouped by Population Size

County Size Category     Unrestricted Model   Restricted Model
0 to 9,999                              2.1               12.7
10,000 to 19,999                        2.8               10.6
20,000 to 49,999                        3.9                9.2
50,000 to 99,999                        4.4                7.6
100,000 to 499,999                      7.4                9.3
500,000 and over                        7.4                5.9

SOURCE: Calculated by the panel from data provided by Bureau of the Census.

OTHER KINDS OF MODELS

Administrative records may not be consistent across counties, so the relationship between administrative indicators and the characteristics of interest may not be constant across counties. One way to reduce the effect of such inconsistency is to model changes. A closely related procedure, used in the state-level model, is to include as an explanatory variable the residuals from a model fit during an earlier "control" period. A change model assumes that administrative procedures have remained relatively constant within each county over the study period. The decision to use only current administrative data, and not data on changes, rests on the judgment that administrative procedures are more similar across areas than over time. The usefulness of residuals in the state-level model, however, suggests that such variables should be considered for the county-level model.

An alternative to a model for numbers at the county level is a model for rates.
At the request of the panel, the Census Bureau estimated a county-level model whose dependent variable was the ratio of the CPS estimated number of poor school-age children (related children aged 5–17) to the CPS estimated population of noninstitutionalized children aged 5–17 (i.e., the same kind of dependent variable used in the state-level model). The residual mean square error for this model was considerably smaller than that for the Census Bureau's county-level model. The panel believes that models for rates should be fully investigated.

USE OF THE LOGARITHMIC TRANSFORMATION

The county-level model is estimated in logarithms, and, in transforming back to the original scale of poverty counts, a correction is made to the exponentiated
values. A different correction for the bias due to this nonlinearity that could be explored is to regress the original observations on the exponentiated predicted values from the log-scale model. (This regression format can also be used for other model checks.)

The logarithmic transformation also causes a problem in the treatment of counties whose CPS sample contains no poor school-age children. All such counties in the 3-year CPS sample were dropped from the estimation of the model's coefficients, since the logarithm of 0 is undefined. These were generally counties with extremely small numbers of sampled households. Because poverty estimates for such counties have large variances, it is conjectured that omitting them has little impact on the estimates; however, it would be desirable to have analyses supporting this conjecture. It would also be desirable to investigate a generalized linear model as an alternative modeling approach that does not require removing these counties from the estimation of the regression coefficients.
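
The two suggestions in this section can be compared on simulated data. The Python sketch below is illustrative only: the data are hypothetical Poisson counts in which small counties often have zero poor children; the regression-based correction is implemented as a least-squares regression through the origin over all counties (one reading of the suggestion above, not a documented procedure); and a Poisson regression with a log link, fitted by iteratively reweighted least squares (IRLS), stands in for the unspecified generalized linear model. Unlike the log-scale regression, the Poisson fit uses every county, including those with zero counts.

```python
import numpy as np

# Hypothetical data: expected counts rise with a log-size covariate, and
# small counties frequently have observed counts of zero.
rng = np.random.default_rng(2)
n = 400
x = rng.uniform(0.0, 3.0, n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(-0.5 + 0.9 * x))

# (1) Regression in logarithms: log(0) is undefined, so zero counties are dropped.
keep = y > 0
b_log, *_ = np.linalg.lstsq(X[keep], np.log(y[keep]), rcond=None)
naive = np.exp(X @ b_log)                 # exponentiated log predictions
# Regression-based correction: regress the observed counts for all counties on
# the exponentiated predictions (through the origin, an assumption) and rescale.
slope = (naive @ y) / (naive @ naive)
corrected = slope * naive


# (2) Poisson GLM with log link, fitted by IRLS on all counties (zeros included).
def poisson_irls(X, y, max_iter=100, tol=1e-10):
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start from the overall mean
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        # Weighted least squares step; the working weights for the log link are mu.
        new = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta


b_glm = poisson_irls(X, y)
fitted_glm = np.exp(X @ b_glm)
print("counties with zero count:", int((~keep).sum()))
print("total observed:", int(y.sum()))
print("log OLS, uncorrected:", naive.sum().round(1), " corrected:", corrected.sum().round(1))
print("Poisson GLM:", fitted_glm.sum().round(1))
```

With an intercept in the model, the Poisson score equations force the fitted total to equal the observed total, and zero counts enter the fit directly instead of being discarded.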