Research and Development Priorities
There are several reasons that make it important for the Census Bureau to continue an active program of research and development for methods of estimating poverty for school-age children at the county and school district levels. For counties, although there is clear evidence that the county model is performing reasonably well, the county (and state) model evaluations have identified a number of issues that warrant investigation as a priority in the short term to determine how to further improve the estimation procedures. Also, with a model-based approach, it is important to examine carefully the continued applicability of a model each time it is used and to modify it appropriately when necessary. In addition, research is needed to take account of likely future developments in the availability and characteristics of data sources that have implications for the modeling effort and to work on longer term modeling issues. Continued work to improve the county model is important not only for county estimates, but also to improve school district estimates that are developed by using the within-county shares estimation procedure.
For school districts, the important short-term priority is to investigate ways to improve the within-county shares method for developing updated estimates of total and poor school-age children. Also, it is not too soon to begin research on ways to take advantage of likely future developments in available data that could make it possible to develop an estimation method that (unlike the shares method) captures changes in school-age poverty among districts within counties as well as changes between counties.
This chapter identifies short-term priorities for research and development of the current Census Bureau models for estimates of poor school-age children. The panel's final report (National Research Council, 2000) discusses these short-term priorities as well. It also develops an agenda for longer-term work to take advantage of possible new sources of survey and administrative records data that could improve the Bureau's models for poor school-age children and the other income and poverty estimates that are produced by the Bureau's SAIPE Program.1
The chapter begins by reviewing the schedule for the Census Bureau to provide updated small-area estimates of poor school-age children. It then considers short-term research priorities for county and school district estimates and mentions some longer term priorities as well. It concludes by noting the requirements for an ongoing program of small-area income and poverty estimates, particularly for thorough evaluation and full documentation of models and results (see also National Research Council, 2000:Ch.7).
The next three legislatively mandated deadlines for the Census Bureau to deliver updated school district estimates of poor school-age children to the Department of Education for use in Title I allocations are as follows:
October 2000: estimates for 1997 (or later) for use for allocations for the 2001-2002 and 2002-2003 school years
October 2002: estimates for 1999 (or later) for use for allocations for the 2003-2004 and 2004-2005 school years
October 2004: estimates for 2001 (or later) for use for allocations for the 2005-2006 and 2006-2007 school years
In each case, three estimates are needed for each school district: numbers of poor and of total school-age children and the total population. Although the legislation does not require county estimates, they will be needed as long as the method for producing school district estimates includes an adjustment or control to county estimates. There is also interest in state and county estimates of poor children for other important public policy uses, such as evaluating the effects of changes in welfare programs.
Footnote 1: The final report considers the possible role for the SAIPE Program of two new survey data sources with relatively large sample sizes, the 2000 census long-form survey and the planned American Community Survey, as well as two smaller ongoing surveys, the March CPS and the Survey of Income and Program Participation. The report also considers the role of improvements to the Census Bureau's Master Address File and associated geographic coding system in making it possible to use administrative records to develop poverty estimates for school districts and other subcounty areas.
Priorities for short-term and longer term research should consider the important changes that are likely to occur in the availability of data for modeling over the next 5 years and beyond, which include:
current and future changes to welfare programs and tax systems that may affect the comparability or applicability of Food Stamp Program and Internal Revenue Service (IRS) data for use in small-area estimation models;
the income and poverty estimates for small areas that will be available from the 2000 decennial census long-form sample of about 18 million households beginning in 2002; and
the planned introduction of the American Community Survey (ACS) as a large-scale, continuing sample survey of U.S. households, conducted primarily by mail, that will provide estimates similar to those provided by the decennial census long-form sample, including income and poverty estimates for small areas. The ACS is currently under development. Beginning in 2003, the full ACS sample will be 250,000 housing units each month throughout the decade, for an annual sample size of about 3 million housing units spread across all counties in the nation. The current plan is that the ACS, like the 2000 census long form, will oversample small jurisdictions. Unlike the 1990 census, the oversampling in the 2000 census and the ACS includes small school districts.2
In its third interim report (National Research Council, 1999), the panel identified seven types of research that should be pursued as a priority to determine if the current estimation procedure for counties can be improved: modeling of CPS county sampling variances; estimation of model error and sampling error variance in the state model; methods to incorporate state effects in the county model; discrete variable models that include counties in the CPS sample that have no sampled households with poor school-age children; ways to reduce the time lag of the estimates; evaluation of food stamp and other input data; and large category differences and residual patterns for the state and county models. Since then, the Census Bureau has made progress in several of these research areas as noted below.
Footnote 2: For information about the ACS, see Alexander (1998); the Census Bureau's web site: http://www.census.gov/acs/www; and National Research Council (2000:Ch.4, 5).
Modeling of CPS County Sampling Variances. The residual variance for the county model comprises two components: the model error variance and the sampling error variance of the dependent variable. These two components need to be reasonably well estimated for the application of the model (e.g., to determine the relative weights of the regression estimate and the direct estimate in the shrinkage procedure). The current approach for estimating these components is to assume that the model error variance from the 1989 regression equation with the dependent variable formed from 1990 census data is the same as the model error variance when the dependent variable is formed from the 3 years of CPS data that are used for the county model equation for the estimation year. The total sampling error variance is then obtained simultaneously with the regression parameter estimates through use of maximum likelihood estimation. As part of this procedure, the sampling variance for a particular county is assumed to be inversely proportional to the CPS sample size in that county.
There is ample evidence that the function that is now used to distribute the total sampling variance to counties is incorrect (see Chapter 6). The Census Bureau's experimentation with other functions (specifically, a function in which the sampling variance is inversely proportional to the square root of the CPS sample size in a county; see Fisher and Asher, 1999a) should be pursued to eliminate or reduce the problem of variance heterogeneity with respect to both the CPS sample size and the predicted value of the number of poor school-age children that is evident in the county model regression output. Research on this topic should include an assessment of the effects of alternative variance functions on the county estimates.
In addition, the Census Bureau should continue to pursue an alternative approach, which is to estimate the CPS sampling variances for counties with adequate sample size on the basis of direct calculations of these variances that take account of the clustered sample design within these counties, and then use a generalized variance function for modeling the sampling variances for all counties with CPS-sampled households. With this approach, the model error variance is then obtained simultaneously with the regression parameter estimates through use of maximum likelihood estimation, as in the state model. The Census Bureau's work on fitting a generalized variance function to the CPS sampling variances should continue and should include an assessment of the effects on the county estimates to determine if the benefits justify continued refinement of the variance modeling.
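As a rough illustration of the quantities discussed above, the following Python sketch compares two candidate variance functions (sampling variance inversely proportional to the CPS sample size, and to its square root) and shows how each changes the weight placed on the direct estimate in a standard shrinkage combination. The function names, the shared scale constant, and all numbers are hypothetical; in practice the scale would be re-estimated under each variance function, and this is not the Census Bureau's production procedure.

```python
# Hypothetical illustration; not the Census Bureau's production code.

def sampling_variance(n, scale, power):
    """Assumed variance function: scale / n**power.

    power=1.0 gives variance inversely proportional to the CPS sample size;
    power=0.5 gives variance inversely proportional to its square root.
    """
    return scale / n ** power


def shrinkage_weight(model_var, samp_var):
    """Weight placed on the direct estimate in a standard shrinkage estimator."""
    return model_var / (model_var + samp_var)


def shrink(direct, regression, model_var, samp_var):
    """Weighted combination of the direct and regression estimates."""
    w = shrinkage_weight(model_var, samp_var)
    return w * direct + (1.0 - w) * regression


if __name__ == "__main__":
    model_var = 0.04  # made-up model error variance
    for n in (4, 25, 100):  # made-up county CPS sample sizes
        w_inverse_n = shrinkage_weight(model_var, sampling_variance(n, 1.0, 1.0))
        w_inverse_sqrt_n = shrinkage_weight(model_var, sampling_variance(n, 1.0, 0.5))
        print(n, round(w_inverse_n, 3), round(w_inverse_sqrt_n, 3))
```

With the same scale constant, the square-root function assigns relatively more sampling variance to large-sample counties, so their direct estimates receive less weight than under the inverse-sample-size function.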
Model Error and Sampling Error Variance in the State Model. In the state model, the model error variance is obtained from a maximum likelihood procedure that estimates the coefficients of the predictor variables and the model error variance, given estimates of the sampling error variances of the direct state estimates. For most years for which the state model has been estimated, this procedure estimates the model error variance as zero, which results in zero weight being given to the direct CPS estimates. In effect, the model is assumed to be without error, which is not credible. A likely explanation is that the Census Bureau's estimates of sampling error variance for the direct state estimates are overestimates, which drive the maximum likelihood estimate of the model error variance to zero. The Census Bureau should continue to investigate its procedures for estimating sampling error variance. It should also examine the effects of a simple correction, such as putting a small weight on the direct estimates in weighting the estimates from the CPS equation for a target year.3
State Effects. The magnitude of the state raking factors that are used to adjust the county estimates warrants further investigation. Preliminary calculations by the panel suggest that sampling error may account for much, but not all, of the variation in the raking factors. The Census Bureau should conduct further research to better understand the causes of this variation. One part of this research could be to examine the effect of using 3 years rather than 1 year of CPS data in the state model, as is done in the county model.
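To make the notion of a raking factor concrete, the sketch below assumes the simplest form of the adjustment: the factor for a state is the state model estimate divided by the sum of the county model estimates in that state, and every county estimate in the state is multiplied by that factor. The function names and figures are hypothetical, and the actual procedure may differ in detail.

```python
# Hypothetical illustration of raking county estimates to a state total.

def raking_factor(state_estimate, county_estimates):
    """Ratio of the state model estimate to the sum of county model estimates."""
    return state_estimate / sum(county_estimates)


def rake(state_estimate, county_estimates):
    """Scale each county estimate so that the counties sum to the state estimate."""
    factor = raking_factor(state_estimate, county_estimates)
    return [factor * c for c in county_estimates]


if __name__ == "__main__":
    counties = [200.0, 300.0, 500.0]  # made-up county model estimates
    state = 1100.0                    # made-up state model estimate
    print(raking_factor(state, counties))
    print(rake(state, counties))
```

A factor far from 1 indicates that the county and state models disagree about the state total, which is why the variation in these factors across states is worth examining.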
More generally, work should be conducted to determine if there are idiosyncratic state effects that should be captured in the county model. The Census Bureau did some preliminary research on adding fixed state effects to alternative formulations of the county model (see Appendix A). While the addition of fixed state effects reduced some nonrandom residual patterns in the regression output, a fixed state effects model did not perform better than other models in comparison with the 1990 census estimates (see Appendix B and Appendix C). Some preliminary work with a random state effects model with two components of variance, one for state and one for county within state (see Fuller and Goyeneche, 1998), suggested that a small state random effect may be present and that further research on a random state effects model should be conducted.
Discrete Variable Models that Use Counties with No Sampled Poor School-Age Children. When the county regression model uses a logarithmic transformation of the number of poor school-age children as the dependent variable, all counties in the CPS sample for which none of the sampled households has school-age children who are poor (262 of 1,247 counties for the 1995 model) have to be removed from the regression analysis. The dropped counties are generally smaller counties with small CPS sample sizes.
While the dropped counties would have little influence in any regression equation due to their small sample sizes, the exclusion of 21 percent of the counties in the CPS sample is a cause for concern. Moreover, the internal and external evaluations of the county model suggest that although the current approach provides reasonably good estimates for small counties for 1989, 1993, and 1995, they could be improved. For example, there is a slight tendency in the county model equation to overpredict poverty in small counties (see Chapter 6). It is important to investigate the development of discrete variable regression models, such as Poisson regression or other forms of generalized linear models, that permit the inclusion of data for those counties that have no sampled families with children in poverty. The Census Bureau has begun work on a hierarchical Bayesian modeling approach that addresses this problem, and this work should continue (see Fisher and Asher, 1999b).
Footnote 3: Bell (1999) has explored yet another approach, which is to use a Bayesian model to account for the uncertainty in the estimates of the model error variance. This approach yields positive estimates of model error variance that could be useful for producing the state model estimates.
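The following toy sketch shows why a discrete variable model can retain counties with zero sampled poor children. It fits an intercept-only Poisson model with an exposure term, a deliberately minimal stand-in for the Poisson regression mentioned above, and compares the fitted rate with the rate obtained after dropping the zero-count counties, as a logarithmic dependent variable would require. All names and data are hypothetical.

```python
import math

# Hypothetical illustration: intercept-only Poisson model with exposure,
# y_i ~ Poisson(rate * exposure_i). Not the Census Bureau's county model.

def poisson_rate_mle(counts, exposures):
    """Maximum likelihood estimate of the rate: sum(y) / sum(exposure)."""
    return sum(counts) / sum(exposures)


def poisson_loglik(rate, counts, exposures):
    """Poisson log-likelihood; defined for zero counts because rate * exposure > 0."""
    total = 0.0
    for y, e in zip(counts, exposures):
        mu = rate * e
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


if __name__ == "__main__":
    counts = [0, 0, 3, 5, 12]         # made-up sampled poor school-age children
    exposures = [10, 15, 20, 30, 45]  # made-up sampled school-age children

    rate_all = poisson_rate_mle(counts, exposures)

    # A log-transformed dependent variable cannot use zero counts,
    # so those counties would have to be dropped first.
    kept = [(y, e) for y, e in zip(counts, exposures) if y > 0]
    rate_dropped = poisson_rate_mle([y for y, _ in kept], [e for _, e in kept])

    print(round(rate_all, 4), round(rate_dropped, 4))
```

In this toy data set the rate estimated without the zero-count counties is higher than the rate estimated from all counties, which illustrates how excluding such counties can push estimates upward.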
Ways to Reduce the Time Lag of the Estimates. The Title I fund allocations for the 1999-2000 school year were based on estimates of school-age children in 1996 who were in poor families in 1995, and these estimates were also used for the 2000-2001 school year allocations. It is important to explore the extent to which this time lag can be reduced for the county estimates, which will correspondingly reduce the time lag for the school district estimates.4 The Census Bureau began some exploratory work on this topic in June 1997 but had to put it aside. Now that the county estimation procedure has been developed and put on a production basis, it is important to resume this work.
One of the causes of the lag is the availability of food stamp data for counties, which must be obtained from individual states in some instances and which are not available until almost 2 years after the year to which they refer. It might be possible to overcome this problem, without seriously harming the performance of the county model, by using food stamp data for the year prior to the estimation year. Another possibility is to control the estimates from the county model to the state model estimates for the latest of the 3 years of CPS data used in the county model, instead of to the middle year. These ideas and others (see National Research Council, 2000:Ch.3) need to be evaluated to determine if the lag between the time period of the estimates and the year of allocation of funds can be reduced.
Footnote 4: It would also be desirable to reduce the time lag in the school district boundary survey so that the allocations are made to current school districts. However, that survey is conducted every 2 years, and it may not be possible to carry it out more frequently or to complete it more quickly.
Evaluation of Food Stamp and Other Input Data. Regular evaluation of the continued suitability of food stamp and other data for input to the state and county models is important for the Census Bureau's small-area estimation program. Changes in welfare programs and the accompanying data systems (especially those resulting from the 1996 Personal Responsibility and Work Opportunity Reconciliation Act) will almost certainly affect the comparability of food stamp data over geographic areas. For example, legal immigrants, many of whom are no longer eligible for benefits, are very unevenly distributed geographically. Comparability is an important assumption in both the county and state regression models, and, therefore, the way in which food stamp data are used as a predictor variable in the models may need to be modified. Changes in the tax system could also affect the usefulness of IRS data for small-area poverty estimation. More generally, it is important to continually evaluate the input data to the state and county models to assess errors or inconsistencies in them and to develop methods to account for those errors in the modeling process.
Large Category Differences and Residual Patterns for the State and County Models. The internal and external evaluations (see Chapter 6) demonstrated that the state and county models are generally well behaved with respect to the estimates for various categories of states and counties. However, it is important to investigate further the residual patterns and category differences to determine if the regression models could be improved either through a modification of the model form or through the addition of predictor variables.5
As an example of a pattern that is worth further investigation, when compared with CPS aggregate estimates, the county model exhibited a tendency in 1989, 1993, and 1995 to underpredict the number of poor school-age children in counties with large percentages of Hispanics. Also, from examination of the standardized residuals, the state model exhibited a tendency to underpredict the proportion of poor school-age children in some states in the West Region.
More generally, as a model is estimated for additional years, it is important to look for consistent patterns of residuals and category differences to understand their causes and to take corrective action when necessary. While it may be necessary to tolerate overprediction or underprediction for a particular type of area in any one year, a consistent pattern of overprediction or underprediction needs to be addressed.
In the evaluation of residuals and category differences, particular attention should be paid to states and counties that have experienced large demographic or socioeconomic changes that may correlate with changes in numbers of poor school-age children. For example, the federal tax return data that are used to estimate internal migration for the postcensal population estimates might be used to classify states and counties into categories by migration rates and the performance of the models compared for these categories. Also, the performance of the models might be compared for categories of counties classified by overall population change since the 1990 census. In turn, adding predictor variables to the models from the decennial census and the population estimates program, possibly including interaction terms, may prove a fruitful way to address persistent patterns of overprediction or underprediction for these and other categories of states and counties.
Footnote 5: The evaluations conducted to date of the county estimates include examination of the residual patterns from the regression model, comparisons of the model estimates for 1989 with 1990 census estimates, and comparisons of the model estimates for 1989, 1993, and 1995 with aggregate CPS estimates. Another evaluation that could help determine what portion of the errors in the county estimates is due to problems with the model, rather than to measurement differences and sampling variability, is to fit the model to 1990 census data (prior to shrinkage and raking to the state model) and to compare the estimates to 1990 census values for aggregates of counties. This evaluation is similar to the county model-CPS aggregate comparisons, but it has the advantage that the sampling error in the census is much less than in the CPS. The county model estimates are not shrunk for this evaluation because the resulting estimates would have considerable weight on the census direct estimates and so be less informative about possible problems with the regression model.
School District Estimates
There cannot be marked improvements in the school district estimates without a substantial effort to improve the data sources for districts and to develop models to use them. Nonetheless, work should go forward to further evaluate the current estimation method and to seek to effect modest improvements in it. Three important areas for research are: investigation of methods to reduce the variance of the census estimates of poor school-age children; use of school enrollment data to improve estimates of the total number of school-age children; and investigation of the possible use of National School Lunch Program data to improve estimates of poor school-age children.
Reducing the Variance of the Census Estimates of Poor School-Age Children. Because so many school districts are so small in size, the census estimates of poor school-age children, which derive from the long-form sample, are subject to high sampling variability. In addition to affecting the quality of the 1995 school district estimates that were developed by the Census Bureau's within-county shares method, the sampling variability in the 1990 census estimates affected the 1980-1990 evaluations. The evaluation measures reported in Chapter 7 overstate the degree of error in the within-county shares estimates because of this sampling variability. The Bureau should continue its research in partitioning out the sampling error from the root mean square difference between the within-county shares estimates and the census estimates of poor school-age children (see Chapter 7) in order to produce a better indicator of the quality of the school district estimates.
The 1990 census school district estimates of poor school-age children that were used in the 1995 estimates and as the standard of comparison in the 1980-1990 evaluations were developed by ratio adjustment. This procedure, which applies the long-form-sample-based estimates of the school-age poverty rate to the complete-count estimates of total school-age children, reduces the variance of the 1990 census estimates to a modest extent. Other ways to further reduce the variance should be investigated.
One approach is to incorporate other characteristics from the census short form that are known to be related to poverty in estimating school district numbers of poor school-age children from the census. For example, such characteristics as race and ethnicity, home tenure (owner, renter), family type, and residence (e.g., central city) could be used for this purpose. A very simple form of this type of estimation procedure would be a stratified ratio adjustment with strata defined using short-form information.
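The ratio adjustment and its stratified variant can be sketched as follows. The simple version applies the long-form sample poverty rate to the complete-count total of school-age children; the stratified version does the same within strata defined by a short-form characteristic (home tenure in this made-up example) and sums the results. Function names and figures are hypothetical and are meant only to show the arithmetic.

```python
# Hypothetical illustration of ratio adjustment for one school district.

def ratio_adjusted_poor(sample_poor, sample_total, complete_count_total):
    """Long-form sample poverty rate applied to the complete-count total."""
    return (sample_poor / sample_total) * complete_count_total


def stratified_ratio_adjusted_poor(strata):
    """Sum of ratio-adjusted estimates over strata.

    Each stratum is a tuple: (sample_poor, sample_total, complete_count_total).
    """
    return sum(ratio_adjusted_poor(p, t, c) for p, t, c in strata)


if __name__ == "__main__":
    # Made-up district: 12 of 60 sampled school-age children are poor,
    # and the complete count finds 330 school-age children.
    print(ratio_adjusted_poor(12, 60, 330))

    # The same district split by home tenure (renters, owners).
    strata = [(9, 20, 120), (3, 40, 210)]
    print(stratified_ratio_adjusted_poor(strata))
```

The two estimates differ whenever the sample's mix of strata differs from the complete-count mix, which is the situation in which stratification can reduce variance.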
Another approach is to smooth the census school district estimates with the census county estimates. By carefully constructing smoothed school-district estimates as combinations of school-district and county-level estimates, it might be possible to produce school-district estimates with lower mean square errors than the direct census estimates. It would be desirable to make use of knowledge about model error and sampling variances at the school-district level—if available—to tailor the degree of smoothing for each school district. If successful, smoothing procedures might substantially improve the estimation of census school-age poverty rates in small school districts. They would add some bias because county poverty rates differ from poverty rates for school districts contained within them, but they could potentially substantially reduce variance, thereby improving mean square error.
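One simple way to picture such smoothing is a weighted combination of the district and county poverty rates, with the weight on the district rate shrinking as the district's sampling variance grows. The weight formula below is a standard shrinkage form adopted here as an assumption, not a method specified by the panel or the Census Bureau; the names and numbers are hypothetical.

```python
# Hypothetical illustration of smoothing a district rate toward its county rate.

def smoothing_weight(between_district_var, district_sampling_var):
    """Assumed weight on the district's own estimate (standard shrinkage form)."""
    return between_district_var / (between_district_var + district_sampling_var)


def smoothed_rate(district_rate, county_rate, weight):
    """Weighted combination of the district and county poverty rates."""
    return weight * district_rate + (1.0 - weight) * county_rate


if __name__ == "__main__":
    # Made-up small district with a noisy direct estimate.
    w = smoothing_weight(between_district_var=0.01, district_sampling_var=0.03)
    print(w, smoothed_rate(0.40, 0.20, w))

    # A weight of 0 reproduces the county rate (complete smoothing);
    # a weight of 1 reproduces the direct district estimate (no smoothing).
    print(smoothed_rate(0.40, 0.20, 0.0), smoothed_rate(0.40, 0.20, 1.0))
```

The bias-variance trade-off described above appears directly in the weight: a small weight borrows heavily from the county rate, which lowers variance but pulls the district toward a rate that may not be its own.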
The development of a smoothing approach should include a thorough evaluation. As part of that evaluation, it would be useful to compare 1990 census estimates of poor school-age children for school districts with three sets of estimates that differ in the calculation of 1980 census within-county shares that are applied to the 1989 county model estimates: unsmoothed 1980 census within-county shares (as in method (1), see Chapter 7); smoothed 1980 census within-county shares; and 1980 census within-county shares that use the 1980 census county school-age poverty rates for all school districts within each county. The third method represents a complete smoothing of the school district poverty rates within counties.
If one or both methods for reducing the variance of the census school district estimates of poor school-age children (smoothing and using other characteristics in the estimation) are successful, then the revised census estimates should be employed with the within-county shares approach if it is used again in the future. The revised estimates from the 1990 census should also be used as the standard of evaluation for assessing the within-county shares estimates of poor school-age children in 1989.
Use of School Enrollment Data to Improve Estimates of the Total Number of School-Age Children. The method for estimating total school-age children is similar to that for estimating poor school-age children, namely, to apply census estimates of school district shares within each county to updated county estimates. The method is more robust for total school-age children (and total population) than for poor school-age children because the numbers being estimated are larger and because the census shares for total school-age children (and total population) are based on complete-count data that are not subject to sampling error. But the within-county shares method still does not capture within-county changes in school district populations that have occurred since the census.
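The within-county shares calculation itself is simple arithmetic, sketched below with hypothetical names and numbers: each district's share of the county is computed from census counts and then applied to the updated county estimate.

```python
# Hypothetical illustration of the within-county shares method.

def within_county_shares(census_counts):
    """Each district's share of the county total, from census counts."""
    total = sum(census_counts.values())
    return {district: count / total for district, count in census_counts.items()}


def apply_shares(shares, county_estimate):
    """Allocate an updated county estimate to districts using fixed shares."""
    return {district: share * county_estimate for district, share in shares.items()}


if __name__ == "__main__":
    census = {"A": 600, "B": 300, "C": 100}  # made-up census counts by district
    shares = within_county_shares(census)
    print(apply_shares(shares, 1200.0))      # made-up updated county estimate
```

Because the shares are frozen at their census values, every district in the county grows or shrinks by the same proportion, which is the limitation noted above.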
Public school enrollment data are collected annually by the National Center for Education Statistics (NCES) for school districts. The Census Bureau has begun research to determine if these data could be used to update the within-county school district shares of total school-age children. Part of this research is to examine reported school enrollment in the 1980 and 1990 censuses for school districts to determine if the within-county enrollment shares in 1990, or, alternatively, the changes in enrollment from 1980 to 1990, produce estimates of total school-age children that are more accurate for 1990 than the 1980 census-based shares. This work should continue. If it is successful, research would also be needed to evaluate the quality of the NCES enrollment data and to determine if such factors as changes in public versus private school enrollment present a problem for estimation.
If it is determined that the use of enrollment data would improve school district estimates of total school-age children, it will be necessary to modify the estimation procedure for poor school-age children so that the estimates of both groups (total and poor) are consistent. One way to achieve consistency would be to apply census school-age poverty rates for districts to the updated estimates of within-county shares of total school-age children that are developed from enrollment data.
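One possible reading of this consistency step is sketched below: enrollment-based shares of total school-age children are multiplied by census poverty rates for each district, and the products are normalized so that the district estimates of poor children sum to the county estimate. The normalization to the county total is an assumption made for the illustration, and all names and numbers are hypothetical.

```python
# Hypothetical illustration: poor school-age children from enrollment-based
# shares of total children and census poverty rates.

def shares_from_counts(counts):
    """Within-county shares computed from any set of district counts."""
    total = sum(counts.values())
    return {district: count / total for district, count in counts.items()}


def poor_from_rates(total_shares, poverty_rates, county_poor_estimate):
    """Allocate the county estimate of poor children in proportion to
    (share of total children) x (census poverty rate) for each district."""
    weights = {d: total_shares[d] * poverty_rates[d] for d in total_shares}
    weight_sum = sum(weights.values())
    return {d: county_poor_estimate * w / weight_sum for d, w in weights.items()}


if __name__ == "__main__":
    enrollment = {"A": 500, "B": 500}  # made-up current enrollment
    rates = {"A": 0.30, "B": 0.10}     # made-up census poverty rates
    shares = shares_from_counts(enrollment)
    print(poor_from_rates(shares, rates, 400.0))
```

Under this reading, a district whose enrollment share has grown since the census receives a proportionally larger share of the county's poor school-age children, while its poverty rate relative to other districts stays at the census value.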
Possible Use of School Lunch Data to Improve Estimates of Poor School-Age Children. There are many reasons that school lunch data are not necessarily a good proxy for school-age poverty (see Chapter 7). Moreover, at present, there is no complete, accurate source of school lunch data by school district that is readily available to the Census Bureau. Nonetheless, approval to receive free meals under the National School Lunch Program is an indicator of low income, and it seems worthwhile to pursue for other states the research that the panel undertook for New York and Indiana (see also National Research Council, 2000:Ch.5).
The Census Bureau may be able to work through its state data centers for selected states to obtain school lunch data by district for 1989-1990 to evaluate whether within-county school lunch shares in 1989-1990 produce estimates of poor school-age children in 1989 that are more accurate than those produced from the 1980 census-based shares. Another approach to evaluate is whether a combination of school lunch data and census data would be preferable to using either data source alone. The research should also look at the effects of using school lunch data, solely or in combination with census data, to estimate school-age poverty rates because of the role that rates play in concentration grants. If the results of such research are promising, it would be necessary for the NCES to improve the reporting of participation in the National School Lunch Program that it collects in the Common Core of Data.
DOCUMENTATION AND EVALUATION
The development of small-area estimates of income and poverty is a major effort that includes data acquisition and review, database development, geographic mapping and geocoding of data, methodological research, model development and testing, and documentation and evaluation of procedures and outputs. Since the production of small-area poverty estimates supports a range of important public policies for federal, state, and local governments—including the allocation of funds—it is essential that the Census Bureau have adequate staff and other resources for all components of the estimation program, including evaluation and documentation. It is the responsibility of any agency that produces model-based estimates to conduct a thorough assessment of them, including internal and external evaluations of alternative model formulations.
An integral part of the evaluation effort is the preparation of detailed documentation of the modeling procedures and evaluation results. No small-area estimates should be published without full documentation. Such documentation is needed for analysts both inside and outside the Census Bureau to judge the quality of the estimates and to identify areas for research and development to improve the estimates in future years.
Users of small-area estimates of income and poverty from the Census Bureau's SAIPE Program for fund allocation or other purposes should carefully review the documentation provided by the Bureau to understand the properties of the estimates. Users should also study the effects of using the estimates for allocations (see National Research Council, 2000:Ch.6, 7). The Committee on National Statistics is planning to conduct more work in this area. With the participation of our panel, it held a workshop in spring 2000 on issues in using estimates for fund allocation, and a more intensive study of the interactions of properties of estimates with features of funding formulas began in fall 2000. We believe such an effort can usefully inform both users and producers of small-area estimates.