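Small-Area Estimates of School-Age Children in Poverty - Interim Report 2: Evaluation of Revised 1993 County Estimates for Title I Allocations

6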
Future Research and Development for County Estimates

It is important to continue a significant program of research and development for methods of estimating poverty for school-age children at the county level for three reasons. First, although there is clear justification for using the revised 1993 estimates of poor school-age children for Title I allocations for the 1998-1999 school year, the county (and state) model evaluations identified issues that warrant investigation to determine how to further improve the estimation procedures. Second, with a model-based approach, it is important to examine carefully the continued applicability of a model each time it is used. Third, research is needed to take account of likely future developments in the availability of data that have implications for the modeling effort.

For the immediate future, a pressing requirement for the Census Bureau is to produce poverty estimates for school districts. The task of developing updated estimates of poor school-age children for school districts is challenging in many respects. Just some of the factors that complicate the work (see Siegel, 1997) are the scarcity of relevant data (e.g., IRS and food stamp program data are not currently available for school districts); the small size of many school districts (66% of the 15,227 school districts in 1989-1990 had a 1990 census total population of less than 10,000); the variations among states in the ways in which school districts are defined (e.g., 26% of 1989-1990 school districts included certain grade levels only, and 27% of 1989-1990 school district boundaries crossed county lines); and the changes in school district boundaries that occur over time.

The panel looks forward to working with the Census Bureau on school district estimation, but specific methods for developing school district estimates of poor school-age children are not considered further in this chapter. Because
improvements to the county model are likely to have important benefits for school district estimates, the research discussed below is also relevant for school district estimation.

OVERVIEW OF RESEARCH NEEDS

Setting priorities for research on the county model should take account of the production schedule for updated small-area estimates of poor school-age children that are required, directly or indirectly, by the Improving America's Schools Act of 1994. The deadlines for the Census Bureau to deliver small-area estimates of poor school-age children to the Department of Education for use in Title I allocations are as follows for 1998-2002:

• May 1998: deliver county estimates for 1995
• October 1998: deliver school district estimates for 1995
• October 2000: deliver school district estimates for 1997
• October 2002: deliver school district estimates for 1999

The 1995 county estimates to be delivered in May 1998 will include total and poor school-age children. The 1995, 1997, and 1999 school district estimates to be delivered subsequently will include total and poor school-age children, as well as total population.1

Although county estimates are not required for future years by the legislation, they are likely to be needed in order to have a system of estimates that is consistent for different levels of geography. Just as estimates from the state model are currently used to control the estimates from the county model, so it is likely that, for the foreseeable future, estimates from the county model will play an important role in producing estimates for school districts. Also, there is likely to be interest in state and county estimates of poor children for other important public policy uses, such as evaluating the effects of changes in welfare programs.
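The control step mentioned above, in which county estimates are adjusted so that they sum to the corresponding state model estimate, can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the Census Bureau's production procedure: the function name, the single common adjustment factor, and all figures are hypothetical.

```python
def control_to_state(county_estimates, state_estimate):
    """Scale county estimates so that they sum to the state model estimate.

    Returns the common adjustment factor and the controlled estimates.
    The function name and the numbers below are hypothetical.
    """
    county_total = sum(county_estimates)
    if county_total <= 0:
        raise ValueError("county estimates must sum to a positive number")
    # One factor per state: the ratio of the state estimate to the county sum.
    factor = state_estimate / county_total
    return factor, [estimate * factor for estimate in county_estimates]


# Hypothetical state with three counties and a state model estimate of 1,100
# poor school-age children.
factor, controlled = control_to_state([100.0, 300.0, 600.0], 1100.0)
```

A factor close to 1 indicates that the county estimates were already nearly consistent with the state estimate; a factor far from 1 indicates that the two models disagree for that state.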
Research on methods to improve the 1995 county estimates of poor school-age children must be relatively straightforward, given the May 1998 deadline. Longer term research will be useful for improving the county estimates in connection with the school district estimates to be delivered in October 2000 and later. Priorities for longer term research should consider the important changes that are likely to occur in the availability of data for modeling over the 1998-2002 period and beyond, which include:

• current and future changes to welfare programs and tax systems that may affect the comparability of IRS and food stamp program data;
• the income and poverty estimates for small areas that will be available from the 2000 decennial census long-form sample of about 17 million households (such estimates will likely be available in 2002 for counties but not until later for school districts); and
• the planned introduction of the American Community Survey (ACS) as a large-scale, continuing sample survey of U.S. households, conducted primarily by mail, that will provide estimates similar to those provided by the decennial census long-form sample, including income and poverty estimates for small areas. The ACS is currently being tested in 4 sites; it will be implemented in 40 sites in 1999-2001 for comparison with the 2000 census. For each year from 2000 to 2002, the ACS will sample about 70,000 households nationwide. Beginning in 2003, the ACS will sample 250,000 households each month throughout the decade, for an annual sample size of about 3 million households (see Alexander, Dahl, and Weidman, 1997).

1   Total population estimates are needed because the legislation includes a provision that Title I allocations for school districts that have less than 20,000 total population in a state may be aggregated, and the state may then reallocate the funds to school districts on the basis of data other than the Census Bureau's estimates.

Changes in welfare programs and the accompanying data systems (especially those resulting from the 1996 Personal Responsibility and Work Opportunity Reconciliation Act) will almost certainly affect the comparability of food stamp data over geographic areas. For example, legal immigrants, who are no longer eligible for benefits, are very unevenly distributed geographically. Comparability is an important assumption in both the county and state regression models, and, therefore, the way in which food stamp program data are used as a predictor variable in the models may need to be modified. Changes in the tax system could also affect the usefulness of IRS data for small-area poverty estimation.
The American Community Survey, when it is fully operational, will be an important component of any approach to providing updated estimates of poor school-age children for small areas. It is possible that several months (or years) of data from the ACS might be used to provide direct estimates of poor school-age children for small areas. Alternatively, ACS data could be used indirectly as a dependent variable in a model-based approach for counties and school districts, parallel to the manner in which the CPS data are currently used for counties. However, given that each year of the CPS and the 2000 census will also provide information on poverty,2 it will be important to find ways to use all three sources of information together, for multiple time periods (for the CPS and ACS), to produce the best small-area estimates. Furthermore, given that all three data sources will have their own measurement biases3 and that they are available for different time periods (the decennial census year, multiple years of the CPS March Income Supplement, and many months of the American Community Survey), it is unlikely that simply pooling estimates from the three data sources can be justified. Hence, some adjustment or modeling procedure will be needed. Such a procedure will have to take account of available information about the variances and biases of the estimates from each data source (including not only measurement errors, but also bias because a data source covered a different time period).

2   If the ACS is implemented as planned, it is likely that the 2010 census and subsequent censuses will not include a long form and, hence, will not provide income and poverty information.

Continued research and development for measurement error and time-series models will be needed to develop effective multivariate models for small-area poverty estimates that use multiple data sources for multiple time periods. A specific research issue is to determine how best to use the 2000 census information, which has low sampling variance but possibly substantial measurement bias and which may be biased if the economic conditions during the census reference period differ markedly from the period for which estimates are needed.

How best to combine information from disparate data sources is a general problem that has received considerable attention recently in statistical research. To prepare for future improvements in small-area estimates of poverty, research should start now on combining census and CPS data. Research should begin on combining data from all three data sources as soon as sufficient data are available from the American Community Survey.
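To make concrete why simple pooling differs from a procedure that takes account of variances and biases, the following minimal sketch corrects each source for an assumed bias and then weights the corrected estimates by inverse variance. It is not a method proposed by the panel or used by the Census Bureau; it assumes the biases are known and the sources are independent, and the function name and all numbers are hypothetical.

```python
def combine_estimates(sources):
    """Combine (estimate, assumed_bias, variance) triples into one estimate.

    Each estimate is first corrected for its assumed bias; the corrected
    estimates are then averaged with weights proportional to 1 / variance.
    Returns the combined estimate and its variance, which is valid only
    under the assumptions of known biases and independent sources.
    """
    weights = [1.0 / variance for _, _, variance in sources]
    corrected = [estimate - bias for estimate, bias, _ in sources]
    total_weight = sum(weights)
    combined = sum(w * c for w, c in zip(weights, corrected)) / total_weight
    return combined, 1.0 / total_weight


# Hypothetical poverty rates for one area from three sources:
# (estimate, assumed bias, variance).
census = (0.200, 0.010, 0.0001)  # low variance, possible reference-period bias
cps = (0.230, 0.000, 0.0009)     # treated as the reference measurement
acs = (0.220, 0.005, 0.0004)
combined_rate, combined_variance = combine_estimates([census, cps, acs])
```

In this sketch the low-variance source dominates the result, so any error in its assumed bias carries through almost directly to the combined estimate.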
The research required to take account of changes in data sources, as well as some of the other research recommended by the panel, is time-consuming and will likely require additions to the staff who are currently working on small-area income and poverty estimation at the Census Bureau. More generally, the production of small-area estimates is a major effort that involves data acquisition and review, database development, geographic mapping and geocoding of data, methodological research, model development and testing, and evaluation. Since the production of small-area poverty estimates supports a range of important public policy needs for federal, state, and local governments, it is essential that adequate staff and other resources be available for the estimation program. The Census Bureau should consider ways of augmenting staff resources by engaging experts from outside the Census Bureau, by making use of contracts, interagency personnel transfer agreements, the research fellowship program of the Bureau and the American Statistical Association, and other arrangements.

3   The data collection methods for the census, CPS, and ACS differ in many respects, including the length of the questionnaire, the primary data collection technique (face-to-face interviews versus mail questionnaires), the definitions of variables, the reference period for income measurement, and editing and imputation methods. Any of these differences can lead to different measurement biases.

SHORT-TERM RESEARCH PRIORITIES FOR THE COUNTY MODEL

The panel identified four types of research that should be pursued to determine if the current estimation procedure can be improved before the next delivery of updated county estimates of poor school-age children for 1995, scheduled for May 1998. These four areas are discussed further below: generalized variance function modeling of CPS sampling variances for counties; further investigation of the state model; further research and development for models that incorporate state effects in the county model; and further examination of factors associated with large category differences and characteristics of outlier counties.

All of this work appears to be feasible in the short term. If it turns out that this research leads to one or more changes in the county model, then the kinds of internal and external evaluations conducted for the four models that were candidates to produce revised 1993 county estimates of poor school-age children (see Chapter 4) should be repeated for the new model and close competitors, to ensure that the changes to the model have not introduced any new problems. Even if the model remains unchanged, there is still a need to conduct a full internal evaluation of the 1995 county model because it will use different data than the 1993 county model. Generally, evaluation work should be a regular part of the development of updated county and school district estimates of poor school-age children.

Generalized Variance Function Modeling of CPS County Sampling Variances

The total squared error, or residual variance, for the revised county model, log number (under 18) model (b), comprises two components: the model error variance and the sampling variance of the dependent variable.
These two components need to be estimated separately for the application of the model, particularly for determining the relative weights of the regression estimate and the direct estimate in the shrinkage procedure. The current approach is to assume that the model error variance from the 1989 regression equation, whose dependent variable is formed from 1990 census data, is the same as the model error variance of the equation whose dependent variable is formed from 1993-1995 CPS data. The sampling variance is then obtained by subtracting this model error variance from the total squared error. The sampling variance for a particular county is assumed to be inversely proportional to the CPS sample size in that county.

An alternative approach is to estimate the CPS sampling variances on the basis of direct calculations of these variances that take account of the clustered sample design within counties, and then use a generalized variance function for modeling the sampling variances. With this approach, the model variance is calculated by subtracting the sampling variance from the total squared error. It thus avoids the questionable assumption that the model variances for the 1989 census equation and the 1993 CPS equation are equal.
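The role of the two variance components in the shrinkage procedure can be sketched as follows. This is an illustration under stated assumptions rather than the Census Bureau's implementation: it uses the standard shrinkage weight, model variance divided by the sum of model and sampling variance, and it assumes that the subtraction step yields the average sampling variance across counties, which is then spread in inverse proportion to CPS sample size. All function names and numbers are hypothetical.

```python
def county_sampling_variances(total_squared_error, model_variance, sample_sizes):
    """Sampling variance for each county, of the form k / n_i.

    Assumption of this sketch: total_squared_error minus model_variance is
    the average sampling variance over counties, and k is chosen to match it.
    """
    average_sampling_variance = total_squared_error - model_variance
    if average_sampling_variance < 0:
        raise ValueError("model variance exceeds the total squared error")
    mean_inverse_n = sum(1.0 / n for n in sample_sizes) / len(sample_sizes)
    k = average_sampling_variance / mean_inverse_n
    return [k / n for n in sample_sizes]


def shrinkage_estimate(direct, regression, model_variance, sampling_variance):
    """Weighted average of the direct and regression estimates.

    The direct estimate receives more weight when its sampling variance is
    small relative to the model error variance.
    """
    weight_on_direct = model_variance / (model_variance + sampling_variance)
    return weight_on_direct * direct + (1.0 - weight_on_direct) * regression


# Hypothetical values on the log scale for two counties with CPS sample
# sizes of 10 and 40 households.
variances = county_sampling_variances(0.5, 0.1, [10, 40])
small_county = shrinkage_estimate(2.0, 4.0, 0.1, variances[0])
large_county = shrinkage_estimate(2.0, 4.0, 0.1, variances[1])
```

With the same direct and regression estimates, the county with the larger CPS sample is pulled less toward the regression estimate, which is why the split of the total squared error into its two components matters for the final estimates.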

Census Bureau staff have already begun work on fitting a generalized variance function to the CPS sampling variances. This work has a number of benefits. It should lead to improved relative weights for use in the shrinkage estimation, although this is likely to have only a modest impact on the final estimates. It may reduce, or help explain, the variance heterogeneity of the standardized residuals from the county model as a function of the poverty rate and the CPS sample size; in particular, it may address the pattern of increasing standardized residual variances with CPS sample size. The work also may shed some light on why the method of moments and maximum likelihood produce different estimates of the sampling variances with the current approach.

Further Investigation of the State Model

There are a number of research issues that remain concerning the state model. First, the possible bias of the county estimates in the Pacific Division (see Chapter 4), which is due to raking the county estimates to the state model, should be investigated. If the overprediction for the Pacific Division is due to CPS-census measurement differences or to sampling variation in the CPS for 1989, it is not necessary to make any adjustments in the model. However, if CPS-census differences or sampling variation are not the cause, work is needed to improve the state model. Second, the current state model uses only one year of CPS data. It could be beneficial to use the information on poverty at the state level that is contained in the CPS samples for previous years. Multivariate modeling is one approach that may be used to incorporate CPS data from other years. Work along these lines has been initiated at the Census Bureau (see Otto and Bell, 1997).
Finally, the detailed evaluations that were carried out for alternative county models have not been carried out to the same extent for the state model. Additional evaluations could include examination of regression output for other years of the state model and external validation vis-à-vis the 1990 census for alternative versions of the state model, including comparisons for categories of states to the extent feasible.

Further Research and Development for County Models that Incorporate State Effects

The magnitude of the state raking factors for the county estimates is of concern to some panel members (see Chapter 4). When the sums of county estimates often diverge substantially from the corresponding state estimates, it is important to understand why. To the extent that the variation in the raking factors is due to problems of the county or state model (for example, if there are idiosyncratic state effects that the county model does not capture), then it may be possible to improve the county estimates by modifying one or both models. In particular, there could be substantial benefits from a county model that incorporates state effects, possibly through the use of a fixed or random state effects formulation.

The Census Bureau did some preliminary research on adding fixed state effects to alternative formulations of the county model (see Appendix A). The fixed state effects models had raking factors that varied even more widely than the factors for other models. Also, while the addition of fixed state effects reduced some nonrandom residual patterns in the regression output, a fixed state effects log number model (under 21) did not perform better than other models in comparison with the 1990 census estimates (see Appendices C and D). It is important to study more thoroughly the discrepancies between the state and county models and to try out various methods for incorporating state effects in the county model in a more integrated way, such as through a two-level nested model.4 Another part of this work could be to examine the effect of using a single year of CPS data for the state model and 3 years of CPS data for the county model.

Further Examination of Factors Associated with Large Category Differences and Characteristics of Outlier Counties

The internal and external evaluations (see Chapter 4) demonstrated that the log number (under 18) model (b) was generally reasonably well behaved with respect to the estimates for various categories of counties. However, some of the residual patterns and category differences from the 1990 census are worth investigating further to determine if the regression model could be improved either through a modification of the model form or through the addition of predictor variables. For example, the standardized residuals of model (b) exhibited some unusual behavior for urban counties and for counties with a high percentage of Hispanics, and this was also the case with the other three candidate models.
More generally, it is important to consider any anomalies in the model output (such as the variation in the state raking factors, discussed above), to understand their cause and to take corrective action when necessary. In that regard, it is somewhat surprising that the four candidate models, which are very similar, are so sensitive to relatively minor changes in specification. Two examples are the large change in the estimated sampling variance that resulted from fitting the models with maximum likelihood estimation instead of the method of moments, and the benefits of using the estimated population under 18 rather than under 21 as a predictor variable for the log number model (b) but not the log rate model (d). To investigate these kinds of anomalies, it would be useful to explore the data set using graphical analysis tools to further assess if there are county clusters that behave unusually.

4   One approach is to estimate two components of variance, one for state and one for county within state, for the model fitted to the 1990 census data. An exploratory analysis of state effects can be conducted by estimating a state variance component, using the residuals from the model fitted to the CPS data.

Finally, some of the category differences in the county models could be due to differences between the CPS and census measurements of poverty. Some work has been done to understand these differences (see Chapter 4), but more could be done to compare direct census and CPS estimates for different geographic and socioeconomic subgroups of the population. Much more powerful analyses of CPS-census differences could have been obtained had a March 1990 CPS-1990 census match been conducted (as had been done in previous decades). From such a match, CPS and census responses for the same households could be compared.

Looking to the future, it would be wise to plan now for an exact match of the 2000 census and the March 2000 CPS. Given that the American Community Survey will be in the field in 2000 with a national sample of about 70,000 households, an exact match could also be performed of the ACS in that year and the 2000 census. These matches would provide a wealth of information about the three different income measurement systems and would also provide key inputs to the development of a CPS-census measurement error model. Such a model could help resolve remaining issues about the differences between the state and county models (e.g., the overprediction of the number of poor school-age children in the Pacific Division). Such a model could also provide information from which to determine how to use data from the 2000 census with the currently employed CPS-based estimation procedure to minimize discontinuities in the Title I fund allocations that are based on estimates for income year 1999.

LONGER TERM RESEARCH AND DEVELOPMENT FOR THE COUNTY MODEL

In the medium and longer term, research on some other areas could likely result in improvements to the county model, perhaps as early as the October 2000 release of estimates of poor school-age children for 1997.
Four such longer term research areas are multivariate approaches to county estimation; investigation of models that make use of all the counties in the CPS sample, including those with no sampled poor school-age children; examination of ways to reduce the time lag in the estimates; and improvements in small-area population estimates for school-age children.

Multivariate Approaches to County-Level Estimation

The Census Bureau proposed, as an alternative to the separate use of CPS and census county regression equations (with the census equation being used only to estimate the model error variance for the CPS model), a bivariate county regression model, in which the two dependent variables are the CPS and census estimates of poor school-age children. This formulation has some very real advantages. First, the internal evaluation of the regression output for the bivariate models indicates that they are as good as or possibly better than their single-equation analogues. Also, additional analysis of the bivariate and single-equation formulations showed the benefit of the bivariate approach (see Appendix C). Unfortunately, lack of administrative records data for 1979 prevented the Census Bureau from conducting an external evaluation of the bivariate models, and, therefore, given their novelty and relative lack of evaluation, the panel did not recommend them for the production of the revised 1993 county estimates of poor school-age children. Research into this approach should continue, including an external evaluation as soon as that is feasible using the 2000 census data.

Similarly, integrating multiple years of the March Income Supplement of the CPS into the estimation procedure by means of a multivariate model, as opposed to the current procedure of averaging the data for several years, may be advantageous. A multivariate model, with estimates from more than one CPS year and the census as dependent variables in a linear system of equations, might provide an effective way of using all of the available information. In the future this model could also incorporate data from the American Community Survey by adding equations for the estimates from that survey.5

Investigation of Discrete Variable Models that Use Counties with No Sampled Poor School-Age Children

When using a logarithmic transformation of the number or proportion of poor school-age children as the dependent variable in a regression model, all counties in the CPS sample for which none of the sampled households has poor school-age children (304 of 1,488 counties for the 1993 model) have to be removed from the regression analysis (see Chapter 2). The dropped counties are generally smaller counties with small CPS sample sizes. Although there are reasons to believe that the current approach provides reasonable estimates for 1989 and 1993,6 the exclusion of 20 percent of the counties in the CPS sample is a cause of concern.
Also, the consistency of modeling for small and large counties observed for 1989 and 1993 may not characterize future years for which the county model is estimated. It is important to investigate the development of discrete variable regression models, such as Poisson regression or other forms of generalized linear models, that permit the inclusion of data for those counties that have no sampled families with children in poverty. Although a satisfactory approach may not be fully developed and tested by May 1998, when the next round of county estimates is to be completed, work should begin now on this topic.

5   An alternative approach to a bivariate or multivariate model that could be investigated is a single-equation model in which estimates from more than one census and CPS year are included as predictor variables.

6   Graphical analysis by the Census Bureau suggests that smaller counties are fairly well fit by the regression equation for both years. Also, comparisons with 1990 census estimates (see Chapter 4) show that the counties not in the CPS sample, which are typically smaller than counties in the sample, are well predicted by the county model. Finally, since the counties in the CPS sample that are excluded from the regression estimation generally have small samples, they would have less influence in any regression analysis.
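The point that a Poisson regression can retain counties with zero sampled poor children, whereas a logarithmic transformation cannot, is illustrated by the minimal sketch below. It is not a model specified by the panel or the Census Bureau: the single predictor, the use of the CPS sample size as an exposure term, and all data values are hypothetical, and the sketch ignores the survey design and any shrinkage step.

```python
import math


def fit_poisson(counts, sample_sizes, predictor, iterations=25):
    """Fit counts_i ~ Poisson(n_i * exp(b0 + b1 * x_i)) by Newton-Raphson.

    Zero counts contribute to the likelihood like any other observation,
    so no county has to be dropped. Assumes the predictor values are not
    all identical (otherwise the 2x2 system below is singular).
    """
    b0 = math.log(sum(counts) / sum(sample_sizes))  # start at the overall rate
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, n, x in zip(counts, sample_sizes, predictor):
            mu = n * math.exp(b0 + b1 * x)  # expected count for this county
            g0 += y - mu                    # score for the intercept
            g1 += (y - mu) * x              # score for the slope
            h00 += mu                       # entries of the information matrix
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1


# Hypothetical data for six sampled counties, two of which have no sampled
# poor school-age children. The predictor stands in for a logged
# administrative-records rate.
counts = [0, 0, 1, 3, 2, 6]
sample_sizes = [20, 25, 30, 40, 35, 50]
predictor = [-3.0, -2.8, -2.5, -2.0, -2.2, -1.8]
b0, b1 = fit_poisson(counts, sample_sizes, predictor)
fitted = [n * math.exp(b0 + b1 * x) for n, x in zip(sample_sizes, predictor)]
```

The two zero-count counties receive small positive fitted values instead of being excluded, which is the property that motivates this class of models.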

There are two factors that complicate the development of discrete variable models in this context: the lack of fully developed shrinkage procedures for most models of this form and the treatment of CPS sampling variances. However, Markov chain Monte Carlo implementation of hierarchical models can be used to address the first issue, and, with additional research and development, can also probably address the second issue.

Examination of Ways to Reduce the Time Lag of the Estimates

The Title I fund allocations for the 1997-1998 school year were based on estimates of poor school-age children in 1993, and these estimates will also be used for the 1998-1999 school year allocations. It is important to explore the extent to which this time lag can be reduced. (The Census Bureau began some exploratory work on this topic in June 1997 but had to put it aside in order to complete the evaluations of the original 1993 county model and alternative models.) One of the causes of the lag is the availability of food stamp data, which must be obtained from individual states in some instances and which are not available until almost 2 years after the year to which they refer. It might be possible to overcome this problem, without seriously harming the model performance, by using food stamp data for the year prior to the estimation year. Another possibility could be to control the estimates from the county model to the state model estimates for the latest of the 3 years of CPS data used in the county model, instead of to the middle year. These ideas and others need to be evaluated to determine if the lag between the time period of the estimates and the year of allocation of funds can be reduced.
Improvements in Small-Area Population Estimates

The Census Bureau has work under way, which should continue, to improve the procedures for estimating the population by age at the county level and to develop estimates of the total population and the school-age population for school districts. In addition, the current approach to producing population and poverty estimates for school-age children through separate estimation procedures may not make full use of the obvious correlation that the two kinds of estimates should have. Therefore, it would be useful to explore the possibility of a bivariate model of population and poverty for school-age children or of other methods that more fully integrate the development of these two sets of estimates.