Small-Area Estimates of School-Age Children in Poverty: Interim Report I: Evaluation of 1993 County Estimates for Title I Allocations
4 Panel Assessment of the Methodology
Any set of model-based estimates requires thorough evaluation of the assumptions underlying the model, the quality of the input data, the variability of the resulting estimates, and other features of the estimation procedure. For the purposes of Title I allocations, the primary concern is with the quality of the estimates of poverty among school-age children for counties. Thus, the discussion in this section focuses largely on the county-level model, but it also considers the state-level model and the Census Bureau's population estimates, both of which enter into the final county estimates.
The ideal evaluation of the Census Bureau's (or any) methodology for estimating the number of poor school-age children for counties would start by comparing 1993 estimates from the model-based procedure to the true numbers of poor school-age children for some or all counties in 1993 (or, at least, to measurements known to be highly accurate). One could then determine how close the estimates are to the "true" values. Unfortunately, the truth is not known, and no measurements known to be highly accurate are available.
Because the ideal evaluation is impossible, the Census Bureau and the panel have approached the problem of evaluation from a number of different directions. Although no single evaluation is conclusive, the various evaluations have enabled us to form preliminary conclusions, which serve as the basis of our recommendation, about the degree of confidence we can have in various parts of the estimation procedure and the final product. The development of model-based estimates for counties is a major research effort for which extensive evaluation is required.
Our conclusions are preliminary because the Census Bureau has not yet had time to conduct all of the assessments that the panel believes are necessary to fully evaluate the quality of the estimates and the suitability of the selected model.
The assessments that should be done include additional evaluations of the current procedures, the development of other competing models, and comparisons of the performance of the Census Bureau's models (both county- and state-level models) with other, similar models. Such models include a county-level model that predicts rates instead of numbers of school-age children in poverty; models that predict change in poverty over time or that use change-related predictor variables (e.g., changes in the number or proportion of child exemptions reported by families in poverty on tax returns); models that include additional predictor variables constructed from the available data; and models that allow a more flexible approach by using such statistical estimation procedures as generalized linear modeling. The Census Bureau has begun work to estimate and evaluate some alternative models, but the work is incomplete (see further discussion in Section 6).
In the rest of this section, we present a nontechnical summary of the evaluations of the county-level and state-level models that have been conducted to date (see also Appendix E; Coder et al., 1996). We first consider several approaches to statistical evaluation of the county- and state-level models, beginning with consideration of the reasonableness of the form (specification) of the models used: both the variables included in the models and the mathematical formulas that are used to express the relationships among the variables. We next consider standard statistical tests to show the significance of the relationships between the predictor variables used in the model and the dependent variable. We also consider the relationship between the state- and county-level models.
We describe the evidence on the performance of the Census Bureau's estimation procedure when used to estimate school-age children in poverty in 1989 instead of 1993. We call that the "1989 model" to distinguish it from the "1993 model" that estimates school-age children in poverty in 1993. (The estimation procedure is applied to 1989 because the latest available decennial census poverty estimates pertain to 1989 and can be used as a standard of comparison.) We also examine the presence of systematic under- or overestimation for particular groups of counties. We then consider the implications for the estimates of the fact that the March CPS and the decennial census represent somewhat different approaches to measuring poverty, using the same definition but not the same data collection or estimation procedures. Lastly, we consider the reliability of the postcensal estimates of population that are used in the estimation process.
MODEL SPECIFICATION
The use of a model-based approach to estimation is one the panel fully supports. Because of the nature of the available data, there is no alternative at this time to the use of models to develop poverty estimates for intercensal years. The decision to use a weighted combination of model and direct estimates for each area is widely accepted as the appropriate practice for small-area estimation problems of this type (see, e.g., Platek et al., 1987). However, several of the more detailed decisions about the Census Bureau's modeling strategy are not necessarily widely accepted or the best possible choices.
We find the specification decisions made in the state-level model relatively straightforward: to use certain predictor variables and not others, to derive the dependent variable from 1 year (not 3 years) of March CPS data, to use a linear model, to express the model in terms of rates (or ratios) rather than absolute numbers, and to use a particular model to smooth the variance estimates for state data. However, it would nonetheless be useful to develop some alternative models and compare them with the selected model. We were interested in whether a model in which all variables were entered as changes from decennial census year to model year would be more accurate. We believe, however, that the effects that would be captured by this approach are at least in part represented in the Census Bureau's model through inclusion of the residual from a 1989 model as a predictor in the 1993 model.
We have more concerns about the specification of the county-level model. An early decision was made to specify this model in terms of the number of school-age children in poverty, rather than in terms of a poverty rate or ratio. The expressed rationale for this decision is that, although there are postcensal estimates of the school-age population by county with which to connect the estimated rates to estimated numbers of poor school-age children, there are no variance estimates for these population estimates (Coder et al., 1996).
The consequence of this decision is that changes in the number of children in poverty due to changing poverty rates and due to changing overall population growth (or decline) are all captured in the same regression model; postcensal population estimates affect the estimates only as a predictor variable in the model, just like the number of food stamp recipients and other variables. We are not confident that this approach gives the most precise estimates for counties.
The decision to specify the model in terms of numbers also necessitated the use of a loglinear model, that is, one in which all variables are transformed to a logarithmic scale and the relationship between these transformed variables is assumed to be linear. The properties of this model are more complicated than those of the linear model used at the state level. In particular, as noted above, a number of counties represented in the CPS had no children in poverty in the sample households; these counties were simply deleted from the regression computation, as their data could not be transformed to the logarithmic scale.¹ Although some calculations suggest that the magnitude of the effect of these deletions on the estimates may not be very large, we would have more confidence in a modeling strategy that did not require this exclusion of part of the data. Statistical methods that avoid this problem are available and preferred.
1 Of the 1,529 counties in the March 1993, 1994, and 1995 CPS data used for the 1993 model, 345 were deleted for this reason; most of these counties had very small samples.
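
The deletion described above follows directly from the log transformation: a sample count of zero has no logarithm. The following is a minimal sketch, not the Census Bureau's code; the function name and the three sample counties are hypothetical, and only the 345 and 1,529 figures come from the footnote above.

```python
import math


def split_for_log_regression(counties):
    """Separate counties usable on the log scale from those that must be dropped.

    `counties` is a list of (name, sampled_poor_children) pairs.
    """
    kept, dropped = [], []
    for name, poor_children in counties:
        if poor_children > 0:
            # log is defined only for strictly positive counts
            kept.append((name, math.log(poor_children)))
        else:
            dropped.append(name)
    return kept, dropped


# Hypothetical sample counts, not data from the report
sample = [("County A", 120.0), ("County B", 0.0), ("County C", 15.0)]
kept, dropped = split_for_log_regression(sample)

# Share of CPS counties deleted for the 1993 model, from the footnote figures
share_deleted = 345 / 1529
```

On the footnote's figures, a little under a quarter of the CPS counties fall into the dropped group, which is why a method that can use zero counts would be attractive.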
There are two other major concerns about the specification of the county-level model (analyzed in more detail in Appendix E).² One is that the model is almost entirely cross-sectional, making no use of variables that measure changes in poverty or program participation since the decennial census base year. The rationale for this specification is that the administrative data used in the model are more consistently measured across different areas than across time. It is certainly true that changes in tax and transfer program rules will affect the comparability of administrative data over time, but differences in program participation rates and administration may affect comparability across areas.³ Although the quantitative evaluation summaries we have seen of variations of the Census Bureau's basic model did not demonstrate superior accuracy for models that used more change variables, the fact that a type of change variable was important in the state-level model makes us believe that it would have been helpful in some form in the county-level model as well. In addition, a 1989 model was deemed essential for evaluation purposes, but it was not possible to fit a change model for that year because of the lack of historical data on some key variables.
Another concern is that the county-level model does not scale up uniformly with county size: for example, if one doubles the size of the county while keeping its composition the same, so that the population estimates, the number of food stamp recipients, the number of child exemptions reported by families in poverty on tax returns, and the other count variables are all doubled, the predicted number of children in poverty more than doubles. This effect is quite substantial over the wide range of sizes of county populations.
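
The scaling behavior can be made concrete under the loglinear form described earlier. The notation here is ours, not the report's: $\hat{y}$ is the predicted number of poor school-age children, $x_1, \dots, x_p$ are the count predictors, and $\beta_j$ are their coefficients. This is a sketch that assumes every predictor is a count that scales with county size.

```latex
\log \hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j \log x_j
\quad\Longrightarrow\quad
\hat{y}(c\,x_1, \dots, c\,x_p) = c^{\,\sum_{j} \beta_j}\; \hat{y}(x_1, \dots, x_p)
```

With $c = 2$, the prediction exactly doubles only when $\sum_j \beta_j = 1$; a prediction that more than doubles corresponds to coefficients that sum to more than 1.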
This feature of the model may reflect some real trends with respect to county size (see Appendix E). Thus, large counties may tend to have some characteristics, not included in the model, that are associated with higher poverty rates than are predicted from food stamp, income tax return, and prior census poverty estimates alone. What these variables might be is not known, and it may be difficult to include additional variables in the model, given the lack of suitable data from other sources. However, some work could be done to analyze the characteristics of counties for which a model that scales up uniformly with county size (achieved by constraining the coefficients of the predictor variables to sum to 1) produces better or worse estimates than the Census Bureau's unconstrained model.⁴ It is possible that such work would identify additional variables that it would be useful to include, such as interactions of variables already in the model or characteristics that could be obtained from census data, such as whether the county experienced high population growth or includes a central city.
2 A more minor concern is that the variables in the county-level model are highly correlated, which can be a problem given that the county-level model does not use a representative set of counties from the CPS (see Appendix E).
3 Such differences may become more prevalent in the future with increasing devolution of program responsibility to the states.
4 The Census Bureau's unconstrained model and a constrained model for 1989 were estimated and the resulting estimates compared with 1990 census estimates of the number of school-age children in poverty as part of a broader evaluation for 1989 (see below; see also Appendix E). However, there has been no analysis as yet of types of counties for which the constrained model performed better or worse.
In summary, the panel is relatively satisfied with the specification of the state-level model, although it should be further evaluated. We are less satisfied with and would have liked to pursue more alternatives for the county-level model. In fairness to the county-level effort, we recognize that the availability and accuracy of data at the county level are limited in comparison with what are available for states, while the degree of variation to be explained for counties is greater. It is therefore not surprising that the specification of the model at this level is more challenging and controversial.
EVALUATIONS OF MODEL SPECIFICATION
Formal statistical testing to assess the significance of the predictor variables in the models has been complicated by the complexity of the components of the county-level model. A formal hypothesis test performed for the state-level model (Fay, 1996) supported the conclusion that the state-level regression model using administrative records data improved on the estimates of poverty rates for 1993 that could be obtained by using only 1989 poverty rates from the decennial census.⁵ Thus, his test provided evidence to support basing estimates on a statistical model rather than on decennial census data. The Census Bureau also conducted statistical hypothesis tests to show the statistical significance of the predictor variables in the county-level model, but the same type of hypothesis test that was used for the state-level model to demonstrate the superiority of the model-based estimates was not performed.
CONSISTENCY OF STATE- AND COUNTY-LEVEL MODELS
The panel compared estimates of the number of school-age children in poverty by state that were obtained directly from the state-level model to those obtained by adding within each state the estimates from the county-level model before they were calibrated to match the state estimates. If the ratio of state estimates from the state-level model to state estimates aggregated from the county-level model were the same in every state, this result would indicate that the county-level model captures all of the effects captured by the state-level model, making the latter model superfluous. We did not find this result. Rather, we found substantial variability in this ratio, from 0.822 to 1.332, with one-half the states lying between 1.017 and 1.128 (see also Table E-1).⁶ This finding suggests that there may be substantial state effects that are not captured by the county-level model. We would like to determine whether a county-level model that also incorporates state effects will give substantially different estimates from those obtained from the present two-stage approach, but the data and analysis are not now available for such a determination. More generally, we would like to investigate the possible benefits (including evaluation opportunities) of approaches that provide for greater consistency in the specification of the state- and county-level models (e.g., in the predictor variables that are included and in the specification of the dependent variable).
5 The test assumes that the objective is to predict poverty rates that reflect the CPS measurement of poverty and not the decennial census measurement. As discussed in Section 2, the CPS and census do not provide the same measures of poverty because of differences in data collection methods and other features (see also Appendix B).
ACCURACY OF 1989 PREDICTIONS
Although one cannot compare 1993 model-based county estimates to any measure of "truth" for 1993, one can make a comparison for 1989 by estimating the model for 1989 and comparing the results with the 1990 census. One can also compare the performance of the 1989 model estimates in predicting the 1990 census with the performance of simpler models that rely entirely, or much more heavily, on data from the 1980 census. This comparison is relevant because, in the absence of model-based estimates, the estimates from the last census have been used for Title I allocations. Such a comparison can also help clarify the advantages and disadvantages of the Census Bureau's model.
We compared county estimates from the 1990 census of the number of school-age children in 1990 who were poor in 1989 with county estimates for 1989 from four models, described below.
Because Title I funds are distributed from a fixed budget and therefore are largely driven by the share of the national total of poor school-age children in each county (rather than the absolute number of poor school-age children in each county), we adjusted the counts from each model proportionally to match the 1990 census estimated national total number of school-age children in poverty in 1989.
6 Our comparison of the state estimates from the state- and county-level models for 1993 was performed with data that were made available to the panel in January 1997. Subsequently, the Census Bureau discovered errors in the input data for a few counties that somewhat changed the 1993 estimates from the county-level model. (The state estimates were unchanged.) However, the general findings hold true that the use of control totals from the state-level model results in large adjustments to the estimates for counties in some states and that the adjustments vary widely across states.
Model 1 is the Census Bureau's county-level model estimated for 1989. In this model, the dependent variable is the 3-year centered average of poor school-age children from the March 1989, 1990, and 1991 CPS (for income years 1988, 1989, and 1990), and the predictor variables are the 1989 counts of food stamp recipients, the 1989 estimates of child exemptions reported by families in poverty on tax returns and total child exemptions reported on tax returns, the 1990 census estimates of the population under age 21, and the 1980 census estimates of the number of poor school-age children. The county estimates are controlled to state estimates from the Census Bureau's state-level model for 1989.
Model 2 is the 1989 county-level model (Model 1) that is not controlled to state totals.
Model 3 is a model in which the 1980 census estimates of the number of poor school-age children are updated to reflect the change in the total number of school-age children in each county from 1980 to 1990.
Model 4 is a model in which the 1980 census estimates (Model 3) are used without any updating (except, as in the other methods, to adjust all of the estimates proportionally so that they equal the 1990 census national total estimate of poor school-age children). In other words, Model 4 assumes that the distribution of poor school-age children by county did not change between 1980 and 1990.
As a summary measure of accuracy, we used the average of the absolute differences between estimates from each model and the 1990 census estimates of school-age children in poverty by county for 1989. Our comparison of Model 4 (the 1980 census estimates) with the 1990 census gave an average absolute error of 543 school-age children in poverty per county, which is almost one-fourth of the 1990 census average estimate of about 2,400 school-age children in poverty per county in 1989. When we used Model 3 (updating for the change in county population), the average absolute error was 415. Using Model 2 (the 1989 county-level model without controls to state totals), the average error was 292.
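
The two steps described above, proportional adjustment to the census national total followed by the average absolute difference, can be sketched as follows. This is an illustration only: the function names and the three-county data are hypothetical, and only the 543 and 2,400 figures are taken from the text.

```python
def scale_to_total(estimates, national_total):
    """Adjust county estimates proportionally so they sum to the national total."""
    factor = national_total / sum(estimates)
    return [estimate * factor for estimate in estimates]


def mean_absolute_difference(estimates, benchmark):
    """Average of the absolute county-by-county differences from the benchmark."""
    differences = [abs(e - b) for e, b in zip(estimates, benchmark)]
    return sum(differences) / len(differences)


# Hypothetical three-county example, not data from the report
census = [1000.0, 3000.0, 2000.0]
model = [900.0, 2400.0, 1700.0]

adjusted = scale_to_total(model, sum(census))
error = mean_absolute_difference(adjusted, census)

# Model 4's average absolute error relative to the average county count,
# using the figures quoted in the text
relative_error_model_4 = 543 / 2400
```

Because every model's estimates are scaled to the same national total, the measure compares how well each model distributes poor children across counties rather than how well it matches the overall level.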
The prediction from Model 1 (full Census Bureau 1989 county-level model) had an average error of 270. These results suggest that updating the previous census for population shifts improves the estimates (comparing Model 3 with Model 4), but that the greatest benefit comes from the use of the county-level model (Model 1), which captures both population shifts and changes in the incidence of poverty.⁷ These results provide evidence in favor of the model-based approach and against using the estimates from the previous census. However, we note that such a comparison might turn out differently if one had a measure of truth and could perform the comparison for 1993 instead of 1989⁸ because the shorter interval (4 years versus 10) and the particular patterns of change during the two time periods affect the comparison unpredictably. The 1989 comparison represents only one replication.
7 Our analysis of estimates of the number of school-age children who were in poverty in 1989 (described here and in the next part of Section 4 and in Appendix E) was performed with data that were made available to the panel in fall 1996. Subsequently, the Census Bureau discovered errors in the input data for a few counties that somewhat changed the estimates from its 1989 county-level model. However, the general findings continue to hold.
8 In this instance, Models 1 and 2 would represent the Census Bureau's 1993 county-level model adjusted and not adjusted, respectively, to the estimates from the 1993 state-level model; Models 3 and 4 would be based on the 1990 census, updated and not updated, respectively, for changes from 1990 to 1994 in the total school-age population of each county.
A similar comparison in which the 1980 census would provide the measure of "truth" was not possible because the administrative data required to estimate a county-level model of the number of school-age children in 1980 who were poor in 1979 were not all available. However, a comparison could be developed in which the 1990 census provides the measure of "truth" and the model estimates that are compared with it include the results of the 1989 county-level model and the results of estimating the county-level model for other years in the 1980s (say, 1985 or 1986). Such comparisons could provide insight into the effects of differing time lags on the quality of the model predictions.
The Census Bureau has begun work to compare the 1990 census estimates with the results from the Census Bureau's 1989 county-level model and the results from alternative models for 1989 (e.g., a county-level model that estimates rates). Such work, which needs to be completed, can help determine if there is a specification for the county-level model that performs appreciably better than the selected specification.
SYSTEMATIC UNDER- OR OVERESTIMATION FOR GROUPS OF COUNTIES
Another important assessment is whether the model-based estimates tend to systematically over- or underestimate the number of school-age children in poverty for groups of counties with particular characteristics, a distinct question from the differences between model and truth for individual counties that are considered in the preceding section. Systematic error (bias) is important for two reasons.
First, it harms or helps certain types of counties and, therefore, certain groups of people on the basis of their characteristics; in contrast, predictions that are reasonably free of systematic error avoid harming or helping identifiable groups. There is still random error, but its effect on any particular area is unpredictable and, to some extent, may average out over time. Second, identifying systematic error indicates how to improve a model (e.g., by including the specific characteristics in the model as predictors) and, thus, how to obtain more accurate estimates. For groups of counties, we compared the model estimates to census estimates and CPS direct estimates for 1989, as well as the model estimates to CPS direct estimates for 1993, averaging each type of estimate across member counties in each group. (Because we averaged over large enough groups of counties, the CPS sampling error is not excessively large.) Statistical hypothesis tests suggest that there is a tendency for the model to underpredict for counties with smaller populations, although the pattern is not completely consistent for the years (1989 and 1993) we examined. Tests involving a few other groupings of counties (e.g., by metropolitan/nonmetropolitan status) showed no consistent trends.
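
The group comparison described above can be sketched as below: estimates are pooled within each group of counties and the model total is compared with the census total. This is a hedged illustration, not the panel's procedure; the function name, group labels, and figures are hypothetical, and the sketch omits the hypothesis tests the panel applied.

```python
from collections import defaultdict


def relative_difference_by_group(records):
    """Pool estimates within each group and return (model - census) / census.

    `records` is a list of (group, model_estimate, census_estimate) tuples.
    Comparing group totals is equivalent to comparing group averages,
    because both averages divide by the same number of counties.
    """
    model_total = defaultdict(float)
    census_total = defaultdict(float)
    for group, model_estimate, census_estimate in records:
        model_total[group] += model_estimate
        census_total[group] += census_estimate
    return {
        group: (model_total[group] - census_total[group]) / census_total[group]
        for group in census_total
    }


# Hypothetical counties grouped by population size, not data from the report
records = [
    ("small", 90.0, 100.0),
    ("small", 180.0, 200.0),
    ("large", 1050.0, 1000.0),
]
bias = relative_difference_by_group(records)
```

A negative value for a group indicates underprediction relative to the census for that group; whether such a difference is systematic rather than sampling noise is what a formal test would then have to establish.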
The panel is concerned about the effect of a possible bias by county size because the Census Bureau's estimates of school-age children in poverty in 1993 show a shift toward larger counties, compared with the distribution in 1989 (as measured by the 1990 census).⁹ This shift reflects some combination of actual changes between 1989 and 1993, differences between the CPS and decennial census measurements of poverty, and effects due to the properties of the model. The panel has so far only conducted a preliminary investigation to separate these three components, which has not enabled us to draw any firm conclusions about their relative importance; this issue merits further investigation.
EFFECTS OF DIFFERENCES BETWEEN CENSUS AND CPS POVERTY MEASURES
We are concerned that there may be subtle but important differences between census and CPS poverty measures, and, hence, we are concerned about switching from one measure to the other for Title I allocations before these differences have been more thoroughly studied. As noted above, the county-level model estimates poverty as measured by the March CPS, which represents a somewhat different measurement from that of the decennial census (see Appendix B). Qualitatively, it is plausible to expect that there would be differences between the CPS and census measures of poverty due to the many differences in data collection; quantitatively, however, it is not easy to show what systematic differences exist between the two measures. There are statistically significant differences between the March CPS (3-year average) and 1990 census estimates of the number of school-age children in poverty in 1989 for the nation as a whole and for groups of counties characterized by metropolitan status, region, and population size.
For most categories, the CPS estimates of the number of poor school-age children are larger than the census estimates (see Table 2-4). Also, the pattern of differences by size of county suggests that the ratios of the CPS estimates to the census estimates may exhibit systematic differences; specifically, that these ratios may increase as a function of county size. Yet preliminary tests by the panel have not been able to establish statistically significant differences between the CPS-census ratios among groups of counties defined by population size and other characteristics. (As an example, there does not appear to be a statistically significant difference between the CPS-census ratio for counties with populations of 100,000–499,999 and the ratios for counties with smaller or larger populations.) However, we regard these analyses as very preliminary.
It is important to note that the evaluations described above of the 1989 estimates of school-age children in poverty use the 1990 census as the "gold standard." Therefore, they provide valid evidence in favor of the CPS-based model even if one takes the census as the standard for measurement of poverty. Presumably, if the CPS (which is currently the basis for official poverty estimates) were treated as the standard, the CPS-based model would be evaluated still more favorably.
9 These comparisons were performed with the county estimates that were provided to the panel on January 7, 1997.
USE OF POSTCENSAL POPULATION ESTIMATES
The process for estimating school-age children in poverty at the county level and the Title I allocation formulas for using those estimates require population totals by age in noncensus years for two purposes: as a variable in the county-level model regression equation (population under age 21) and as the denominator (population aged 5–17) of the poverty rate in the allocation formula.¹⁰ Population totals by age are also required for the state-level model. The population totals by age must be estimated and, as estimates, are subject to errors. (See Appendix D for a description of the Census Bureau's population estimates program, which uses demographic analyses to update the previous census.) The variability of the estimated poverty rates of school-age children that is attributable to error in the denominator is not modeled explicitly by the Census Bureau. Analyzing this variability is difficult and has not yet been done.
The Census Bureau does have an active program to develop and review the performance of its population estimates, including evaluating the estimates at 10-year intervals by comparing them with decennial census figures. These comparisons provide an indication of the errors, but they are not complete measures of accuracy and precision because the standard (i.e., the decennial census) itself is flawed, notably from net population undercount, which varies by age group across time and place (see Robinson et al., 1993).
The Census Bureau's methods for producing postcensal population estimates have generally improved over time, but three features continue to apply to the county and state estimates. First, the relative errors are larger on average for small areas than for large ones. (This relation is practically inevitable.) Second, the relative errors tend to be larger for areas in which the population is changing rapidly than for areas that are more stable. Third, the relative errors for age groups tend to be higher than those for the total population.
10 More precisely, the denominator of the poverty rate in the allocation formula is the estimated population of related children aged 5–17 in each county. These estimates are developed by adjusting the estimates from the Census Bureau's population estimates program for the noninstitutionalized population aged 5–17 (see Appendix D).
For county estimates of the total population, the average absolute percentage error improved between 1980 and 1990: in 1980, it was 4.1 percent unweighted and 3.1 percent weighted (by size of county); in 1990, it was 3.6 percent unweighted and 2.3 percent weighted. Population size markedly affects the accuracy of the estimates. For counties with populations of more than 100,000 in 1980, the average absolute error in the 1990 estimate was 2.0 percent, but it was 4.6 percent for counties with populations of 2,500–5,000 and 7.7 percent for the smallest counties, those with less than 2,500 population. Growth rate is another important factor in the errors. The average absolute percentage error for the fastest growing counties (25% growth between 1980 and 1990) was 4.9 percent in 1990. For counties that grew very little (0–5%), the error was only 3.0 percent. The pattern is not monotonic: counties that lost population had a somewhat larger average absolute error, 3.5 percent (Davis, 1994).
The Census Bureau has not yet completed its evaluation of county-level estimates for the population group aged 5–17 and the one under age 21. It is therefore difficult to assess the effect of the errors in these estimates on the updated estimates of school-age children in poverty or on the corresponding poverty rates. However, preliminary evaluations (Bureau of the Census, private communication) indicate that the average absolute error for the group aged 5–17 was 6.4 percent across all counties when estimates for 1990 that were derived from updating the 1980 census figures are compared with the 1990 census figures.
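
The difference between the unweighted and weighted error figures quoted above can be illustrated with a short sketch. The function name and the two-county data are hypothetical, and weighting each county by its census population is our assumption about what "weighted (by size of county)" means.

```python
def mean_absolute_percentage_error(estimates, census, weighted=False):
    """Average absolute percentage error of population estimates against the census.

    With weighted=True, each county's error is weighted by its census
    population (an assumed reading of "weighted by size of county").
    """
    errors = [abs(e - c) / c * 100 for e, c in zip(estimates, census)]
    if not weighted:
        return sum(errors) / len(errors)
    return sum(err * c for err, c in zip(errors, census)) / sum(census)


# Hypothetical counties: one small with a large relative error, one large
census = [1_000.0, 100_000.0]
estimates = [1_080.0, 102_000.0]

unweighted = mean_absolute_percentage_error(estimates, census)
weighted = mean_absolute_percentage_error(estimates, census, weighted=True)
```

In this toy case the weighted figure is well below the unweighted one because the larger relative error belongs to the small county, which is the same direction as the gap between the unweighted and weighted figures reported for 1980 and 1990.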