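Small-Area Estimates of School-Age Children in Poverty - Interim Report 2: Evaluation of Revised 1993 County Estimates for Title I Allocations

4
Evaluations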

The development of model-based estimates for small areas is a major research and development effort for which extensive evaluation is required. For updated estimates of poor school-age children for counties, a thorough assessment of all aspects of the estimation procedure is necessary so that policy makers can have confidence in using the estimates for allocating federal Title I education funds to counties. That assessment includes both an evaluation of a given model and comparisons with alternative models. Because there are no absolute criteria for what are acceptable evaluation results, a way to determine if the performance of a model can be improved is to examine alternative models. Such comparisons may indicate changes that would be helpful for a model; they may also suggest that an alternative model is preferable.

The Census Bureau's county estimates of poor school-age children are produced by using a county regression model, a state regression model, and county population estimates developed with demographic analysis techniques (see Chapter 2). A comprehensive evaluation for each of these components of the estimation procedure should include "internal" and "external" evaluations.

An internal evaluation is primarily an investigation of the validity of the underlying assumptions and features of a model. For a regression model, an internal validation is typically based on an examination of the residuals from the regression—the differences between the predicted and reported values of the dependent variable for each observation. In an external evaluation, the estimates from a model are compared with target or "true" values that were not used to develop the model. Ideally, internal evaluation of regression model output should precede external evaluation. If the assumptions required by a regression model
are not supported to a reasonable extent, then even a positive external evaluation would not justify the choice of the model. Changes made to a model to address concerns raised by an internal evaluation would likely improve its performance in an external evaluation. Both internal and external evaluations should be carried out for alternative models.

When the original 1993 county estimates of poor school-age children were provided to the panel, the Census Bureau had not had time to complete a full evaluation of them. Subsequently, the panel developed a set of evaluation criteria, and the panel and the Census Bureau conducted a series of internal and external evaluations. The focus of the evaluation effort was on alternative county models, particularly the assumptions underlying the regression equations and how the estimates of poor school-age children in 1989 from each model compared with 1990 census estimates. The state model and the county population estimates were examined as well, both directly and as they contribute to the county estimates of poor school-age children.

The evaluations, which are described in this chapter, include:

(1) internal evaluation of the regression output for alternative county models estimated for 1993 and 1989;

(2) comparison of estimates of poor school-age children for 1989 from alternative county models with 1990 census estimates, a form of external evaluation;

(3) consideration of differences between the CPS and census measurement of income and poverty as a factor that could explain differences between model-based estimates and census estimates for 1989;

(4) examination of the original 1993 county estimates to identify possibly anomalous estimates that were then reviewed with knowledgeable local people, another form of external evaluation;

(5) evaluation of the state model, including examination of regression output, external evaluation in comparison with 1990 census estimates, and consideration of the state raking factors by which county model estimates are adjusted to make them consistent with the state model estimates; and

(6) evaluation of county population estimates for children aged 5-17 (see also Appendix B).

The internal evaluation of regression output and the comparison of model-based estimates of poor school-age children for 1989 with 1990 census estimates—evaluations (1) and (2) above—were carried out for the four single-equation county models that were considered serious candidates to produce revised 1993 county estimates of poor school-age children (see Chapter 3 and Appendices C and D):

(a) log number model (under 21), the original model that the Census Bureau used to produce the original 1993 county estimates of poor school-age children;

(b) log number model (under 18), the revised model that the Census Bureau used to produce the revised 1993 county estimates of poor school-age children;

(c) log rate model (under 21); and

(d) log rate model (under 18).

In addition, the 1990 census comparisons (2) were performed for some other estimation procedures that rely much more heavily than do the four candidate models on estimates from the 1980 census (see below, "Comparisons with 1990 Census County Estimates").

The internal evaluation of regression output (1) and the comparison of estimates of poor school-age children for 1989 with 1990 census estimates (2) examined residuals and model differences from the census, respectively, for categories of counties. The following characteristics were used for categorizing counties: census division; metropolitan status of county; population size in 1990; population growth from 1980 to 1990; percent poor school-age children in 1980; percent Hispanic population in 1990; percent black population in 1990; for rural counties, persistent poverty from 1960 to 1990; for rural counties, economic type; percent group quarters residents in 1990; number of households in the CPS sample (or whether the county had sampled households); and (for 1990 census comparisons only) percent change in the proportion of poor school-age children from 1980 to 1990 (see details in Table 4-3, below).

INTERNAL EVALUATION: COUNTY MODEL REGRESSION OUTPUT

The first test of a regression model is that it perform well when evaluated internally, that is, for the set of observations for which it is estimated.
The panel and the Census Bureau examined the underlying assumptions of the four candidate models through evaluation of the regression model output for 1989 and 1993.¹ Although such an evaluation is not likely to provide conclusive evidence with which to rank the performance of alternative models, particularly when they use different transformations of the dependent variable, examination of the regression output is helpful to determine which models perform reasonably well.

1   The evaluation of the county regression output pertains to the regression models themselves, that is, before the predictions are combined with the direct CPS estimates in a "shrinkage" procedure or raked to the estimates from the state model (see Chapter 2). For these models, the regression output comprises the model predictions for counties with at least one household with poor school-age children in the CPS sample. For the two log number models, the predictions are the log number of poor school-age children; for the two log rate models, the predictions are the log proportion of poor school-age children.

The assumptions investigated fall into two groups: assumptions concerning the functional form of the regression model and assumptions concerning the error distribution. Because properties of the error distribution affect the ability to fit a model, studies of these two types of assumptions are not entirely separable.²

The assumptions examined in the first group are linearity of the relationship between the dependent variable and the predictor variables; constancy of the assumed linear relationship over different time periods; and whether any of the included predictor variables are not needed in the model and, conversely, whether other potential predictor variables are needed in the model.

The assumptions examined in the second group are normality (primarily symmetry and moderate tail length) of the distribution of the standardized residuals;³ whether the standardized residuals have homogeneous variances, that is, whether the variability of the standardized residuals is constant across counties and does not depend on the values of the predictor variables; and absence of outliers.

Each assumption is discussed in terms of the methods used for evaluation and the results of the evaluation for the four candidate models.

Linearity

Linearity of the relationships between the dependent variable and the predictor variables was assessed graphically, by observing whether there was evidence of curvature in the plots of standardized residuals against the predictor variables in the model. In addition, plots of standardized residuals against CPS sample size and against the predicted values from the regression model were also examined for curvature. The only evidence of nonlinearity is for the log number (under 21) model (a) for 1989.
For that year, the standardized residuals appear to have a very modest curvature when plotted against the predicted values.

2   These assumptions were also examined for the analogous 1990 census regressions. However, since the census equations only affected the weights for the weighted least squares regression and the extent of "shrinkage" in combining model estimates and direct estimates for counties with households in the CPS sample, analyses of the 1990 census regressions are not discussed here.

3   The standardization of the residuals involved estimating the predicted standard errors of the residuals, given the predictor variables, and dividing the observed residuals by the predicted standard errors. The predicted standard error of the residual for a county is a function of the estimated model error variance and the estimated sampling error variance (see Belsley, Kuh, and Welsch, 1980).

Constancy over Time

Constancy over time of the assumed linear relationship of the dependent and predictor variables was assessed through comparison of the regression coefficients on the predictor variables for 1989 and 1993. While major changes in economic conditions are expected to cause some changes in the coefficients, a relatively stable regression equation would be desirable.⁴

Table 4-1 shows the regression coefficients for the predictor variables for the four candidate models for 1989 and 1993. In the log number models (a, b) for 1989 and 1993, the coefficients for the three "poverty level" predictor variables—child exemptions reported by families in poverty on tax returns (column 1), food stamp recipients (column 2), and poor school-age children from the previous census (column 5)—are similar. There are substantial differences across the two time periods in the estimated coefficients for the other two variables—population (under age 21 or under age 18, column 3) and total number of child exemptions on tax returns (column 4). However, the sum of these two coefficients is generally close to 0 in each model in each year. Because these two variables are highly positively correlated, the predictions from equations with a similar sum for the two coefficients will be similar.

The sum of all coefficients in each equation for models (a) and (b) ranges from 1.04 to 1.07 and is significantly greater than 1. A sum equal to 1 would mean that county population size itself has no effect on the estimated number of poor school-age children. Because the sum is greater than 1, the estimated number of poor school-age children is a larger percentage of the population in the larger counties. While this result is difficult to explain as a function of county size, it may be that size reflects the effects of variables not included in the models.

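The role of the coefficient sum in the log number models can be made explicit with a short derivation. This is a sketch offered for illustration, not the panel's own argument, and it considers only the regression prediction (before shrinkage and raking). The symbols are assumptions introduced here: $\hat{y}$ is the predicted number of poor school-age children in a county, $x_1, \dots, x_5$ are the five predictor variables of Table 4-1, $\beta_j$ are their coefficients, $P$ is the county population, and $k$ is a common scale factor.

```latex
\begin{align*}
\log \hat{y} &= \beta_0 + \sum_{j=1}^{5} \beta_j \log x_j
  \quad\Longrightarrow\quad
  \hat{y} = e^{\beta_0} \prod_{j=1}^{5} x_j^{\beta_j}, \\
x_j \to k\,x_j \ \text{for all } j
  &\quad\Longrightarrow\quad
  \hat{y} \to k^{\sum_j \beta_j}\,\hat{y}, \\
P \to k\,P
  &\quad\Longrightarrow\quad
  \frac{\hat{y}}{P} \to k^{\left(\sum_j \beta_j\right) - 1}\,\frac{\hat{y}}{P}.
\end{align*}
```

Under these assumptions, a coefficient sum of exactly 1 leaves the predicted proportion unchanged when every count in a county is scaled by the same factor. With sums of 1.04 to 1.07, doubling every count ($k = 2$) multiplies the predicted proportion by $2^{0.04} \approx 1.03$ to $2^{0.07} \approx 1.05$.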
In the log rate models (c, d), the coefficients for the three "poverty rate" predictor variables—ratio of child exemptions reported by families in poverty on tax returns to total child exemptions (column 1), ratio of food stamp recipients to the total population (column 2), and ratio of poor school-age children from the previous census (column 4)—are all positive and about the same size.⁵ The coefficients for the ratio of total child tax exemptions to the population (under age 21 or under age 18, column 3) are negative, and there are substantial differences across the two time periods in the estimated coefficients. The sign of the related variable (total number of child tax exemptions) is generally negative in the log number equations. As in the log number equations, the coefficients in the log rate equations for population under 21 differ from the coefficients for population under 18.

4   Because the county model is refit for each prediction year, constancy over time is not as important as it would be if the estimated regression coefficients from the model for one year were used for predictions for subsequent years. Nonetheless, it is disturbing for the regression coefficients to exhibit large, unexplained changes over time.

5   The coefficients are also similar to the coefficients for the corresponding variables—number of child exemptions reported by families in poverty on tax returns, number of food stamp recipients and number of poor school-age children from the previous census—in the log number equations.

TABLE 4-1 Estimates of Regression Coefficients for Four Candidate County Models for 1989 and 1993

                               Predictor Variablesᵃ
Model and Year      Counties    1            2            3            4            5
                    (Number)
(a) Log Number (under 21)
  1989              1,028       0.52 (.07)   0.30 (.05)   0.76 (.22)  -0.81 (.22)   0.27 (.07)
  1993              1,184       0.31 (.08)   0.30 (.07)   0.03 (.21)   0.03 (.21)   0.40 (.09)
(b) Log Number (under 18)
  1989              1,028       0.50 (.06)   0.23 (.05)   1.79 (.27)  -1.80 (.27)   0.32 (.07)
  1993              1,184       0.38 (.08)   0.27 (.07)   0.65 (.24)  -0.59 (.24)   0.34 (.09)

                               Predictor Variablesᵇ
Model and Year      Counties    1            2            3            4
                    (Number)
(c) Log Rate (under 21)
  1989              1,028       0.32 (.07)   0.29 (.04)  -0.73 (.19)   0.40 (.07)
  1993              1,184       0.23 (.08)   0.31 (.06)  -0.07 (.18)   0.41 (.09)
(d) Log Rate (under 18)
  1989              1,028       0.29 (.07)   0.26 (.04)  -1.13 (.24)   0.43 (.07)
  1993              1,184       0.26 (.08)   0.30 (.06)  -0.42 (.20)   0.38 (.09)

NOTES: All predictor variables are on the logarithmic scale for numbers and rates. Standard errors of the estimated regression coefficients are in parentheses. The four models were estimated for each year with maximum likelihood. The original 1994 population estimates were used for the 1993 models; 1990 census population estimates were used for the 1989 models.

a   Predictor variables: (1) number of child exemptions reported by families in poverty on tax returns; (2) number of people receiving food stamps; (3) population (under age 21 or under age 18); (4) total number of child exemptions on tax returns; (5) number of poor school-age children from previous (1980 or 1990) census.

b   Predictor variables: (1) ratio of child exemptions reported by families in poverty on tax returns to total child exemptions; (2) ratio of people receiving food stamps to total population; (3) ratio of total child exemptions on tax returns to population (under age 21 or under age 18); (4) ratio of poor school-age children from previous (1980 or 1990) census.
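The coefficient sums discussed in the text for the log number models can be checked directly against Table 4-1. The following is a minimal sketch: the coefficients are transcribed from the table, while the names `LOG_NUMBER_COEFFS`, `coefficient_sums`, and `SUMS` are hypothetical and introduced only for this illustration.

```python
# Coefficients transcribed from Table 4-1 (log number models only).
# List order follows predictor variables 1..5 in table note a.
LOG_NUMBER_COEFFS = {
    ("a", 1989): [0.52, 0.30, 0.76, -0.81, 0.27],
    ("a", 1993): [0.31, 0.30, 0.03, 0.03, 0.40],
    ("b", 1989): [0.50, 0.23, 1.79, -1.80, 0.32],
    ("b", 1993): [0.38, 0.27, 0.65, -0.59, 0.34],
}


def coefficient_sums(coeffs):
    """Return (sum of all five coefficients, sum of columns 3 and 4).

    Both values are rounded to two decimals, the precision of the table.
    """
    total = round(sum(coeffs), 2)
    population_plus_exemptions = round(coeffs[2] + coeffs[3], 2)
    return total, population_plus_exemptions


SUMS = {key: coefficient_sums(vals) for key, vals in LOG_NUMBER_COEFFS.items()}

if __name__ == "__main__":
    for (model, year), (total, pair) in sorted(SUMS.items()):
        print(f"model ({model}) {year}: all = {total:.2f}, columns 3+4 = {pair:.2f}")
```

The totals fall between 1.04 and 1.07, and the sum of the population and total child exemption coefficients stays within 0.06 of zero in every equation, which matches the description in the text.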

Inclusion or Exclusion of Predictor Variables

The possibility that one or more predictor variables should be excluded from a model was assessed by looking for insignificant t-statistics for the estimated values of individual regression coefficients.⁶ The need to include a predictor variable, or possibly to model some categories of counties separately, was assessed by looking for nonrandom patterns, indicative of possible model bias, in the distributions of standardized residuals displayed for the various categories of counties.⁷

The only predictor variables with nonsignificant t-statistics are the population under age 21 (column 3 in Table 4-1) and total child exemptions on IRS income tax returns (column 4) for the log number (under 21) model (a) in 1993, and the ratio of child tax exemptions to the population under age 21 (column 3) for the log rate (under 21) model (c) in 1993. All other regression coefficients are significantly different from 0 at the 5 percent level. Application of Akaike's information criterion (AIC) confirmed the superiority of using the population under age 18 as a predictor variable in preference to the population under age 21 in the log number model. (The test was not performed for the log rate model.)

For most ways of categorizing counties, the standardized residuals do not exhibit systematic patterns. The exceptions are that all four models in 1989 tend to overpredict poor school-age children in counties with a high percentage of Hispanic residents (i.e., the model estimates are somewhat higher than the CPS direct estimates for these counties relative to other counties) and that the log number (under 21 and under 18) models (a, b) tend to overpredict poor school-age children in counties that are in metropolitan areas but are not the central county in the area.

Normality

The normality of the standardized residuals was evaluated through use of Q-Q plots, which match the observed distribution of the residuals with the theoretical distribution, and other displays of the distribution. All four models exhibit some skewness in their standardized residuals, with the log rate models (c, d) showing somewhat more skewness than the log number models (a, b). For none of the models does the skewness appear sufficiently marked to be a problem.

6   Although the performance of a predictive regression model is best assessed in terms of the joint impact of the predictor variables, examining the individual predictor variables can suggest ways in which a model might be improved.

7   The distributional displays examined for this and other model assumptions were box plots.
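The normality checks described above can be sketched in a few lines. This is an illustration only, not the Census Bureau's procedure. It assumes that the predicted standard error of a residual is the square root of the model error variance plus the county's sampling error variance; the report says only that it is a function of the two. The function names `standardize`, `qq_pairs`, and `skewness` are hypothetical.

```python
import math
from statistics import NormalDist, mean


def standardize(residuals, model_var, sampling_vars):
    """Divide each residual by its predicted standard error.

    Assumption: standard error = sqrt(model variance + sampling variance).
    The exact formula used for the report may include further terms.
    """
    return [r / math.sqrt(model_var + sv) for r, sv in zip(residuals, sampling_vars)]


def qq_pairs(values):
    """Pair each sorted value with the standard normal quantile at (i - 0.5) / n.

    Plotting these pairs gives a Q-Q plot; points near a straight line
    suggest an approximately normal distribution.
    """
    n = len(values)
    normal = NormalDist()
    return [
        (normal.inv_cdf((i - 0.5) / n), v)
        for i, v in enumerate(sorted(values), start=1)
    ]


def skewness(values):
    """Moment-based sample skewness; 0 for a perfectly symmetric sample."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

A positive value from `skewness` indicates a longer right tail and a negative value a longer left tail. Any threshold for deciding that skewness is "sufficiently marked" is a judgment the report does not quantify.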

Homogeneous Variances

The homogeneity of the variance of the standardized residuals was assessed using a variety of statistics and graphical displays (see Appendix C). Examination of them clearly demonstrates some variability in the size of the absolute standardized residuals as a function of the predicted value (number or proportion of poor school-age children) and the CPS sample size for all four models. With regard to CPS sample size, one would expect the standardized residual variance to remain constant over the distribution of CPS sample size; however, it increases with increasing CPS sample size.

The heterogeneity of the variance of the residuals should be investigated because it suggests that there may be a problem with the model specification or in the assumptions that were used to calculate the standardized residuals. However, adjusting a model to remove this type of heterogeneity is likely to have only a small effect on the estimated regression coefficients or the model estimates. The effect on estimates of poor school-age children would stem from: a shift in the weights assigned to each county in fitting the regression model, which would very likely result in only a modest change in the estimated regression coefficients; and a change in the weight given to the direct estimates, which could have an appreciable effect only on the estimates for counties with large CPS sample sizes.

Outliers

The existence of outliers was evaluated through examination of plots of the distributions of the standardized residuals and plots of standardized residuals against the predictor variables and through analysis of patterns in the distribution of the 30 largest absolute standardized residuals for the various categories of counties.
However, it is difficult to evaluate the evidence for outliers that results from a least squares model fit, which has the property that it may miss influential outliers. In addition, since the four models are so similar and make use of the identical data, it is unlikely that an observation that was a marked outlier for one model would not also be a marked outlier for the other models. An examination of the distributions of the standardized residuals indicates that none of the four models is especially affected by outliers, although the 1993 estimates have more outliers than the 1989 estimates, and nonrural counties and metropolitan counties that are not central counties have somewhat more outliers than other categories of counties. This analysis is only a start. It would be useful, using other statistics and various graphical techniques, to identify the counties that are not well fit by robustly estimated versions of these models in order to determine any characteristics that outlier counties have in common.
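Two of the checks described in this section lend themselves to a short sketch: comparing the size of absolute standardized residuals across groups of counties (for example, CPS sample size classes) and tabulating which categories contain the largest absolute standardized residuals. The code below is a hypothetical illustration with invented function names and does not reproduce the statistics in Appendix C.

```python
from collections import Counter, defaultdict
from statistics import mean


def mean_abs_by_group(std_residuals, groups):
    """Mean absolute standardized residual within each group label.

    Roughly equal values across groups are consistent with homogeneous
    variances; a trend across ordered groups suggests heterogeneity.
    """
    buckets = defaultdict(list)
    for residual, group in zip(std_residuals, groups):
        buckets[group].append(abs(residual))
    return {group: mean(values) for group, values in buckets.items()}


def largest_residual_counts(std_residuals, categories, k=30):
    """Count how many of the k largest absolute residuals fall in each category."""
    ranked = sorted(
        zip(std_residuals, categories),
        key=lambda pair: abs(pair[0]),
        reverse=True,
    )
    return Counter(category for _, category in ranked[:k])
```

The default `k=30` mirrors the 30 largest absolute standardized residuals examined in the text. Counts from `largest_residual_counts` need to be read against the number of counties in each category, since a large category will tend to contain more of the largest residuals by chance.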

Summary

The panel concludes that the analysis of the regression output for the four candidate county models for 1989 and 1993 largely supports the assumptions of the models: there is little evidence of important problems with the assumptions. The analysis does not strongly support one model over another, although it does support use of the population under age 18 instead of the population under age 21 as a predictor variable in the log number model.

All of the models exhibit a few common problems. First, they all behave somewhat differently for larger urban counties and counties with large percentages of Hispanic residents than for other counties. The differences are not pronounced, but research should be conducted to determine possible ways to modify the models to eliminate or reduce this problem. Second, all models show evidence of some variance heterogeneity with respect to both CPS sample size and poverty rate. This problem can likely be eliminated or reduced by research currently ongoing at the Census Bureau to develop direct estimates of the county-level sampling variances (see Chapter 6).

EXTERNAL EVALUATION: COMPARISONS WITH 1990 CENSUS COUNTY ESTIMATES

For external evaluation, the panel and the Census Bureau compared the estimated number and proportion of poor school-age children for 1989 for the four candidate models with 1990 census estimates.⁸ The evaluation examined the overall difference between the estimates from a model and the census and the differences for groups of counties categorized by various characteristics.

Evaluation by comparison with the 1990 census is not ideal because the census estimates are not true values.
They are affected by sampling variability and population undercount; also, the census measurement of poverty differs from the CPS measurement in ways that are not fully understood (see National Research Council, 1997:Ch. 2, App. B; see also the Census Bureau's web site: http://www.census.gov/hhes/www/saipe93/inputs/cencpsdf.html). In addition, there is only one census-based validation opportunity: because of the lack of IRS and food stamp program data for counties for 1979, it is not possible to evaluate model-based estimates by comparison to the 1980 census. Reliance on a single validation using the 1990 census is a problem because a model may perform better or worse in any one validation than it would on average over multiple validations. For this reason, if it were possible to compare model estimates with census or other estimates for 1993 instead of 1989, the results might turn out differently. Nonetheless, in the absence of other means of external validation, the panel and the Census Bureau relied heavily on the 1990 census comparisons to understand the performance of alternative models.

8   The county estimates reflect the effects of the state model and the county population estimates as well as the county regression model, but the differences in model performance vis-à-vis the census in the evaluation are due to the particular form of the county model. The models for which the 1990 census comparisons were performed were estimated with the method of moments. Maximum likelihood was used to estimate the log number (under 18) model (b) for the revised 1993 county estimates of poor school-age children. The differences in the estimates from the two techniques are small.

Evaluation by comparison with the 1990 census is intended to assess the accuracy of model estimates for the prediction year (i.e., 1989). The evaluation does not address the issue that model-based estimates are likely to be used for Title I allocations several years later. It would be useful to conduct research to reduce the time lag between the prediction year for model-based estimates and the year for Title I allocations to the extent possible (see Chapter 6).

The 1990 census estimates that are used in the comparisons are ratio adjusted by a constant factor to make the census national estimate of poor school-age children equal the 1989 CPS national estimate. This adjustment removes the difference of about 5 percent between the CPS and census estimates of total poor school-age children for 1989. Consequently, the differences between a model and the 1990 census in estimating poor school-age children for groups of counties can be interpreted as differences in shares. This feature is useful because the Title I allocation formula distributes funding as shares (percentages) of a fixed total dollar amount.

In addition to the four candidate models, the 1990 census comparisons were performed for four estimation procedures that rely much more heavily on 1980 census estimates.
Given the substantial changes in the number and proportion of poor school-age children between the 1980 and 1990 censuses (see National Research Council, 1997:8-9), one would expect these procedures to perform less well than the candidate models in predicting poverty for school-age children in 1989.⁹ In a period of less pronounced change, one or more of them might perform relatively well. The census comparisons were done for the following procedures:

Stable shares procedure, in which the county estimates of poor school-age children for 1989 are the 1980 census estimates for 1979 after ratio adjustment to make the 1980 census national estimate equal the CPS national estimate for 1989. This simple procedure assumes no change over the decade in each county's share of the total number of poor school-age children nationwide: this is the same assumption that underlies previous practice for Title I allocations, in which estimates from the decennial census were used in the formulas each year until the results from the next census became available.¹⁰

Stable shares within state procedure, in which the county estimates of poor school-age children for 1989 are the 1980 census estimates for 1979 after raking the estimates for the counties in each state to the estimates from the Census Bureau's state model for 1989. (The national raking employed in the state model also adjusts the total to equal the CPS national estimate for 1989.) This procedure assumes no change over the decade in each county's share of the total number of poor school-age children in its state.

Stable rates within state procedure (with conversion), in which the county estimates of poor school-age children for 1989 are developed by converting 1980 census estimates of the proportions of poor school-age children for 1979 to estimated numbers by use of 1990 county population estimates of total school-age children 5-17 and then raking the estimated numbers to the Census Bureau's state model estimates for 1989.

Averaging procedure, in which the county estimates of poor school-age children for 1989 are developed from an average of estimates from the 1980 census and the log number (under 21) model (a) for 1989.¹¹

The rest of this section first discusses overall absolute differences from the 1990 census estimates for the four candidate models and the four procedures that rely more heavily on the 1980 census. It then discusses differences for categories of counties for the four candidate models and two of the procedures: the stable shares procedure and the averaging procedure. Differences for categories of counties for the other two procedures, which are intermediate in their reliance on 1980 census estimates, are provided in Appendix D.

9   Although the interval was only 4 years instead of 10, substantial changes in the number and proportion of poor school-age children also occurred between 1989 and 1993 (see National Research Council, 1997:10-13).
10   However, the estimates from the 1990 census that were previously used for Title I allocations were not adjusted to the current CPS national estimate of poor school-age children, which could affect the allocations for some counties. For example, some counties might meet the threshold test for a concentration grant if the census estimates were adjusted to the current CPS national estimate but not if the estimates were unadjusted. 11   More precisely, the estimates are developed by averaging the proportions of poor school-age children from the 1980 census and the log number (under 21) model (a) for 1989, converting the estimates to numbers by use of 1990 county population estimates of total school-age children, and making an overall ratio adjustment to the CPS national estimate for 1989. This procedure is analogous to the panel's recommendation for averaging 1990 census and 1993 model-based estimates for use in Title I allocations for the 1997-1998 school year. However, the panel's recommendation did not include raking the average estimates to the CPS national estimate of poor school-age children in 1993 (see National Research Council, 1997:38).
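The stable shares procedure and the averaging procedure of footnote 11 both reduce to a proportional (ratio) adjustment to a national control total. The following is a minimal sketch of that arithmetic, not the Census Bureau's or the panel's code. The function names, the equal-weight average of the two proportions, and all numbers are hypothetical illustrations.

```python
def ratio_adjust(estimates, control_total):
    """Scale estimates proportionally so that they sum to control_total."""
    factor = control_total / sum(estimates.values())
    return {county: value * factor for county, value in estimates.items()}


def stable_shares(census_1980_poor, cps_national_1989):
    """Stable shares: 1980 census counts ratio-adjusted to the CPS national total."""
    return ratio_adjust(census_1980_poor, cps_national_1989)


def averaging(rate_1980, rate_model, pop_5_17, cps_national_1989):
    """Averaging: average two proportions (equal weights assumed here),
    convert to numbers with population estimates, then ratio-adjust."""
    numbers = {
        c: 0.5 * (rate_1980[c] + rate_model[c]) * pop_5_17[c] for c in rate_1980
    }
    return ratio_adjust(numbers, cps_national_1989)


# Hypothetical toy inputs for three counties (not data from the report).
census_1980_poor = {"A": 1000.0, "B": 3000.0, "C": 6000.0}
cps_national_1989 = 12000.0

# Each county keeps its 1980 share of the national total.
shares_est = stable_shares(census_1980_poor, cps_national_1989)
print(shares_est)

rate_1980 = {"A": 0.10, "B": 0.20, "C": 0.30}
rate_model = {"A": 0.14, "B": 0.20, "C": 0.26}
pop_5_17 = {"A": 10000.0, "B": 15000.0, "C": 20000.0}
avg_est = averaging(rate_1980, rate_model, pop_5_17, cps_national_1989)
print(avg_est)
```

In both cases the adjusted county estimates sum to the national control total, and the stable shares estimates preserve each county's 1980 share exactly.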

Research Council, 1997:App. B). The Census Bureau performed chi-square tests to determine if there were significant differences between estimates from the March 1990 CPS and the 1990 census of the number of school-age children and the number and proportion poor in this age group in 1989 for county groupings (Fay, 1997).[17] More specifically, the tests determined if the ratios of the CPS and census estimates for categories of a characteristic, such as county population size, were significantly different from each other. The characteristics tested were those examined in the 1990 census comparisons.

The tests generally show inconclusive results. However, there is some evidence that, when compared with the 1990 census, the March 1990 CPS estimates higher numbers and proportions of poor school-age children in metropolitan counties and larger-size counties relative to medium-size counties. (CPS estimates for small-size counties have low reliability because of the relatively small proportion of the population in such counties and the small number of these counties in the CPS sample.) Also, while not significant, a pattern is evident in which the March CPS, when compared with the 1990 census, tends to estimate higher numbers and proportions of poor school-age children in counties with higher percentages of Hispanic population.

These results for population size and percent Hispanic population parallel the results from the 1990 census comparisons described above. They suggest that at least some portions of the category differences for the candidate models for these two characteristics arise from differences in the CPS measurement of poverty and are not due to model error as such. Whether similar CPS-census differences would be present for 1993 is, of course, not known.
EXTERNAL EVALUATION: LOCAL ASSESSMENT OF 1993 COUNTY ESTIMATES

The panel performed another type of external evaluation of the original 1993 county estimates of poor school-age children: the use of local knowledge.[18] Using the original 1993 model estimates for all 3,143 counties in the United States, the analysis first sought to identify groups of counties for which the 1993 estimates seemed unusually high or low in relation to prior levels and trends (e.g., from 1980 to 1990) in the number and proportion of poor school-age children and known social and economic trends for these groups of counties. Then, local informants (including staff and members of local councils of government, economic development authorities, welfare agencies, state demographic units, state data centers, and other agencies) were contacted to obtain their assessment of the reasonableness of the implied trends in poverty for school-age children given their knowledge of local socioeconomic conditions.[19]

[17] The March 1990 CPS estimates for the categories involved are direct estimates produced using the CPS weights.

[18] This evaluation was carried out at the University of Wisconsin-Madison by Dr. Paul Voss, a member of the panel, with the assistance of Richard Gibson and Kathleen Morgen (see Voss, Gibson, and Morgen, 1997).

County Analysis

Changes in the number and proportion of poor school-age children implied by the 1993 estimates were examined for counties categorized by several characteristics, including: population size and metropolitan status; population change; percent immigrant population; college-dominated counties; reservation and Native American counties; for nonmetropolitan counties, whether predominantly agricultural; and several classifications by geographic location (e.g., state and the regions identified by the U.S. Department of Agriculture). The analysis identified a number of categories of counties for which further investigation of the reasonableness of the 1993 estimates seemed warranted:

Large metropolitan central city counties had a high implied percentage change in the number of school-age children in poverty between 1989 and 1993 (42 percent). This change declined systematically with decreasing size for metropolitan counties and continued to decline to the most remote, rural nonmetropolitan counties, for which the implied change in the number of school-age children in poverty was -6 percent.

Counties with higher levels of international immigration had higher implied increases in the number and proportion of poor school-age children.

Counties with higher percentages of Native Americans had lower implied increases in the number and proportion of poor school-age children. There was no particular pattern for counties with reservations.
Farm counties had an implied decline in the number and proportion of poor school-age children, while nonfarm metropolitan counties had an implied increase.

When the country was divided into the 26 regions identified by the U.S. Department of Agriculture, several regions were identified on the extremes of change in the number and proportion of poor school-age children. High implied increases were found in the Northern Metropolitan Belt, the Florida Peninsula, the Southwest, Northern New England, Mohawk New York and Pennsylvania, Lower Great Lakes Industrial, Southern Piedmont, and the Northern Pacific Coast. Small implied increases were found in the Central Corn Belt, the Southern Appalachian Coal Region, the Coastal Plain Cotton Region, the Northern Great Plains, and the Rockies, Mormon, Columbia River Region. The single region with an implied decrease in the number and proportion of poor school-age children was the Mississippi Delta.

Some of these implied changes are apparently related to the general effect of population size, discussed above. However, the findings in this regional analysis, in particular, suggested which states and counties to follow up in discussions with local officials.

[19] The discussion refers to "implied" trends because the Census Bureau's county model is not designed to directly estimate change over time.

Local Input

When counties that share certain characteristics appeared also to share a common pattern of change in the number and proportion of poor school-age children, a variety of individuals with local knowledge were contacted. Initially, 70 individuals associated with state data centers or state data center affiliate units were contacted; they provided a series of responses and referrals to other state and local officials. In addition, 26 states that appeared to have a sizable number of counties that shared a common implied trend in poverty for school-age children were targeted for intensive contact.

The nature of responses varied considerably. In some states, the original 1993 county estimates released by the Census Bureau had not been examined, and there appeared to be little interest in discussing them. In other states, the estimates had been looked at, but the general admonitions about standard errors that accompanied their release had dampened interest in studying them in detail. In contrast, several states had carried out in-depth analyses of the estimates. Of the 26 states targeted for intensive follow-up, 8 provided detailed explanations (supported by examples) of trends suggested by the original 1993 county estimates, and 7 more states provided in-depth responses supported by their own analyses.
Almost every state agency contacted expressed specific doubts about the original 1993 estimates for one or more counties (too high here, too low there). In general, however, there was no consensus that the trends implied by the original 1993 county estimates were wrong, even in states for which large numbers of counties experienced apparent declines in the number and proportion of poor school-age children. Of the 26 states, 21 provided explanations as to why the original 1993 estimates appeared to show poverty trends in a specific direction or why the direction of change is too difficult to know. The most common explanations included comments about the size of the county, its rural agricultural nature, the fact that it is a diverse metropolitan county, immigration from abroad, and economic growth or economic decline. Occasionally, reference was made to a military base, an Indian reservation, or a university as an explanation for an apparent trend in poverty for school-age children. In three states, concern was expressed about the role of food stamp program data in the estimation model, as these data were deemed to be unreliable.

In summary, a high level of concern was expressed by individuals with local knowledge about the statistical reliability of the original 1993 county estimates, which is largely due to the Census Bureau's own cautions in this regard, coupled with specific county estimates that seem on the basis of local knowledge to be highly doubtful. These concerns notwithstanding, no categories of counties were identified that experienced apparent trends in the number and proportion of poor school-age children between 1989 and 1993 that were not accepted by local informants. Although the trends for a few counties were not accepted locally, the analysis found no strong indicators of potential bias for groups of counties sharing common characteristics in the county model.

STATE MODEL

The state model plays an important role in the production of county estimates of poor school-age children. Evaluations of the state model include an internal evaluation of the regression output for 1989 and 1993 and an external evaluation that compares 1989 estimates from the model with 1990 census estimates of the proportion of poor school-age children by state. The results in each case, which are summarized below, support the use of the model. However, the state model evaluations have been more limited than the county model evaluations, as alternative state model formulations have not been evaluated explicitly.
Further evaluation of the state model would be useful (see Chapter 6), particularly to examine the relationship between the state and county models and what factors may underlie the variations in the state estimates from the state model and the state estimates formed by summing the estimates from the county models (see below, "State Raking Factors").

State Model Regression Output

The state regression model is a poverty rate model with the variables not transformed (see Chapter 2). The analysis of the regression output for the state model for 1989 and 1993 examined the same assumptions that were examined for the four candidate county models. The analysis is somewhat less informative for the state model than for the county models for two reasons.

First, few explicit alternatives were developed for the state regression model. A log rate model was developed, and comparisons of that model with the rate model demonstrated that the log rate formulation had no particular advantage. Work was also started on a multivariate state model.[20] However, no formal analysis of regression output was performed for either of these alternatives or for other alternatives that were explored early on in the development of the model.

[20] A multivariate formulation could be advantageous not only for the state model, but also for the county model, as an extension of the bivariate formulation for which initial development work was carried out (see Chapter 6).

Second, there are more than 3,000 counties but only 50 states, and states vary much less than counties with respect to poverty rates and other characteristics. Hence, comparisons for categories of states are less informative than comparisons for categories of counties, and some categories of states do not contain enough states for analysis.

Nonetheless, examination of the regression output for the state model helps assess the validity of its assumptions. Overall, the analysis finds strong support for the assumptions underlying the state model (see below); there is no evidence of significant problems with the model formulation (although there may be other models that fit just as well).

Linearity

Plots of standardized residuals against the four predictor variables in the state model (the proportion of child exemptions reported by families in poverty on tax returns, the proportion of people receiving food stamps, the proportion of people under age 65 who did not file a tax return, and a residual from the analogous regression equation using the previous census as the dependent variable) support the assumption of linearity. Furthermore, the standardized residuals, when plotted against the model predicted values, provide no evidence of the need for any transformation of the variables. This result helps justify the decision not to use the log transformation of the proportion of poor school-age children as the dependent variable.

Constancy over Time

Table 4-5 shows the regression coefficients for the predictor variables for the state model for 1989 and 1993. The coefficients for all four poverty rate predictor variables are positive in both years.
Generally, the coefficients are similar for 1989 and 1993, with the exception that the coefficient of the residual from the previous census (column 4) is large and significant for 1993 but fairly small and not significant for 1989.

TABLE 4-5 Estimates of Regression Coefficients for the State Model for 1989 and 1993

        Predictor Variables(a)
Year    1             2             3             4
1989    0.53 (.10)    0.57 (.21)    0.33 (.10)    0.37 (.32)
1993    0.31 (.11)    0.98 (.22)    0.52 (.13)    1.36 (.39)

NOTES: All predictor variables are in terms of rates. Standard errors of the estimated regression coefficients are in parentheses.
(a) Predictor variables: (1) ratio of child exemptions reported by families in poverty on tax returns to total child exemptions; (2) ratio of people receiving food stamps to total population; (3) ratio of people under age 65 who did not file an income tax return to total population under age 65; (4) residual from a regression of poverty rates for school-age children from the prior decennial census (1980 or 1990) on the other three predictor variables.

Inclusion or Exclusion of Predictor Variables

The standardized residuals for the state regression model were grouped into four categories for each of the following characteristics: census region; 1990 population size; 1980 to 1990 population growth; percent black population in 1990; percent Hispanic population in 1990; percent group quarters residents in 1990; and percent poor school-age children in 1979 (from the 1980 census). The distributions of the standardized residuals for each category were then displayed using box plots. For none of these box plots was there an obvious pattern to the standardized residuals across categories.

The model slightly overpredicts the proportion of poor school-age children for large states in 1993 (i.e., the model estimates are somewhat higher than the CPS direct estimates for large states relative to other categories), but this pattern is not evident in 1989. The model also slightly overpredicts the proportion of poor school-age children for states with a moderate percentage of Hispanics in 1989 and slightly underpredicts the proportion of poor school-age children for states in the West census region in 1993, but in neither case is the pattern observed for the other year. Therefore, there is no strong reason to suggest that these variables need to be incorporated in the state regression model.

Normality, Homogeneous Variances, and Outliers

The distribution of the standardized residuals from the state regression model appears to follow a normal distribution. Also, although there is less information available for the state model than for the county regression models, the residual plots and the box plots of the distributions of the standardized residuals against the categories of states show little evidence of any heterogeneous variance. Finally, there is no evidence of outliers from examination of the residual plots or displays of the distributions of the standardized residuals from the state regression model.
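The statement that the coefficient for predictor variable 4 is significant in 1993 but not in 1989 can be checked informally from Table 4-5 by dividing each coefficient by its standard error. The sketch below does only that arithmetic. The rule of thumb that a ratio above about 2 suggests significance at roughly the 5 percent level is an assumption made here for illustration; the report does not state which test was applied.

```python
# Coefficients and standard errors transcribed from Table 4-5,
# ordered by predictor variable (1, 2, 3, 4).
TABLE_4_5 = {
    1989: [(0.53, 0.10), (0.57, 0.21), (0.33, 0.10), (0.37, 0.32)],
    1993: [(0.31, 0.11), (0.98, 0.22), (0.52, 0.13), (1.36, 0.39)],
}


def coefficient_ratios(rows):
    """Return coefficient / standard error for each predictor variable."""
    return [coef / se for coef, se in rows]


ratios = {year: coefficient_ratios(rows) for year, rows in TABLE_4_5.items()}

for year, values in ratios.items():
    # Flag ratios above 2 (assumed rule of thumb, see the note above).
    flags = ["yes" if v > 2.0 else "no" for v in values]
    print(year, [round(v, 2) for v in values], flags)
```

For predictor variable 4 the ratio is about 1.2 in 1989 and about 3.5 in 1993, which is consistent with the description in the text; the other three ratios exceed 2 in both years.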
1990 Census Comparisons

Fay and Train (1997) compare 1989 estimates of the proportion of poor school-age children from the state model with 1990 census estimates. They find that the differences between the model and census estimates are much smaller than the differences between the 1989 CPS direct estimates and the 1990 census estimates and considerably smaller than the differences between the 1980 census estimates and the 1990 census estimates. These findings, which are presented graphically in Fay and Train (1997), support the use of a model-based approach to producing updated state estimates of poor school-age children instead of relying on estimates from the previous census or from the CPS alone. Similarly, a formal hypothesis test performed for the state model (Fay, 1996) supports the conclusion that the model-based estimates for 1993 are preferable to estimates from the 1990 census.[21] Comparable evaluations have not been performed for alternative state models or for categories of states.

State Raking Factors

The final stage in producing updated estimates of the number of poor school-age children for counties is to rake the estimates from the county model for consistency with the estimates from the state model. The raking procedure is clearly beneficial to the county estimates. For example, the 1990 census comparisons for the two procedures (ii and iii) that raked the 1980 census estimates to the estimates from the state model for 1989 showed better performance than the stable shares procedure (i), which did not entail raking. Also, an evaluation of the original log number (under 21) model (a) found a smaller overall average absolute difference from the 1990 census when the county model estimates were raked to the state model estimates for 1989 than when the county model was used without raking (National Research Council, 1997:31).
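The raking step described above amounts to computing, for each state, the ratio of the state model estimate to the sum of that state's county model estimates, and then multiplying every county estimate in the state by that ratio. The following is a minimal sketch with hypothetical names and toy numbers; it is not the Census Bureau's implementation.

```python
def raking_factors(county_estimates, county_state, state_model_totals):
    """Return {state: state model estimate / sum of county model estimates}."""
    sums = {}
    for county, value in county_estimates.items():
        state = county_state[county]
        sums[state] = sums.get(state, 0.0) + value
    return {s: state_model_totals[s] / total for s, total in sums.items()}


def rake(county_estimates, county_state, factors):
    """Multiply each county estimate by its state's raking factor."""
    return {c: v * factors[county_state[c]] for c, v in county_estimates.items()}


# Hypothetical toy data: four counties in two states.
county_estimates = {"c1": 400.0, "c2": 600.0, "c3": 250.0, "c4": 250.0}
county_state = {"c1": "X", "c2": "X", "c3": "Y", "c4": "Y"}
state_model_totals = {"X": 1100.0, "Y": 450.0}

factors = raking_factors(county_estimates, county_state, state_model_totals)
raked = rake(county_estimates, county_state, factors)
print(factors)  # state X is raked up, state Y is raked down
print(raked)
```

A factor near 1.0 means the summed county estimates already agree closely with the state model; factors far from 1.0 indicate larger disagreement between the two models for that state.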
On the assumption that a county model is performing well, one would expect the state raking factors to be tightly distributed around 1.0; that is, one would expect relatively minor differences between the estimates for states formed by summing the county estimates before raking and the estimates from the state model. However, the raking factors vary considerably across states. For example, the log number (under 18) model (b), which shows somewhat less variation than the other three candidate models, has raking factors that range from 0.86 to 1.31 in 1989 (two-thirds falling between 0.96 and 1.17) and from 0.84 to 1.29 in 1993 (two-thirds falling between 0.98 and 1.15).

This degree of variation suggests that there may be state effects not captured in the county model, which, in turn, could possibly affect the behavior of the model in estimating poor school-age children for counties within states. Also, the state model uses 1 year of CPS data, while the candidate models use 3 years. This difference could contribute to the variation in raking factors and also to the fact that they average greater than 1.

Implementation of a fixed state effects formulation of the county model in which state indicator variables are included as predictor variables in the regression (see Chapter 3) widened rather than narrowed the range of the state raking factors. Technical reasons having to do with the transformation of the predicted log values of poor school-age children to estimated numbers probably explain the increased variation in the state raking factors with a fixed state effects model. However, further investigation of the state raking factors and how to account for state effects in the county model should be topics for research in the near term (see Chapter 6). The investigation should include consideration of whether there is any feature of the state model that might explain the variation in the raking factors.

[21] The test assumes that the objective is to predict poverty rates that reflect the CPS measurement of poverty and not the decennial census measurement.

USE OF POSTCENSAL POPULATION ESTIMATES

The process for producing updated estimates of school-age children in poverty at the county level and the use of those estimates in the Title I allocation formulas require population totals by age in noncensus years for two purposes: as a variable in the county regression equation (population under age 18 or under age 21, depending on the model), and as the basis for computing the estimated number or proportion of poor school-age children, depending on the model specification (log number or log rate). Population totals by age are also required for the state model.

The Census Bureau's log number (under 18) model (b) produces estimates of the number of poor school-age children in each county. Because the Title I allocations require both numbers and proportions, the Census Bureau provides the Department of Education with population estimates for the 5-17 age group to use as denominators for calculating the proportion of poor school-age children.[22]

The Census Bureau currently develops county age estimates within the framework of total population estimates for counties and age estimates for states (see Appendix B).
Briefly, in a process that begins anew with each decennial census, total population estimates for counties are developed by updating the population estimates for the preceding year with data on births, deaths, net immigration from abroad, and net internal migration. (Net internal migration is estimated from a year-to-year match of federal income tax returns for people under age 65 and from the change in Medicare enrollment records for people aged 65 and over.) Estimates are developed separately for the population over and under age 65 in households and in group quarters. County total population estimates are aggregated to form state total population estimates.

[22] The population estimates of school-age children that accompany the 1993 county model estimates pertain to July 1994. In addition, the Census Bureau makes available on its web site estimated proportions of poor school-age children in which the denominators are estimates of related children aged 5-17 in each county. These estimates are developed by adjusting the estimates from the Census Bureau's population estimates program for the noninstitutionalized population aged 5-17 on the basis of the ratio of related children aged 5-17 to noninstitutionalized children aged 5-17 for each county in the 1990 census.

State estimates by single years of age are developed by similar demographic methods, in which the preceding year's estimate for each cohort (single year of age) is updated with data on births, deaths, and migration. (For people under age 65, net internal migration is estimated from school enrollment data.) The state estimates for single ages are then raked to equal the state population totals.

Finally, county estimates by age are developed by ratio-adjusting the 1990 census county age estimates to both the updated county total population estimates and the updated state age estimates in an iterative proportional fitting (raking) procedure. This procedure assumes that the age distribution of each county within a state changes in the same manner as that state's age distribution.

The Census Bureau has an active program to develop and review the performance of its demographically based population estimates, including evaluating the estimates at 10-year intervals by comparing them with the decennial census as a measure of the true values. These comparisons provide an indication of the differences, but they are not complete measures of accuracy and precision because the standard (i.e., the decennial census) itself is flawed, notably from net population undercount, which varies by age group across time and place (see Robinson et al., 1993).

The Census Bureau's methods and data for producing postcensal population estimates have generally improved over time, but three patterns of differences, which are practically inevitable, continue to affect the county and state estimates (see Davis, 1994).
First, the proportional differences of the estimates in comparison with the census are larger on average for small areas than for large ones. Second, the proportional differences tend to be larger for areas in which the population is changing rapidly than for areas that are more stable. Third, the proportional differences for age groups tend to be higher than those for the total population.

The Census Bureau recently completed an evaluation of the county estimates of total population and children aged 5-17 by comparison with the 1990 census for all counties and for categories of counties similar to the categories used in the 1990 census model evaluations described above (see Appendix B). The procedure to develop updated estimates for counties by age for 1990 was to ratio-adjust the 1980 census county age estimates to 1990 county total population estimates and 1990 state age estimates.

The overall average proportional absolute difference in the 1990 county estimates of the population aged 5-17 was 6.3 percent, unweighted by county population size, and 4.9 percent, weighted by size. By comparison, the overall average absolute difference in the 1990 county estimates of the total population was 3.6 percent unweighted and 2.3 percent weighted.

Population size markedly affects the accuracy of the estimates for children aged 5-17. For counties with more than 1 million people in 1990, the average proportional absolute difference in the estimate for this age group was 5.2 percent, but it was 12.4 percent for counties with fewer than 2,500 people. This relationship is expected given the likelihood that errors in the input data will be disproportionately greater for smaller counties than for larger counties.

In terms of bias measured by the average proportional algebraic difference for categories of counties, the population estimates procedure, in comparison with the 1990 census, tends to overestimate children aged 5-17 in larger counties relative to smaller counties and in metropolitan counties relative to nonmetropolitan counties. The estimation procedure also tends to underestimate children aged 5-17 in counties with larger percentages of group quarters residents, to overestimate children aged 5-17 in counties with larger percentages of blacks, and to underestimate children aged 5-17 in counties with larger percentages of Hispanics relative to other counties. However, the differences are small for each characteristic.

The issue in the context of Title I allocations is the extent to which differences from the census in the population estimates for children aged 5-17 affect the estimates of the proportion of poor school-age children from log number models (a, b), or how they affect the estimates of the number of poor school-age children from log rate models (c, d). In the aggregate, the use of population estimates to convert estimated numbers from log number models to estimated proportions adds about 1 percentage point to the overall average proportional absolute difference between the model estimates for 1989 and the 1990 census estimates (compare column 3 with column 2 of Table 4-2 for the two log number models).
The use of population estimates to convert estimated proportions from log rate models to estimated numbers has even less effect overall (compare column 2 with column 3 of Table 4-2 for the two log rate models).

In addition, although a rigorous analysis was not done, there seems to be little systematic contribution of errors in the population estimates to category differences in the model estimates of poor school-age children from the 1990 census estimates (see Appendix D). For the three single-equation rate models that were examined for 1989 in the first round of evaluations, including the log rate (under 21) model (c), the use of population estimates instead of 1990 census estimates ("true values") to convert estimated proportions to estimated numbers of poor school-age children worsened the performance of the models for some characteristics (e.g., by increasing the spread between the largest negative and positive category differences compared with the census), improved their performance for other characteristics, and made essentially no difference for other characteristics. None of the category differences between the model estimates of poor school-age children developed with population estimates and those developed with 1990 census estimates was large.

The evaluations of the effects of the population estimates on estimates of poor school-age children relate to a 10-year period: the population estimates for 1990 were developed on the basis of 1980 census data updated with other sources. The 1994 population estimates that are used to convert estimated numbers to estimated proportions of poor school-age children in 1993 from the log number (under 18) model (b) were developed on the basis of 1990 census data. Because of the 4-year instead of 10-year period for updating, it is likely that errors in the 1994 population estimates are smaller than errors in the 1990 population estimates and that they have even smaller effects on the estimates of the number and proportion of poor school-age children.
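Two computations referred to in this section can be illustrated with a short sketch: the iterative proportional fitting (raking) of county-by-age counts to updated county totals and state age totals, and the average proportional absolute difference, unweighted and weighted by size. The code below is an illustration under stated assumptions, not the Census Bureau's procedure. The function names, the fixed iteration count, the choice of weights, and all numbers are hypothetical.

```python
def rake_two_way(base, row_totals, col_totals, iterations=50):
    """Iterative proportional fitting of a table to row and column totals.

    base[i][j]  : previous-census count for county i and age group j
    row_totals  : updated total population for each county
    col_totals  : updated state population for each age group
    The row and column totals are assumed to sum to the same grand total.
    """
    table = [row[:] for row in base]
    for _ in range(iterations):
        # Scale each row (county) to its updated total.
        for i, target in enumerate(row_totals):
            s = sum(table[i])
            table[i] = [x * target / s for x in table[i]]
        # Scale each column (age group) to the updated state age total.
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
    return table


def avg_prop_abs_diff(estimates, census, weights=None):
    """Average of |estimate - census| / census, optionally weighted."""
    diffs = [abs(e - c) / c for e, c in zip(estimates, census)]
    if weights is None:
        return sum(diffs) / len(diffs)
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)


# Hypothetical toy table: two counties by two age groups (5-17, all other).
base = [[20.0, 80.0], [30.0, 70.0]]
county_totals = [110.0, 90.0]
state_age_totals = [52.0, 148.0]
fitted = rake_two_way(base, county_totals, state_age_totals)
print(fitted)

# Hypothetical accuracy check: the larger county has the smaller error,
# so the size-weighted average is below the unweighted average.
unweighted = avg_prop_abs_diff([110.0, 80.0], [100.0, 100.0])
weighted = avg_prop_abs_diff([110.0, 80.0], [100.0, 100.0], weights=[300.0, 100.0])
print(unweighted, weighted)
```

In the toy accuracy check the weighted figure is lower than the unweighted one, the same direction as the 4.9 percent weighted and 6.3 percent unweighted differences reported above for the population aged 5-17.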