Suggested Citation: "2 County Estimates." National Research Council. 1999. Small-Area Estimates of School-Age Children in Poverty: Interim Report 3. Washington, DC: The National Academies Press. doi: 10.17226/6427.

County Estimates

Reliance on the most recent decennial census to allocate federal funds to counties and other small areas has primarily reflected the absence of alternative data sources with comparable or superior reliability. Mindful of the need for small-area estimates that are more up to date than census estimates, the Census Bureau organized the Small Area Income and Poverty Estimates (SAIPE) Program to develop methods for producing postcensal income and poverty estimates for states and counties by using multiple data sources and innovative statistical methods. The program began in late 1992 with financial support from a consortium of five federal agencies. Congress made this work more urgent by passing legislation in 1994 that charged the Census Bureau to produce updated estimates of poor school-age children for counties and school districts every 2 years, to begin in 1996 with estimates for counties, discussed in this chapter, and in 1998 with estimates for school districts, discussed in Chapter 3.

The SAIPE Program faces a challenging task to produce county-level estimates. For Title I allocations, there is no single administrative or survey data source that provides sufficient information with which to develop reliable direct estimates of the number and proportion of school-age children in families in poverty by county. The March Income Supplement to the Current Population Survey (CPS) can provide reasonably reliable annual direct estimates of such population characteristics as the number and proportion of poor children at the national level and possibly for the largest states. However, the CPS cannot provide direct estimates for the majority of counties because the sample does not include any households in them. And for almost all of the counties with households in the CPS sample (about 1,250 of a total of 3,143 counties in 1995), the estimates have a high degree of sampling variability.¹

Nonetheless, the CPS data may serve as the basis for creating usable estimates for counties through the application of statistical estimation techniques to develop "model-based" or "indirect" estimates. Model-based or indirect estimators use data from several areas, time periods, or data sources (which could include the previous census) to "borrow strength" and improve the precision of estimates for small areas. A model-based approach is needed when there is no single data source for the area and time period in question that can provide direct estimates that are sufficiently reliable for the intended purpose. The Census Bureau has used this strategy to develop estimates of median family income for states (Fay et al., 1993) and, in part, to develop population estimates for states and counties (see Spencer and Lee, 1980).

This chapter provides a summary description and evaluation of the model-based approach used by the Census Bureau to develop estimates by county of the number and proportion of school-age children in families in 1996 who were poor in 1995 (referred to as the 1995 county estimates). A document prepared by the Census Bureau describes the estimation procedure and evaluations of the 1995 estimates in detail (Bureau of the Census, 1998; see also National Research Council, 1998:Chs. 3, 4, Apps. C, D on the evaluations of the 1993 estimates).

If the Department of Education uses the Census Bureau's 1995 school district estimates of poor school-age children for direct allocation of Title I funds to districts, the 1995 county estimates will not be used directly. However, the 1995 county estimates are critical to the development of 1995 school district estimates. As a result of the lack of data at the school-district level, the Census Bureau has been constrained to use for school districts a very simple model-based method referred to as synthetic estimation, which applies the shares of poor school-age children for the school districts in a county according to the 1990 census to the updated 1995 county estimates to obtain updated school district estimates (see Chapter 3).² Therefore, in order to evaluate the 1995 school district estimates, it is essential to understand and evaluate the 1995 county estimates.

¹ For a description of the March CPS and differences between income and poverty data from the CPS and the 1990 census long-form sample, see National Research Council (1997:Ch. 2; App. B). The 1990 census sample includes households in all counties and covers 15 million households, 300 times more than the 50,000 households in the CPS, yet even the 1990 census estimates are relatively variable for some small counties (National Research Council, 1997:Table 2-1).

² We use the term "synthetic estimation" for the Census Bureau's shares procedure for school district estimates and distinguish it from the statistical regression modeling that was done for the state and county estimates. However, synthetic estimation is sometimes used more broadly in the small-area literature.
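The arithmetic of the shares procedure can be illustrated with a minimal sketch. The district names and counts are hypothetical, not Census Bureau data, and the function name `synthetic_district_estimates` is introduced here only for illustration; the sketch shows nothing more than applying census shares to an updated county total.

```python
def synthetic_district_estimates(census_poor_by_district, updated_county_total):
    """Allocate an updated county estimate to districts by their census shares."""
    census_county_total = sum(census_poor_by_district.values())
    if census_county_total <= 0:
        raise ValueError("census county total must be positive to form shares")
    return {
        district: updated_county_total * count / census_county_total
        for district, count in census_poor_by_district.items()
    }


# Hypothetical 1990 census counts of poor school-age children in one county.
census_1990 = {"District A": 600, "District B": 300, "District C": 100}
# Hypothetical updated (1995) county estimate of poor school-age children.
estimates_1995 = synthetic_district_estimates(census_1990, 1200.0)
```

By construction the district estimates sum to the county estimate and each district keeps its 1990 share, so the quality of the district figures depends directly on the quality of the county estimate.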

ESTIMATION PROCEDURE

The Census Bureau's estimation procedure for counties uses two regression models that predict poor school-age children: a county model and a separate state model. The estimation procedure was first used to develop the 1993 county estimates. It comprises the following steps: (1) a regression model is developed to provide initial estimates of the number of poor school-age children at the county level; (2) a state model is developed to produce estimates of the number of poor school-age children by state; and (3) the initial county-level estimates are adjusted so that the final estimates for counties within each state sum to the state-level estimates. In addition, the Census Bureau produces county population estimates of the total number of school-age children, which the Department of Education has used to calculate estimated proportions of poor school-age children for counties. Finally, the Census Bureau produces separate estimates of poor school-age children for Puerto Rico.

Step 1: County Model

The first step in the estimation process is to develop and apply the Census Bureau's county model to produce initial estimates of the numbers of poor school-age children. This step involves:

· obtaining data from the March CPS for three consecutive years to construct a dependent variable in a county model regression equation that is the estimated log number of poor school-age children for counties with households in the CPS sample;
· obtaining data from administrative records and other sources that are available for all counties to construct predictor variables for the regression equation;
· specifying and estimating the regression equation to relate the predictor variables to the dependent variable; and
· using the estimated regression coefficients from the equation and the predictor variables to develop estimates of poor school-age children for all counties.
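The four sub-steps above can be sketched in a few lines. This is a simplified illustration, not the Census Bureau's implementation: it uses ordinary least squares with only two predictors, ignores sampling error variances, and transforms back with a plain exponential. All county values, and the helper names `solve_linear_system`, `fit_ols`, and `design_row`, are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(predictors):
    """Intercept plus the log of each (positive) predictor count."""
    return [1.0] + [math.log(v) for v in predictors]


# Hypothetical counties: (CPS 3-year average of poor children, or None if the
# county has no CPS sample; [food stamp recipients, poor children in census]).
counties = {
    "A": (1500.0, [4000.0, 1400.0]),
    "B": (300.0, [900.0, 350.0]),
    "C": (0.0, [200.0, 60.0]),     # sampled, but zero has no logarithm: excluded
    "D": (5200.0, [15000.0, 5000.0]),
    "E": (800.0, [2500.0, 700.0]),
    "F": (None, [1200.0, 400.0]),  # not in the CPS sample
}

# Fit only on sampled counties with a positive direct estimate.
fit_set = [(design_row(x), math.log(cps)) for cps, x in counties.values() if cps]
coefficients = fit_ols([row for row, _ in fit_set], [y for _, y in fit_set])

# Predict on the log scale for every county, then transform back.
predictions = {
    name: math.exp(sum(b * v for b, v in zip(coefficients, design_row(x))))
    for name, (_, x) in counties.items()
}
```

The point of the sketch is the asymmetry between fitting and predicting: the coefficients come from the sampled counties only, while predictions are produced for every county that has predictor data.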
For counties with households in the CPS sample, the predictions from the model are then combined by a "shrinkage" procedure with the CPS direct estimates (on a logarithmic scale) for those counties. (The shrinkage procedure weights the two sets of estimates according to their relative precision; see Fay and Herriot [1979], Ghosh and Rao [1994], and Platek et al. [1987] on shrinkage methods.) The initial county estimates are then obtained by transforming the predictions from the logarithmic to the numeric scale.

The county model equation takes the following form:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + u_i + e_i \tag{1}$$

where:

$y_i$ = log(3-year weighted average of poor school-age children in county $i$),
$x_{1i}$ = log(number of child exemptions reported by families in poverty on tax returns in county $i$),
$x_{2i}$ = log(number of people receiving food stamps in county $i$),
$x_{3i}$ = log(estimated population under age 18 in county $i$),
$x_{4i}$ = log(number of child exemptions on tax returns in county $i$),
$x_{5i}$ = log(number of poor school-age children in county $i$ in the previous census),
$u_i$ = model error for county $i$, and
$e_i$ = sampling error of the dependent variable for county $i$.

The predictor variables in the county equation for the 1995 estimates are based on data from Internal Revenue Service (IRS) records for 1995 ($x_{1i}$, $x_{4i}$), Food Stamp Program records for 1995 ($x_{2i}$), the Census Bureau's population estimates program for 1996 ($x_{3i}$), and the 1990 census ($x_{5i}$).³ As the dependent or outcome variable, the county equation uses county estimates of the number of poor school-age children averaged over 3 years of the March CPS (data from the March 1995, 1996, and 1997 CPS, covering income in 1994, 1995, and 1996).⁴

The relationships between the predictor variables and the dependent variable in equation (1) are estimated solely from the subset of counties that have households in the March CPS sample. This subset includes proportionately more large counties and proportionately fewer small counties than the distribution of all counties. Because values of zero cannot be transformed into logarithms, a number of counties whose sampled households contain no poor school-age children are excluded from the estimation. In all, 985 of the country's 3,143 counties were included in the 1995 model estimation.

Step 2: State Model

The second step in the estimation process is to develop and apply the Census Bureau's state model to produce estimates of the number of poor school-age children by state. The state estimation is similar to that for counties, although the state model differs from the county model in several respects.⁵

³ Variables $x_{3i}$ and $x_{4i}$ are included in the model in order to cover children not reported on tax returns (i.e., in nonfiling families), who are assumed to be poorer on average than other children.

⁴ See Bureau of the Census (1998) and National Research Council (1998:Ch. 2) for the derivation of the 3-year weighted average of poor school-age children from the CPS and of the last two terms in the equation ($u_i$ and $e_i$).

⁵ See National Research Council (1998:Ch. 2) for a detailed review of the forms of the state and county models and the differences between them.

The state model equation takes the following form:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + u_i + e_i \tag{2}$$

where:

$y_i$ = proportion of school-age children in state $i$ that are poor, estimated from one year of the CPS (March 1996 CPS for the 1995 model),
$x_{1i}$ = proportion of child exemptions reported by families in poverty on tax returns in state $i$,
$x_{2i}$ = proportion of people receiving food stamps in state $i$,
$x_{3i}$ = proportion of people under age 65 who did not file an income tax return in state $i$,⁶
$x_{4i}$ = residual for state $i$ from a regression of the proportion of poor school-age children from the most recent decennial census on the other three predictor variables,⁷
$u_i$ = model error for state $i$, and
$e_i$ = sampling error of the dependent variable for state $i$.

All states have sampled households with poor school-age children in the CPS; however, the variability associated with estimates from the CPS is large for some states. As is done for the initial county estimates, the predictions from the state model and the CPS direct estimates are combined in a shrinkage procedure to produce estimates of the proportion of poor school-age children in each state. To produce estimates of the number of poor school-age children in each state, the estimates of the proportion poor are multiplied by estimates of the total number of noninstitutionalized school-age children from the Census Bureau's program of population estimates. Finally, the state estimates of the numbers of poor school-age children are adjusted to sum to the CPS national estimate of related school-age children in poverty. This adjustment is a minor one; for 1995 it changed the state estimates by less than one-half of 1 percent.
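The sequence just described (shrinkage, conversion from proportions to numbers, adjustment to a national total) can be sketched as follows. The sketch assumes the standard precision-weighted form of shrinkage, in which the weight on the direct estimate is the model error variance divided by the sum of the model and sampling error variances; the text does not spell out the Census Bureau's exact weights. All state values, the model variance, and the national control total are hypothetical.

```python
def shrink(direct, model_prediction, sampling_variance, model_variance):
    """Precision-weighted combination of a direct and a model-based estimate."""
    weight_on_direct = model_variance / (model_variance + sampling_variance)
    return weight_on_direct * direct + (1.0 - weight_on_direct) * model_prediction


def rake_to_total(estimates, control_total):
    """Scale all estimates by one common factor so they sum to control_total."""
    factor = control_total / sum(estimates.values())
    return {name: value * factor for name, value in estimates.items()}, factor


# Hypothetical inputs for three states: CPS direct poverty rate, model
# prediction, CPS sampling variance, and school-age population estimate.
states = {
    "State 1": {"direct": 0.24, "model": 0.20, "samp_var": 0.0009, "pop": 1_000_000},
    "State 2": {"direct": 0.12, "model": 0.15, "samp_var": 0.0001, "pop": 2_000_000},
    "State 3": {"direct": 0.30, "model": 0.22, "samp_var": 0.0027, "pop": 500_000},
}
MODEL_VARIANCE = 0.0003  # hypothetical model error variance

# Shrinkage on the proportion scale, then conversion to numbers of children.
rates = {
    name: shrink(s["direct"], s["model"], s["samp_var"], MODEL_VARIANCE)
    for name, s in states.items()
}
counts = {name: rates[name] * states[name]["pop"] for name in states}

# Hypothetical CPS national estimate used as the control total.
adjusted_counts, national_factor = rake_to_total(counts, 580_000.0)
```

In this sketch the state with the smallest sampling variance (State 2) stays closest to its direct estimate, while the state with the largest (State 3) is pulled most strongly toward the model prediction. A ratio adjustment of the same kind is what step (3) of the procedure applies within each state, with the state estimate as the control total.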
⁶ This percentage is obtained by subtracting the estimated number of exemptions on income tax returns for people under age 65 from the estimated total population under age 65 that is derived from demographic analysis (see National Research Council, 1998:App. B).

⁷ For the 1995 state model, $x_{4i}$ is the residual from a regression of poor school-age children from the 1990 census on the other three predictor variables for 1989.

Step 3: Combining the County and State Estimates

The last step in the estimation process is to adjust the initial estimates of poor school-age children from the county model (step 1) for consistency by state with the estimates from the state model (step 2) to produce final estimates of the numbers of related children aged 5-17 in poverty by county. The estimate for each state from the state model is divided by the sum of the initial estimates for the counties in that state to form a state raking factor. Each of the county estimates in a state is multiplied by the state raking factor so that the sum of the adjusted county estimates equals the state estimate. For the final county estimates of poor school-age children in 1995, the average state raking factor was 0.97; two-thirds of the factors were between 0.88 and 1.06.

Differences Between 1995 and 1993 Estimation Procedures

The procedure summarized above to produce the 1995 county estimates differs in a few respects from the procedure that was used to produce the revised 1993 estimates described in the panel's second interim report (National Research Council, 1998). The changes involved the input data for the state and county models:

· An error in processing the 1989 IRS data was discovered and corrected. The corrected data were used to reestimate the decennial census equation that provides the residual predictor variable in the 1995 state model ($x_{4i}$ in equation (2)). The corrected data were also used to reestimate the 1989 state and county models for evaluation purposes.
· Several changes were made to the food stamp data for input to the state model: instead of using data for July, the number of food stamp recipients was changed to a 12-month average centered on January 1 of the following year; counts by state of the numbers of people who received food stamps due to specific natural disasters were obtained from the Department of Agriculture and subtracted from the counts of the total number of recipients; time-series analysis of monthly state food stamp data from October 1979 through September 1997 was used to smooth outliers; and food stamp recipient data for Alaska and Hawaii were adjusted downward to reflect the higher eligibility thresholds for those states.
· The food stamp numbers for the county model were raked to the adjusted state food stamp numbers.
· In both the state and county models, child exemptions reported by families on tax returns were redefined to include children away from home in addition to children at home. This change may increase the number of IRS poor child exemptions in households with children away from home both because of the additional children and because poverty thresholds are higher for larger families.

Population Estimates

To accompany county estimates of school-age children in 1996 who were in poor families in 1995, the Census Bureau produced county-level estimates of the total number of children aged 5-17 for 1996 from its demographic population estimates program. The estimates from step 3 above and the population estimates can then be used to calculate estimated proportions of poor school-age children for counties. The Census Bureau also produced county-level estimates of total population for 1996. The population estimates pertain to July of the year following the one for which poverty status is estimated. A detailed description and evaluation of the Census Bureau's population estimates procedures for counties is provided in National Research Council (1998:App. B).

Puerto Rico

Estimates of poor school-age children for Puerto Rico, which is treated as a county equivalent in the allocation formula, are developed separately. The county model cannot be used for them because there are no precise equivalents for Puerto Rico of tax return and food stamp data to form predictor variables for the model.

The original estimates for Puerto Rico of school-age children in 1994 who were poor in 1993 were developed with data from an experimental March 1995 income survey modeled after the CPS March Income Supplement, together with data from the decennial census and updated population estimates. These data sources required a number of adjustments for several reasons: (1) the March 1995 experimental survey did not collect information on the ages of family members under 16 (so that related children aged 5-17 could not be identified among those aged under 18); (2) the updated Puerto Rico population estimates were for all children in the resident population, not for related children only; and (3) the survey, which was conducted in 1995, obtained information on 1994, not 1993, income. In making the adjustments, the Census Bureau assumed that certain relationships observed in 1990 census data still applied and that the change in the number of Puerto Rico school-age children in poverty between 1989 and 1994 was linear.
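Under the linearity assumption, a value for 1993 can be read off the straight line joining the 1989 and 1994 values. The following is a sketch of that arithmetic only; the symbol $P_t$ (the number of poor school-age children in Puerto Rico for income year $t$) is introduced here for illustration, and the Census Bureau's actual adjustment may differ in detail.

$$P_{1993} = P_{1989} + \frac{1993 - 1989}{1994 - 1989}\,\bigl(P_{1994} - P_{1989}\bigr) = P_{1989} + \tfrac{4}{5}\,\bigl(P_{1994} - P_{1989}\bigr)$$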
The sample size of the experimental survey of about 3,200 households appeared large enough to provide a direct estimate of the number of poor school-age children with adequate precision. However, only limited information was available about other key aspects of data quality, including household response rates on the income questions and the editing or imputation procedures used. Hence, it was difficult to evaluate the quality of the 1993 estimates for Puerto Rico, although the estimation procedures seemed appropriate given the data available.

The Puerto Rican Family Income Survey is now an ongoing survey, conducted at 2-year intervals. The Census Bureau used income data from the 1996 survey, in which about 2,300 households were interviewed in February-March 1997, together with decennial census data and updated population estimates for Puerto Rico, to construct estimates of school-age children in 1996 who were poor in 1995. The three adjustments that were made for the 1993 estimates were also required. The change in the number of children in poor families between 1994 and 1996 was assumed to be linear. Additional information was obtained from Puerto Rico about the quality of the income survey, which, in general, supported the use of the survey data to develop estimates of poor school-age children in the commonwealth (see Santos and Waddington, 1999).

EVALUATION

The development of model-based estimates for small areas is a major, continuing research and development effort for which extensive evaluation is required. For updated estimates of poor school-age children for counties, a thorough assessment of all aspects of the estimation procedure is necessary to have confidence in the estimates, whether they are used by the Department of Education to allocate Title I funds to counties (as has been the practice up to now) or to develop estimates for school districts.

Since there are no absolute criteria for what are acceptable evaluation results, one method for determining if the performance of a model can be improved is to examine alternative models. Such comparisons may indicate changes that would be helpful for a model; they may also suggest that an alternative model is preferable.

As summarized above, the Census Bureau's county estimates of poor school-age children are produced by using a county regression model, a state regression model, and county population estimates developed with demographic analysis techniques. A comprehensive evaluation for each of these components of the estimation procedure should include "internal" and "external" evaluations. An internal evaluation is primarily an investigation of the validity of the underlying assumptions and features of a model.
For a regression model, an internal validation is typically based on an examination of the residuals from the regression, that is, the differences between the predicted and reported values of the dependent variable for each observation. In an external evaluation, the estimates from a model are compared with target or "true" values that were not used to develop the model. Ideally, an internal evaluation of regression model output should precede external evaluation. Changes made to the model to address concerns raised by the internal evaluation would likely improve its performance in the external evaluation. Both internal and external evaluations should be carried out for alternative models.

In its second interim report, the panel reviewed a series of internal and external evaluations that were conducted for the revised 1993 county estimates of poor school-age children (National Research Council, 1998:Ch. 4, Apps. B, C, D). The state model and the county population estimates were examined as well, both directly and as they contributed to the county estimates of poor school-age children. The evaluation determined that the revised procedure for developing updated county estimates, which principally involved a change in one of the predictor variables in the original county model,⁸ produced estimates for 1993 that were appropriate for use in allocating Title I funds to counties.

Because the 1995 county estimates were developed by using a procedure similar to that used to develop the revised 1993 county estimates, the focus of the evaluation effort for the 1995 estimates shifted to how the state and county models behave over several time periods, and specifically, to determining whether there are persistent biases or other problems. The evaluations of the 1995 county estimates, which are described in this chapter, included: (1) internal evaluation of the regression output for the 1995 county model estimated for 1995, 1993, and 1989 (using uncorrected and corrected tax return data); (2) comparison of estimates of poor school-age children that were developed from the 1995 form of the county model for 1995, 1993, and 1989 with CPS estimates for groups of counties, a form of external evaluation; and (3) evaluation of the state model, including examination of regression output for 1996, 1995, 1993, 1992, 1991, 1990, and 1989 and consideration of the state raking factors by which county model estimates are adjusted to make them consistent with the state model estimates.

County Model

Internal Evaluations

The first test of a regression model is that it perform well when evaluated internally, that is, for the set of observations for which it is estimated. The evaluation of the county regression output pertains to the regression model itself, that is, before the predictions are combined with the direct CPS estimates in a shrinkage procedure or raked to the estimates from the state model. The regression output comprises the model predictions for counties that have at least one household with poor school-age children in the CPS sample. We first summarize the evaluation work done on the 1993 county model predictions and then detail the work on the 1995 county model predictions.
⁸ The predictor variable $x_{3i}$ in equation (1) was changed from the estimated population under age 21 to the estimated population under age 18. This change improved the model predictions, particularly for groups of counties classified by the percentage of group quarters residents (see National Research Council, 1998:Ch. 2).

1993 Evaluation

As part of the evaluation of the revised 1993 county estimates (National Research Council, 1998:Ch. 4 and App. C), the panel and the Census Bureau examined the underlying assumptions of 13 alternative county models through evaluation of the regression model output for 1989 and 1993. The models varied on three dimensions: treatment of information from the previous census (bivariate or single-equation), form of the variables (poverty rates or numbers, transformed to logarithms or not transformed), and whether the model included fixed state effects. Although an evaluation of the regression output would not likely provide conclusive evidence with which to rank the performance of alternative models, particularly when they use different transformations of the dependent variable, such an examination could help determine which models perform reasonably well. The assumptions examined included:

· linearity of the relationships between the dependent variable and the predictor variables, assessed by examining a variety of graphical plots;
· constancy of the assumed linear relationship over different time periods, assessed through comparison of the regression coefficients on the predictor variables for the years for which the model was estimated;
· whether any of the included predictor variables are not needed in the model, evaluated by looking for insignificant t-statistics for the estimated values of individual regression coefficients, and, conversely, whether other potential predictor variables are needed in the model, evaluated by looking for nonrandom patterns, indicative of possible model bias, in the distributions of standardized residuals displayed for categories of counties;⁹
· normality (primarily symmetry and moderate tail length) of the distribution of the standardized residuals;
· whether the standardized residuals have homogeneous variances, that is, whether the variability of the standardized residuals is constant across counties and does not depend on the values of the predictor variables; and
· absence of outliers.
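Several of the checks listed above work with standardized residuals summarized by category of county. The sketch below shows one way such a summary might be computed. It assumes, as a simplification, that the predicted standard error of a residual is the square root of the sum of the model error variance and the sampling error variance, and it takes the residual as observed minus predicted, so overprediction appears as a negative value. The records and field names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def standardized_residual(observed, predicted, model_variance, sampling_variance):
    """Residual (observed minus predicted) divided by its predicted standard error."""
    return (observed - predicted) / math.sqrt(model_variance + sampling_variance)


def mean_by_category(records):
    """Average standardized residual within each category of counties."""
    groups = defaultdict(list)
    for rec in records:
        z = standardized_residual(
            rec["log_cps"], rec["log_pred"], rec["model_var"], rec["samp_var"]
        )
        groups[rec["category"]].append(z)
    return {category: mean(values) for category, values in groups.items()}


# Hypothetical county records on the log scale.
records = [
    {"category": "metro", "log_cps": 7.0, "log_pred": 7.3, "model_var": 0.04, "samp_var": 0.05},
    {"category": "metro", "log_cps": 6.0, "log_pred": 6.15, "model_var": 0.04, "samp_var": 0.21},
    {"category": "nonmetro", "log_cps": 5.0, "log_pred": 4.8, "model_var": 0.04, "samp_var": 0.12},
    {"category": "nonmetro", "log_cps": 4.0, "log_pred": 4.1, "model_var": 0.04, "samp_var": 0.60},
]
category_means = mean_by_category(records)
```

With real data one would look at the full distribution within each category (for example, with box plots) rather than the mean alone; the mean is used here only to keep the sketch short.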
⁹ The standardization of the residuals involved estimating the predicted standard errors of the residuals, given the predictor variables, and dividing the observed residuals by the predicted standard errors. The predicted standard error of the residual for a county is a function of the estimated model error variance and the estimated sampling error variance (see Belsley et al., 1980). The categories of counties were specified in terms of: census region, census geographic division, metropolitan status of county, population size in 1990, population growth from 1980 to 1990, percentage of poor school-age children in 1980, percentage of Hispanic population in 1990, percentage of black population in 1990, persistent poverty from 1960 to 1990 for rural counties, economic type for rural counties, percentage of group quarters residents in 1990, and number of households in the CPS sample.

The analysis for the most part supported the assumptions for the 13 models that were examined; it did not strongly support one model over another. A few problems characterized all or most of the models. First, most models tended to overpredict the number of poor school-age children in larger urban counties, especially those with large percentages of Hispanics. Second, all models showed evidence of some variance heterogeneity, particularly with respect to CPS sample size and often with respect to the predicted value (number or proportion of poor school-age children). Some of the models exhibited more problems with skewness and outliers than others. Finally, according to the internal evaluation, none of the other models was clearly superior to the revised Census Bureau 1993 county model.

1995 Evaluation

The internal evaluation for the 1995 county model focused on comparisons of the properties of the model when estimated for different time periods. The analysis looked in particular at three characteristics: the constancy of the regression coefficients on the predictor variables over time; distributions (box plots) of the standardized residuals for categories of counties to determine if there were any nonrandom patterns that persisted over time; and the phenomenon observed in the 1993 evaluations by which the variance of the standardized residuals was related to CPS sample size and the predicted value of the dependent variable (variance heterogeneity).

Constancy of the Regression Coefficients. Because the county model is refitted for each prediction year, constancy of the regression coefficients for the predictor variables over time is not as important as it would be if the estimated regression coefficients from the model were used for predictions for subsequent years. Also, major changes in economic conditions would be expected to cause some changes in the coefficients. Nonetheless, it is desirable for the coefficients to be in the same direction and not fluctuate wildly in size over time.
Table 2-1 shows the regression coefficients for the predictor variables for the 1995 county model estimated for 1995 and 1993 and for 1989 with corrected IRS data and with original (uncorrected) IRS data.¹⁰ The coefficients for the three "poverty level" predictor variables, child exemptions reported by families in poverty on tax returns (column 1), food stamp recipients (column 2), and poor school-age children from the previous census (column 5), are fairly similar in the equations for all three time periods. There are more substantial differences across the three time periods in the size of the estimated coefficients for the other two variables: population under age 18 (column 3) and total number of child exemptions on tax returns (column 4). However, the sum of these two coefficients is close to zero in each year. Because the two variables are highly positively correlated and close in magnitude, the predictions from equations with a similar sum for the two coefficients will be similar. Finally, the sum of all the coefficients is close to 1 for all 3 estimation years: 1.01 for 1995, 1.05 for 1993, and 1.06 for 1989 with the revised IRS data. It is desirable for the coefficients in a model of this form to sum to 1, which indicates that the model predictions do not vary by the scale of the predictor variables. If the sum of the coefficients is much greater than or less than 1, the model should be examined to determine if additional predictor variables or other changes in the model may be needed.

¹⁰ The regressions for 1995 and for 1989 with corrected IRS data also used modified food stamp data (i.e., the county food stamp data were raked to the adjusted state food stamp data, as described above).

TABLE 2-1 Estimates of Regression Coefficients for Census Bureau 1995 County Model, Estimated for 1989, 1993, and 1995

| Year | No. of Counties | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|---|
| 1989 (revised IRS data) | 1,028 | 0.52 (.06) | 0.29 (.06) | 1.55 (.31) | -1.56 (.30) | 0.26 (.06) |
| 1989 (original IRS data) | 1,028 | 0.50 (.06) | 0.23 (.05) | 1.79 (.27) | -1.80 (.27) | 0.32 (.0[illegible]) |
| 1993 | 1,184 | 0.38 (.08) | 0.27 (.07) | 0.65 (.24) | -0.59 (.24) | 0.34 (.09) |
| 1995 | 985 | 0.31 (.10) | 0.29 (.08) | 0.88 (.25) | -0.80 (.25) | 0.33 (.09) |

NOTE: Columns (1) through (5) are the predictor variables.ᵃ All predictor variables are on the logarithmic scale for numbers. Standard errors of the estimated regression coefficients are in parentheses.
ᵃ Predictor variables: (1) number of child exemptions reported by families in poverty on tax returns; (2) number of people receiving food stamps; (3) population under age 18; (4) total number of child exemptions on tax returns; (5) number of poor school-age children from previous (1980 or 1990) census.

Patterns of Residuals. Given typical random variation, it is likely that the distributions of standardized residuals will display apparently nonrandom patterns for some categories of counties in a particular year.
However, if the distributions display the same patterns across years, it is evidence of model bias. The persistence of the same patterns should be investigated to determine ways to eliminate or reduce the bias, for example, by adding a variable to the equation. (There are ample degrees of freedom in the county model to permit the inclusion of additional predictor variables.) Investigation of the standardized residuals for categories of counties for the

county model estimated for 1995, 1993, and 1989 reveals little evidence of persistent bias. However, there is some suggestion that the model tends to consistently overpredict the number of poor school-age children in smaller counties (i.e., the model estimates are somewhat higher than the CPS direct estimates for smaller counties). It also tends to overpredict the number of poor school-age children in counties that are in metropolitan areas but are not the central county in the area. These patterns, while not strong, are evident in the regression output for all 3 years.

Variance Heterogeneity
The regression output for the 1995 county model clearly demonstrates variability in the size of the absolute standardized residuals as a function of the predicted value (number of poor school-age children) and the CPS sample size. If the variance estimates for the model are correct, then the standardized residual variance should remain constant over the distribution of CPS sample size; instead, it increases with increasing CPS sample size. This phenomenon was evident in the evaluations conducted for the 1993 county model, and it is evident in all 3 years for which the 1995 county model was estimated.

Adjusting a model to remove this type of heterogeneity is likely to have only a small effect on the estimated regression coefficients or the model estimates (although it will affect the estimated confidence intervals around the model estimates). The effect on estimates of the number of poor school-age children would stem from two factors: a shift in the weights assigned to each county in fitting the regression model, which would very likely result in only a modest change in the estimated regression coefficients; and a change in the weight given to the direct estimates, which could have an appreciable effect on the estimates only for the few counties with large CPS sample sizes.
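The heterogeneity check described above can be sketched in a few lines. The following is a minimal illustration, not the Census Bureau's diagnostic code: the function name `mean_square_by_bin`, the bin edges, and the input values are all hypothetical. If the variance model were correct, the mean squared standardized residual would be roughly the same in every sample-size bin; a sequence that rises with sample size is the pattern the panel reports.

```python
def mean_square_by_bin(sample_sizes, std_residuals, bin_edges):
    """Mean squared standardized residual within each CPS sample-size bin.

    bin_edges are ascending lower bounds; a county with sample size n falls
    in bin k when exactly k of the edges are <= n. Empty bins yield None.
    """
    sums = [0.0] * (len(bin_edges) + 1)
    counts = [0] * (len(bin_edges) + 1)
    for n, r in zip(sample_sizes, std_residuals):
        k = sum(n >= edge for edge in bin_edges)
        sums[k] += r * r
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical values for illustration only; these are not SAIPE data.
sizes = [5, 8, 12, 20, 35, 60, 150, 300]
residuals = [0.5, -0.5, 1.0, -1.0, 1.5, -1.5, 2.0, -2.0]

# Three bins: fewer than 10 households, 10 to 99, and 100 or more.
spread = mean_square_by_bin(sizes, residuals, bin_edges=[10, 100])
print(spread)
```

With real regression output, the same grouping could be applied to predicted values instead of sample sizes to examine the second dimension of heterogeneity that the text mentions.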
Nonetheless, it is clear that the current method for estimating the variance of the sampling errors ($e_i$ in equation (1)) in the county model is incorrect. The current approach essentially obtains the total sampling error variance by estimating the total squared error for the model and subtracting from that estimate the estimated model error variance from a 1989 equation in which 1990 census data form the dependent variable. The total sampling error variance is then distributed to counties by assuming that the sampling error variance in a county is inversely proportional to the county's CPS sample size.

The Census Bureau is investigating an alternative approach that would estimate the CPS sampling variances for larger counties on the basis of direct calculations of these variances, which take account of the clustered sample design within these counties, and then develop a generalized variance function for modeling the sampling variances by using the directly estimated variances as a dependent variable. The variance of the model error ($u_i$ in equation (1)) would then be calculated by subtracting the sampling variance from the total squared error, thus avoiding the questionable assumption that the model variances for the 1989 census equation and the CPS equations are equal.
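The allocation step in the current approach can be illustrated with a short sketch. This is an assumption-laden simplification rather than the Bureau's procedure: the function name `distribute_sampling_variance`, the `power` parameter, and all numbers are hypothetical, and the total sampling variance is simply taken as given. Setting `power=1.0` corresponds to variances inversely proportional to CPS sample size; `power=0.5` corresponds to the square-root form that the report says the Bureau is also examining.

```python
def distribute_sampling_variance(total_sampling_variance, sample_sizes, power=1.0):
    """Split a total sampling variance across counties in proportion to n ** -power.

    The returned county variances always sum to total_sampling_variance.
    """
    weights = [n ** -power for n in sample_sizes]
    scale = total_sampling_variance / sum(weights)
    return [scale * w for w in weights]


# Hypothetical CPS sample sizes and total variance, for illustration only.
sizes = [4, 16, 64]
inverse_n = distribute_sampling_variance(21.0, sizes, power=1.0)
inverse_sqrt_n = distribute_sampling_variance(21.0, sizes, power=0.5)

print(inverse_n)
print(inverse_sqrt_n)
```

In this toy case the square-root form assigns a larger share of the total to the county with the largest sample, which is the direction of change needed if standardized residuals are too dispersed for large-sample counties.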

The Census Bureau should pursue its work on a generalized variance function for the county model. The Bureau should also investigate the use of some function other than proportionality to the inverse of sample size to distribute the total sampling error variance to counties to eliminate the pattern of an increase in the standardized residual variance with increases in CPS sample size. (The Census Bureau is currently examining the assumption that the sampling error variances at the county level are inversely proportional to the square root of the county's CPS sample size.) The effects of changes in the estimates of sampling error variance and model variance on the estimates of poor school-age children can then be assessed.

Summary
The panel concludes that the analysis of the regression output for the 1995 county model estimated for 1989, 1993, and 1995 largely supports the assumptions of the model: there is little evidence of important problems with the assumptions. However, the model does exhibit a few minor problems that appear to persist over time. First, it tends to overpredict the number of poor school-age children in smaller counties and metropolitan counties that are not the central county compared with other counties. The differences are not marked, but research should be conducted to determine possible ways to modify the model to eliminate or reduce this problem. Second, the model shows evidence of variance heterogeneity with respect to both CPS sample size and poverty rate. The function that is used to distribute the total sampling error variance to counties should be changed to eliminate or reduce this problem, while the Census Bureau pursues longer term research on direct estimates of CPS county-level sampling variances (see Chapter 5).
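The coefficient-sum check discussed with Table 2-1 can be reproduced directly from the published coefficients. The sketch below assumes the log-log form the table note implies: if every predictor is multiplied by a common factor c, the predicted count is multiplied by c raised to the sum of the coefficients. The function names are illustrative, not taken from the report.

```python
# Coefficients for predictor columns (1)-(5) as printed in Table 2-1.
COEFFICIENTS = {
    "1989 (revised IRS data)": [0.52, 0.29, 1.55, -1.56, 0.26],
    "1993": [0.38, 0.27, 0.65, -0.59, 0.34],
    "1995": [0.31, 0.29, 0.88, -0.80, 0.33],
}


def coefficient_sum(coefs):
    """Sum of the regression coefficients, rounded to two decimals."""
    return round(sum(coefs), 2)


def scale_effect(coefs, factor):
    """Multiplier on the predicted count when every predictor is multiplied by factor."""
    return factor ** sum(coefs)


sums = {year: coefficient_sum(c) for year, c in COEFFICIENTS.items()}
doubling_1995 = scale_effect(COEFFICIENTS["1995"], 2.0)

print(sums)
print(doubling_1995)
```

A sum of exactly 1 would make a county with twice the value of every predictor receive exactly twice the predicted number of poor school-age children; with the 1995 sum of 1.01 the multiplier is only slightly above 2.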
County Model External Evaluations
Before using the estimates of a model for such important public policy purposes as allocating Title I funds, it is important to perform as much external evaluation of the estimates as is possible, with target values that were not used to develop the model. We first briefly review the external evaluations that were conducted for alternative 1993 county models, estimated for 1989 and 1993, and then summarize some additional external evaluations that were conducted for the 1995 county model estimated for three time periods.

1993 Evaluations
1990 Census Comparisons
As part of the evaluation of the revised 1993 county estimates, the panel and the Census Bureau compared the estimated numbers and proportions of poor school-age children for 1989 for seven alternative models with 1990 census estimates (National Research Council, 1998:Ch. 4 and

App. D).11 The 1990 census comparisons were also performed for four simpler procedures that relied much more heavily on 1980 census estimates, such as a procedure that assumed the same distribution of poverty among counties within a state in 1989 as was found in the 1980 census. The evaluation examined the overall difference between the estimates from a model or procedure and the census and the differences for groups of counties categorized by various characteristics.12 It addressed the accuracy of model estimates for the prediction year, that is, for 1989; it did not address the issue that model-based estimates may be used for Title I allocations for a school year several years after the prediction year.13

The 1990 census estimates that were used in the model-census comparisons were ratio adjusted by a constant factor to make the census-based national estimate of poor school-age children equal the 1989 CPS national estimate. This adjustment removed the difference of about 5 percent between the CPS and census estimates of total poor school-age children for 1989. Consequently, the differences between a model and the 1990 census in estimating poor school-age children for groups of counties can be interpreted as differences in shares. This feature is useful because the Title I allocation formula distributes funding as shares (percentages) of a fixed dollar amount.

External evaluation by comparison with the 1990 census is not ideal because the census estimates are not true values: they are affected by sampling variability and population undercount, and the census measurement of poverty differs from the CPS measurement in ways that are not fully understood (see National Research Council, 1997:Ch. 2, App. B; see also the Census Bureau's web site: http://www.census.gov/hhes/www/saipe93/inputs/cencpsdf.html).
11 The county estimates reflect the effects of the state model and the county population estimates, as well as the county regression model, but the differences in model performance vis-a-vis the census in the evaluation are due to the particular form of the county model. Fewer models were evaluated externally by comparison with the 1990 census than were included in the internal evaluation of regression diagnostics (7 versus 13 models); lack of data prevented estimating the bivariate model formulations for 1989. The models for which the 1990 census comparisons were performed were estimated with the method of moments. Maximum likelihood is used to estimate the 1995 county model. The differences in the estimates from the two techniques are small.
12 The categories were specified in terms of: census geographic division, metropolitan status of county, population size in 1990, population growth from 1980 to 1990, percentage of poor school-age children in 1980, percentage of Hispanic population in 1990, percentage of black population in 1990, persistent poverty from 1960 to 1990 for rural counties, economic type for rural counties, percentage of group quarters residents in 1990, whether the county had households in the CPS sample in 1989-1991, and percentage change in the poverty rate for school-age children from 1980 to 1990.
13 Research should be conducted to reduce the time lag between the prediction year for model-based estimates and the year for Title I allocations to the extent possible (see Chapter 5).

In addition, there

is only one census-based validation opportunity, 1990. Because of the lack of IRS and Food Stamp Program data for counties for 1979, it is not possible to evaluate model-based estimates by comparison to the 1980 census. Reliance on a single validation using the 1990 census is a problem because a model may perform better or worse in any one validation than it would on average when used over multiple years. In the absence of other means of external validation, however, the panel and the Census Bureau relied heavily on the 1990 census comparisons to understand the performance of alternative models.

The 1990 census comparisons produced a large volume of statistics and assessments (see National Research Council, 1998:Ch. 4). From its examination, the panel concluded that the models that were tested performed better than the simpler procedures that were tested. The models exhibited smaller overall absolute differences of their estimates of poor school-age children from the census estimates than did the simpler procedures. Also, for most categories of counties, the algebraic differences between the model-based estimates and the census estimates were smaller and exhibited fewer obvious patterns across categories than did the differences for the simpler procedures.

Comparing alternative models, the panel found that there were some county characteristics for which some or all models exhibited poor performance in terms of the spread between the largest and smallest algebraic category differences, the pattern of the differences across categories, or the size of the differences. Even on these characteristics, the models generally performed better than the simpler procedures. There were also some characteristics for which all models performed well. Of the alternative models, the panel concluded that, on balance, the revised 1993 county model performed somewhat better than the other models.
The only problems evident for this model were that it tended to overpredict the number of poor school-age children in counties that experienced the greatest decline in the poverty rate for school-age children from 1980 to 1990, counties with large percentages of Hispanic residents, and counties in the Mountain and Pacific Divisions. Also, it tended to underpredict the number of poor school-age children in counties that experienced the greatest increase in the poverty rate for school-age children from 1980 to 1990.

One would not expect any model to perform particularly well for the counties that experienced the largest changes (increase or decrease) in the poverty rate for school-age children from 1980 to 1990. This variable is closely related to the variable that the models are trying to estimate, and any regression model can only partially predict which cases will have the most extreme values of the outcome variable. The overprediction for counties in the West Region (Pacific and Mountain Divisions), given that the county estimates are raked to the state estimates from the Census Bureau's state model, must be attributable to the state model. Yet the evaluations showed that the state raking procedure improved the estimates for counties categorized by geographic division in comparison with a procedure that made no adjustments by state.

Local Assessment of 1993 County Estimates
The panel performed another type of external evaluation of the county estimates of poor school-age children: the use of local knowledge. Using the original 1993 model estimates for all 3,143 counties in the United States, the analysis first sought to identify groups of counties for which the 1993 estimates seemed unusually high or low in relation to prior levels and trends (e.g., from 1980 to 1990) in the number and proportion of poor school-age children and known social and economic trends for these groups of counties. Then, local people (including staff and members of local councils of government, economic development authorities, welfare agencies, state demographic units, state data centers, and other agencies) were contacted to obtain their assessment of the reasonableness of the implied trends in poverty for school-age children given their knowledge of local socioeconomic conditions.14

Individuals with local knowledge expressed a great deal of concern about the statistical reliability of the original 1993 county estimates, which was mostly consistent with the Census Bureau's own cautions in this regard, coupled with specific county estimates that seemed on the basis of local knowledge to be doubtful. These concerns notwithstanding, no categories of counties were identified that experienced apparent trends in the number and proportion of poor school-age children between 1989 and 1993 that were not accepted by knowledgeable local people. The trends for a few counties were not accepted locally, but the analysis found no strong indicators of potential bias for groups of counties sharing common characteristics in the county model.

1995 Evaluations
For the 1995 county model external evaluations, the emphasis shifted to finding a way to look for persistent bias.
An apparent bias identified in a single validation, such as the 1990 census comparisons summarized above, may be a one-time effect that will not occur in other years for which a model is estimated. For any particular year, it is almost inevitable that the differences between the model estimates and target values will be somewhat larger for some categories of counties than others. But if such differences persist for the same categories of counties over time, some areas may continually receive more funding than if the true values were known, and other areas may continually receive less funding.

14 This evaluation was carried out at the University of Wisconsin-Madison by Dr. Paul Voss, a member of the panel, with the assistance of Richard Gibson and Kathleen Morgen (see Voss et al., 1997). The evaluation used the original 1993 county estimates because the revised estimates were not available at the time.

As a type of external validation by which the issue of persistent bias could be examined, the panel and the Census Bureau compared estimates of poor school-age children from the 1995 county model for categories of counties for 1989, 1993, and 1995 with CPS direct estimates for those categories for the three periods. Three years of CPS data were used to form the weighted estimates in each case in order to reduce the sampling variability.15

Table 2-2 shows the difference in the number of poor school-age children from the county model, estimated for 1989 (using corrected IRS data), 1993, and 1995, and the weighted 3-year CPS direct estimates centered on those years for categories of counties. The measure shown is the algebraic difference by category, which is the sum for all counties in a category of the algebraic (signed) difference between the model estimate of poor school-age children and the weighted CPS direct estimate, divided by the sum of the weighted CPS direct estimates for the category.

Comparisons with weighted CPS direct estimates have the advantage over comparisons with the census that they can be performed for multiple years. They have the disadvantage that the sample sizes for CPS estimates, even aggregated for 3 years, are small for many categories of counties, thus making the comparisons much more uncertain than the 1990 census comparisons because of the much greater variability in the standard of comparison. Also, in analyzing the CPS comparisons, one must keep in mind that the model estimates are raked to the state estimates, which are developed from a single year of the CPS.

The model-CPS aggregate differences in Table 2-2 differ widely among categories of counties, in large part because of the small sample sizes for the CPS estimates, even when aggregated for 3 years.
Some of the differences are very large, larger than any of the differences seen in the model-1990 census comparisons (see National Research Council, 1998:Table 4-3, column b). Generally, the larger model-CPS aggregate differences are for categories of counties with smaller numbers of CPS sample households. For example, the model-CPS aggregate differences often exceed 5 percent for counties grouped into the nine geographic divisions, but they are all less than 5 percent for counties grouped into the four geographic regions. In addition, the model-CPS aggregate differences for 1989 frequently differ from the model-1990 census differences. This finding is expected, given that the measurement of poverty differs between the census and the CPS because of the many differences in data collection procedures.

15 This analysis is not the same as the analysis of regression output described above, in which the standardized residuals from the model for counties with sampled households in the CPS (representing the standardized differences between the model estimates and the direct estimates on the log scale) were examined for categories of counties.
16 For future evaluations of this type, the Census Bureau should develop estimates of the standard errors of the differences so that significant differences between the model estimates and the CPS 3-year aggregate estimates can be identified.
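The region-versus-division contrast in the paragraph above can be verified from the Census Region and Census Division rows of Table 2-2. The sketch below transcribes those rows (ordered 1995, 1993, 1989) and counts how many differences exceed 5 percent in absolute value; the helper name `count_exceeding` is illustrative only.

```python
# Model-CPS algebraic differences (percent) from Table 2-2: (1995, 1993, 1989).
REGIONS = {
    "Northeast": (-2.87, 0.81, -4.36),
    "Midwest": (-0.49, 0.61, -4.31),
    "South": (4.05, -0.13, 4.48),
    "West": (-4.16, -0.95, -0.43),
}
DIVISIONS = {
    "New England": (-13.51, 1.87, 27.07),
    "Middle Atlantic": (0.05, 0.54, -9.79),
    "East North Central": (-6.10, -0.64, -3.04),
    "West North Central": (18.31, 4.25, -7.44),
    "South Atlantic": (1.82, 0.83, 4.12),
    "East South Central": (-5.53, -5.85, 9.32),
    "West South Central": (12.00, 1.90, 2.44),
    "Mountain": (-3.91, 19.87, 0.84),
    "Pacific": (-4.24, -6.48, -0.92),
}


def count_exceeding(groups, threshold=5.0):
    """Number of category-year differences larger than threshold in absolute value."""
    return sum(abs(d) > threshold for diffs in groups.values() for d in diffs)


region_count = count_exceeding(REGIONS)
division_count = count_exceeding(DIVISIONS)
print(region_count, "of", 3 * len(REGIONS))
print(division_count, "of", 3 * len(DIVISIONS))
```

None of the 12 region-year differences exceeds 5 percent, whereas 12 of the 27 division-year differences do, which matches the statement in the text.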

TABLE 2-2 Comparison of County Model Estimates with CPS Aggregate Estimates of the Number of Poor School-Age Children, 1995, 1993, and 1989: Algebraic Difference by Category of County (in percent)

                                         No. of   Model-   Model-   Model-     Sample
                                       Counties     CPS,     CPS,     CPS,  Size, CPS
                                            (a)  1995(b)  1993(b)  1989(b)    1996(c)
Category                                    (1)      (2)      (3)      (4)        (5)

Census Region(d)
  Northeast                                 217    -2.87     0.81    -4.36     10,708
  Midwest                                 1,055    -0.49     0.61    -4.31     11,393
  South                                   1,425     4.05    -0.13     4.48     15,440
  West                                      444    -4.16    -0.95    -0.43     12,141
Census Division(d)
  New England                                67   -13.51     1.87    27.07      3,696
  Middle Atlantic                           150     0.05     0.54    -9.79      7,012
  East North Central                        437    -6.10    -0.64    -3.04      6,841
  West North Central                        618    18.31     4.25    -7.44      4,552
  South Atlantic                            591     1.82     0.83     4.12      8,150
  East South Central                        364    -5.53    -5.85     9.32      2,529
  West South Central                        470    12.00     1.90     2.44      4,761
  Mountain                                  281    -3.91    19.87     0.84      5,543
  Pacific                                   163    -4.24    -6.48    -0.92      6,598
Metropolitan Status
  Central county of metropolitan area       493    -2.75    -0.91    -3.53     34,343
  Other metropolitan                        254    53.75    -3.64     8.44      2,801
  Nonmetropolitan                         2,394     1.24     3.50     8.32     12,538
1990 Population Size
  Under 7,500                               525   -17.21    57.03     0.74        933
  7,500-14,999                              630    19.82   -23.67    -0.19      1,550
  15,000-24,999                             524     2.94     6.24    17.02      2,289
  25,000-49,999                             620    30.46    -0.23    -4.46      4,204
  50,000-99,999                             384    -2.52     4.99    22.47      5,979
  100,000-249,999                           259    17.27    12.12    -3.88      8,263
  250,000 or more                           199    -7.24    -2.49    -3.10     26,464
1980 to 1990 Population Growth
  Decrease of more than 10.0%               444    -2.71   -22.03    -4.29      2,170
  Decrease of 0.1-10.0%                     972    -4.31     2.44    -1.32     10,655
  0.0-4.9%                                  547     6.04     3.41     3.18      8,015
  5.0-14.9%                                 620     1.12     5.97     4.61     11,590
  15.0-24.9%                                260    -0.07    -4.11   -10.44      9,305
  25.0% or more                             292    -0.52    -2.27    10.31      7,947
Percentage of Poor School-Age Children, 1980
  Less than 9.4%                            516     2.74     7.22    -1.07     14,980
  9.4-11.6%                                 524     1.39     5.28     4.35     12,291
  11.7-14.1%                                530   -10.01    -6.49    -6.72      9,837
  14.2-17.2%                                523     1.28    -5.82     0.44      5,217
  17.3-22.3%                                519     9.32    17.41     0.23      4,623
  22.4-53.0%                                523     1.05   -14.81     4.11      2,734
Percentage Hispanic, 1990
  0.0-0.9%                                1,770     1.26    -0.75     3.13     12,848
  1.0-4.9%                                  847     9.33     1.45     4.32     16,966
  5.0-9.9%                                  193    -2.81    17.24     6.38      6,999
  10.0-24.9%                                181    -4.02    -5.14    -8.29      7,236
  25.0-98.0%                                150    -7.90    -3.29    -5.26      5,633
Percentage Black, 1990
  0.0-0.9%                                1,446     8.32     8.02     5.09     10,929
  1.0-4.9%                                  615     7.41     1.04    -1.83     10,630
  5.0-9.9%                                  294     5.41    -2.07     0.95      8,646
  10.0-24.9%                                381    -4.89    -0.75     3.51     13,437
  25.0-87.0%                                405    -6.85    -2.82    -6.30      6,040
Persistent Rural Poverty, 1960-1990(e)
  Rural, not poor                         1,740    -2.62     1.53     5.47      9,734
  Rural, poor                               535    22.45    -0.15    14.81      1,698
  Not classified                            866    -1.28    -0.28    -2.68     38,250
Economic Type, Rural Counties(e)
  Farming                                   556   -24.56   -29.31   -12.41      1,634
  Mining                                    146    46.97    27.59    40.67        901
  Manufacturing                             506    -7.10    -3.58    -1.51      2,369
  Government                                243   120.13    27.59    59.39      1,661
  Services                                  323   -12.18   -12.42   -11.86      2,760
  Nonspecialized                            484     6.99    18.35    23.89      2,018
  Not classified                            883    -1.18    -0.20    -2.59     38,339
Percentage of Group Quarters Residents, 1990
  Less than 1.0%                            545     3.32    22.03    16.60      3,494
  1.0-4.9%                                2,187    -1.58    -1.27    -1.84     41,648
  5.0-9.9%                                  299    11.90    -1.22     4.51      3,980
  10.0-41.0%                                110    49.44    -6.28    17.02        560
Change in Poverty Rate for School-Age Children, 1980-1990
  Decrease of more than 3.0%                536    -3.88   -11.16   -10.04      4,038
  Decrease of 0.1-3.0%                      649    -4.57     2.63     4.44     12,658
  0.0-0.9%                                  272     2.16    -2.75     9.66      5,102
  1.0-3.4%                                  621    -1.07     0.11    -5.06     14,660
  3.5-6.4%                                  532     9.09    -2.60    -0.66      7,507
  6.5-38.0%                                 523    -1.07     5.17     3.98      5,719

(a) 3,141 counties are assigned to a category for most characteristics; 3,135 counties are assigned to a category for 1980-1990 population growth and 1980 percentage of poor school-age children; 3,133 counties are assigned to a category for 1980-1990 percent change in poverty rate for school-age children.
(b) The formula, where there are n counties (i) in category (j), $Y_{\text{model}}$ is the estimated number of poor school-age children from the county model, and $Y_{\text{CPS}}$ is the estimated number of poor school-age children from a 3-year weighted average of the CPS, is $\sum_i (Y_{\text{model},ij} - Y_{\text{CPS},ij}) / \sum_i Y_{\text{CPS},ij}$.
(c) Number of households (unweighted) in the sample for the March 1996 CPS is shown to give an idea of the relative sample sizes for each category. The 3-year weighted averages are based on 3 years' worth of sample, although some sample cases are the same for 2 years because of the rotational design.
(d) Census region and division states:
  Northeast
    New England: Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut
    Middle Atlantic: New York, New Jersey, Pennsylvania
  Midwest
    East North Central: Ohio, Indiana, Illinois, Michigan, Wisconsin
    West North Central: Missouri, Minnesota, Iowa, North Dakota, South Dakota, Nebraska, Kansas
  South
    South Atlantic: Delaware, Maryland, District of Columbia, Virginia, West Virginia, North Carolina, South Carolina, Georgia, Florida
    East South Central: Kentucky, Tennessee, Alabama, Mississippi
    West South Central: Arkansas, Louisiana, Oklahoma, Texas
  West
    Mountain: Montana, Idaho, Wyoming, Colorado, New Mexico, Arizona, Utah, Nevada
    Pacific: Washington, Oregon, California, Alaska, Hawaii
(e) The Economic Research Service, U.S. Department of Agriculture, classifies rural counties by 1960-1990 poverty status and economic type. Counties not classified are urban counties and rural counties for which a classification could not be made.
SOURCE: Data from Bureau of the Census.
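The measure defined in note b of Table 2-2, and the idea of a difference that keeps the same sign in all 3 years, can be sketched as follows. The function names and the county-level inputs to `algebraic_difference_pct` are hypothetical; the three category rows are transcribed from Table 2-2 (ordered 1995, 1993, 1989). The factor of 100 is an assumption reflecting that the table is expressed in percent.

```python
def algebraic_difference_pct(model_estimates, cps_estimates):
    """Signed model-minus-CPS difference for one category, as a percent of the CPS total."""
    signed_sum = sum(m - c for m, c in zip(model_estimates, cps_estimates))
    return 100.0 * signed_sum / sum(cps_estimates)


def persistent_direction(differences):
    """Return 'over' if all differences are positive, 'under' if all are negative, else None."""
    if all(d > 0 for d in differences):
        return "over"
    if all(d < 0 for d in differences):
        return "under"
    return None


# Hypothetical county-level counts for a single category (not SAIPE data).
example = algebraic_difference_pct([110.0, 90.0, 60.0], [100.0, 100.0, 50.0])

# Rows transcribed from Table 2-2: (1995, 1993, 1989) differences in percent.
TABLE_ROWS = {
    "250,000 or more": (-7.24, -2.49, -3.10),
    "Government": (120.13, 27.59, 59.39),
    "New England": (-13.51, 1.87, 27.07),
}
directions = {name: persistent_direction(row) for name, row in TABLE_ROWS.items()}
print(example)
print(directions)
```

Sign persistence alone says nothing about statistical significance; as the report notes, standard errors of the differences would be needed for that.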

Despite the sample size limitations, Table 2-2 can inform an assessment of the performance of the county model if the results are used with caution. Of particular interest are instances in which the model-CPS aggregate differences are both large and in the same direction (plus or minus) for all 3 years for which the county model is estimated. Such findings suggest a possible systematic bias in the model that should be investigated to determine the nature of the bias and what steps could be taken to eliminate or reduce it (e.g., by adding a predictor variable to the model).

Several persistent patterns are evident in the model-CPS aggregate differences:

· The model shows a tendency to underpredict the number of poor school-age children in the largest counties, those with 250,000 or more population. This finding is consistent with the results from analyzing the distribution of the standardized residuals from the regression output. The extent of the underprediction is not large, but it appears to be significant given the large number of CPS households in the largest counties.
· The model shows a tendency to underpredict the number of poor school-age children in counties with large percentages of Hispanic residents (10% or more). There is a similar, although less pronounced, tendency for the model to underpredict the number of poor school-age children in counties with large percentages of blacks. It is likely that counties with large percentages of Hispanics or blacks are not homogeneous (e.g., large-percentage black counties include both inner-city and rural areas). Hence, further research is needed to determine whether the underprediction is more or less pronounced for particular subgroups of these counties and, consequently, what steps are appropriate to ameliorate the bias in the model.
· The model estimates are consistently very different from the weighted CPS estimates for some categories of rural counties classified by economic type. In particular, the model estimates for rural counties characterized as government are much higher than the corresponding weighted CPS estimates. Although the comparisons by economic type are based on small CPS sample sizes, it seems worthwhile to examine some of these counties to see if a reason for these large differences can be found.
· Finally, the model shows a tendency to underpredict the number of poor school-age children in counties that experienced the largest declines in the poverty rate for school-age children from 1980 to 1990. As was noted above, this finding is consistent with the knowledge that any regression model can only partially predict which cases will have the most extreme values of the outcome variable.

Summary
Considering both the external evaluations of alternative models that were conducted for the revised 1993 county model and the external evaluations of 3

years of estimates that were conducted for the 1995 county model, the panel concludes that the county model is working reasonably well. However, further investigation is needed of categories of counties for which the model appears to overpredict or underpredict the number of poor school-age children, particularly when that phenomenon is evident for several periods.

State Model Evaluation
The state model plays an important role in the production of county estimates of poor school-age children. Evaluations conducted of the state model for the assessment of the revised 1993 county estimates included an internal evaluation of the regression output for 1989 and 1993 and an external evaluation that compared 1989 estimates from the model with 1990 census estimates of proportions of poor school-age children. The results in each case supported the use of the model. However, the state model evaluations were more limited than the county model evaluations, as alternative state model formulations were not evaluated explicitly.

For the assessment of the 1995 county estimates, further evaluations were conducted of the state model. In particular, the model was estimated for 7 years (1989, 1990, 1991, 1992, 1993, 1995, and 1996), and the regression output for those years was examined to determine if there were any systematic biases in the model estimates. (The model was not estimated for 1994 because the redesign of the CPS sample, consequent to the 1990 census, was partly but not completely phased in for the March 1995 CPS.) Also, there was an evaluation of the state raking factors for 1993 and 1995.

State Model Regression Output
The state regression model is a poverty rate model with the variables not transformed (see equation (2)). The analysis of the regression output for the state model for 1989-1993 and 1995-1996 examined the same assumptions that were examined for the 1995 county model estimated for 1989, 1993, and 1995.
The analysis is somewhat less informative for the state model than for the county model because there are about 1,000 counties with poor school-age children in the CPS, but only 51 states (including the District of Columbia), and states are collectively much more homogeneous than counties with respect to poverty rates and other characteristics. In addition, with respect to both internal and external evaluation, some categories of states do not contain enough states for analysis, thereby reducing the utility of evaluation.

Nonetheless, examination of the regression output for the state model helps assess the validity of its assumptions. With a few exceptions, the analysis supports the assumptions underlying the state model (see below); there is little evidence of significant problems with the model formulation (although there may be other models that fit just as well).

Linearity
Plots of standardized residuals against the four predictor variables in the state model (the proportion of child exemptions reported by families in poverty on tax returns, the proportion of people receiving food stamps, the proportion of people under age 65 who did not file a tax return, and a residual from the analogous regression equation using the previous census estimate as the dependent variable) support the assumption of linearity. Furthermore, the standardized residuals, when plotted against the model's predicted values, provide no evidence of the need for any transformation of the variables. This result helps justify the decision not to use the log transformation of the proportion poor as the dependent variable.

Constancy over Time
Table 2-3 shows the regression coefficients for the predictor variables for the state model for each of the years from 1989 to 1996, excluding 1994. The coefficients for all four poverty-rate predictor variables are positive in all 7 years and generally similar across all years. All of the coefficients are significant at the 5 percent level except that the coefficient of the proportion of people under age 65 who did not file a tax return (column 3) is not significant in 1989.

Inclusion or Exclusion of Predictor Variables
The standardized residuals for the state regression model were grouped into four categories for each of the following characteristics: census region, population size in 1990, 1980 to 1990 population growth, percentage of black population in 1990, percentage of Hispanic population in 1990, percentage of group quarters residents in 1990, and percentage of poor school-age children in 1979 (from the 1980 census). The distributions of the standardized residuals for each category were then displayed using box plots.
For none of these box plots is there an obvious pattern to the standardized residuals across categories, with one exception: in 1989, 1990, 1991, and 1993 the model underpredicts the proportion poor of school-age children in the West Region (i.e., the model estimates are lower than the CPS direct estimates for this group of states). The Census Bureau experimented with adding a West Region indicator predictor variable to the model. The coefficient of this variable has a negative sign for all 7 years; however, it is significant for only 1991, 1992, and 1993. For those 3 years, the model with the West Region variable performs better for states in the West Region. A further examination of the residuals from the state model without the West Region predictor variable for individual Western states reveals that the model fairly consistently underpredicts the proportion poor of school-age children in some Western states but just as consistently overpredicts the proportion poor of school-age children in other Western states. Further investigation is needed to explain these patterns.
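
The grouping of standardized residuals described above can be illustrated with a short sketch. It assumes, purely for illustration, that the four categories are formed by ranking areas on a numeric characteristic and cutting the ranking into four roughly equal groups; the report does not say how the category boundaries were chosen. The function names and all data values are hypothetical, and the quartile summary stands in for what a box plot would display.

```python
import statistics

def quartile_groups(values):
    """Assign each value a group 0-3 by rank, giving four roughly equal groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * 4 // len(values)
    return groups

def residuals_by_category(std_residuals, characteristic):
    """Summarize standardized residuals within each category (box-plot numbers).

    Needs at least two areas per category, so at least 8 areas in total.
    """
    groups = quartile_groups(characteristic)
    summary = {}
    for g in range(4):
        members = [r for r, k in zip(std_residuals, groups) if k == g]
        q1, median, q3 = statistics.quantiles(members, n=4)
        summary[g] = {"n": len(members), "q1": q1, "median": median, "q3": q3}
    return summary

# Hypothetical values for 12 areas (NOT data from the report).
std_residuals = [0.4, -1.1, 0.2, 0.9, -0.3, 1.5, -0.8, 0.1, -0.2, 0.7, -1.4, 0.5]
pct_characteristic = [5, 12, 7, 30, 9, 25, 14, 3, 18, 22, 11, 27]

for group, stats in residuals_by_category(std_residuals, pct_characteristic).items():
    print(group, stats)
```

A category whose median sits well away from zero, as the West Region did in several years, would suggest that the model systematically over- or underpredicts for that group.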

TABLE 2-3 Estimates of Regression Coefficients for the 1995 State Model, Estimated for 1989-1993 and 1995-1996

        Predictor Variables [a]
Year    (1)          (2)          (3)          (4)
1989    0.52 (.09)   0.71 (.20)   0.23 (.13)   0.71 (.34)
1990    0.46 (.09)   0.65 (.20)   0.42 (.15)   1.07 (.36)
1991    0.46 (.10)   0.52 (.21)   0.59 (.14)   0.84 (.37)
1992    0.41 (.10)   0.71 (.21)   0.42 (.13)   1.38 (.37)
1993    0.28 (.12)   1.14 (.25)   0.51 (.14)   1.24 (.39)
1995    0.57 (.12)   0.79 (.25)   0.32 (.13)   1.54 (.36)
1996    0.37 (.12)   0.97 (.26)   0.59 (.14)   1.02 (.36)

NOTES: All predictor variables are in terms of rates. Standard errors of the estimated regression coefficients are in parentheses.
[a] Predictor variables: (1) ratio of child exemptions reported by families in poverty on tax returns to total child exemptions; (2) ratio of people receiving food stamps to total population; (3) ratio of people under age 65 who did not file an income tax return to total population under age 65; (4) residual from a regression of poverty rates for school-age children from the prior decennial census (1980 or 1990) on the other three predictor variables.

Normality, Homogeneous Variances, and Outliers. The distribution of the standardized residuals from the state regression model shows some small degree of skewness, especially in the 1992 equation. However, the skewness does not appear sufficiently marked to be a problem. Also, the residual plots and the box plots of the distributions of the standardized residuals against the categories of states show little evidence of any heterogeneous variance. Finally, there is no evidence of outliers from examination of the residual plots or displays of the distributions of the standardized residuals from the state regression model.

Model Error Variance. One problem in the state model concerns the variance of the model error (u_i in equation (2)). In the state model, the variances of the sampling errors (e_i in equation (2)) are estimated directly from the CPS data using a generalized variance function. The total model error variance is calculated using maximum likelihood estimation. The result of this calculation is an estimate of zero for the model error variance in the equation for every year except 1993. This result, which implies (absent sampling variability) that the model gives perfect predictions of state poverty rates for school-age children, is not credible. It produces zero weight for the direct estimates even when those estimates are quite precise, as is the case for several large states in the CPS sample. Even a small model error variance can substantially change the weight on the relatively high-precision direct estimates when they are combined in a shrinkage procedure with the model estimates.

To evaluate the effects of using zero model error variance in the estimation, the panel examined tables that compared the model estimates of the proportion poor of school-age children to the CPS direct estimates by state for 1989-1993 and 1995-1996; as an illustration, Table 2-4 shows this comparison for 1995. This examination demonstrated two important points. First, there are some appreciable differences between the model estimates and the direct estimates. For example, for Mississippi in 1995, the difference is over 7 percentage points. Therefore, if a non-zero estimate for model error variance is produced, it might have important consequences for the state estimates of poor school-age children. Second, while there are some appreciable differences, the model estimates were within two standard errors of the direct estimates for almost all states in each year. The number of states whose model estimates fell outside that limit, in either a positive or negative direction, ranged from one state in 1992 to six states in 1996.
(Mississippi's difference in 1995 was not statistically significant at the 5 percent level.) For no single state did the model estimate differ from the direct estimate by more than two standard errors in more than 3 of the 7 years for which the state model was estimated. (And this analysis ignores the variance of the model estimates, which means that a yet smaller number of differences are statistically significant.) These results suggest that the state model is performing reasonably well: differences between model and direct estimates are neither unusually large nor strongly persistent. However, more work should be conducted to evaluate the current procedures for estimating the sampling error variance of the state model and the effects on the model estimates (see Chapter 5).

TABLE 2-4 CPS Direct Estimate and Regression Model Estimate of Percentage of School-Age Children in Poverty by State, 1995

Columns: (1) CPS direct estimate; (2) lower confidence bound on direct estimate; (3) upper confidence bound on direct estimate; (4) state model regression estimate; (5) regression estimate minus direct estimate, (4) - (1).

State                      (1)     (2)     (3)     (4)     (5)
Alabama                   22.2    16.5    27.9    23.4     1.2
Alaska                     6.3     1.6    11.1    10.9     4.5
Arizona                   23.0    16.8    29.2    21.1    -1.9
Arkansas                  21.4    14.0    28.7    24.0     2.6
California                22.5    19.4    25.7    21.5    -1.0
Colorado                   9.4     5.1    13.8    11.8     2.3
Connecticut               15.6     7.3    24.0    12.6    -3.0
Delaware                  15.6     8.3    23.0    12.8    -2.8
District of Columbia      30.2    17.9    42.4    33.8     3.7
Florida                   21.1    16.8    25.4    20.7    -0.4
Georgia                   14.8     8.2    21.3    21.4     6.7
Hawaii                    14.1     7.9    20.3    11.9    -2.2
Idaho                     15.4     9.9    20.9    12.7    -2.7
Illinois                  19.4    14.6    24.2    15.7    -3.7
Indiana                   12.9     9.0    16.8    12.6    -0.4
Iowa                      15.2     8.9    21.4    11.2    -3.9
Kansas                    10.6     4.8    16.4    12.7     2.1
Kentucky                  18.9    13.4    24.4    22.9     4.0
Louisiana                 24.2    15.6    32.9    28.0     3.8
Maine                     10.7     4.1    17.4    13.8     3.1
Maryland                  12.8     5.0    20.5    11.5    -1.3
Massachusetts             16.5    11.5    21.5    13.3    -3.2
Michigan                  14.2    10.0    18.3    17.2     3.0
Minnesota                  9.5     5.5    13.4    10.0     0.6
Mississippi               34.9    25.6    44.3    27.4    -7.6
Missouri                   9.4     3.5    15.2    17.0     7.7
Montana                   17.4     9.4    25.3    18.4     1.0
Nebraska                  11.4     7.1    15.7    10.0    -1.4
Nevada                     9.8     4.0    15.6    11.8     2.0
New Hampshire              4.2     0.6     7.8     6.5     2.3
New Jersey                 9.3     6.5    12.0    12.3     3.0
New Mexico                34.0    27.8    40.3    28.6    -5.5
New York                  22.7    19.1    26.3    23.1     0.4
North Carolina            19.7    13.8    25.5    17.1    -2.6
North Dakota              10.3     5.3    15.2    14.1     3.8
Ohio                      16.6    11.1    22.2    15.1    -1.5
Oklahoma                  22.6    13.1    32.1    22.5    -0.1
Oregon                    12.5     7.1    17.9    12.4    -0.1
Pennsylvania              16.1    12.5    19.7    15.3    -0.9
Rhode Island              16.4    10.7    22.2    15.1    -1.3
South Carolina            30.8    21.9    39.7    21.9    -8.9
South Dakota              16.7     8.7    24.8    17.3     0.6
Tennessee                 18.4     9.1    27.7    18.7     0.3
Texas                     22.4    19.3    25.5    24.3     1.9
Utah                       7.3     3.9    10.8     7.5     0.2
Vermont                   11.3     3.2    19.4    11.6     0.3
Virginia                  14.3     7.6    21.1    14.5     0.1
Washington                15.8     7.9    23.7    12.4    -3.4
West Virginia             23.0    13.2    32.9    25.7     2.7
Wisconsin                 11.1     4.0    18.1    12.2     1.2
Wyoming                   10.5     6.3    14.7    12.2     1.7

NOTE: Confidence bounds are plus or minus two standard errors on the direct estimate (95% confidence interval, obtained using direct estimates of the CPS standard errors).
SOURCE: Data from Bureau of the Census.

State Raking Factors

The final stage in producing updated estimates of the number of poor school-age children for counties is to rake the estimates from the county model for consistency with the estimates from the state model. The model-1990 census comparisons found that the raking procedure was beneficial to the county estimates. The raking factors vary considerably across states. For 1995, the raking factors range from 0.71 to 1.14 (two-thirds fall between 0.88 and 1.06); for 1993, the raking factors range from 0.91 to 1.31 (two-thirds fall between 0.98 and 1.16).

The Census Bureau determined that the correlation between the raking factors for states in 1993 and 1995 is low, which implies that there is little systematic variation by state across these years. Also, some variation in the raking factors is expected given the form of the county model and the need to transform the predicted log values of poor school-age children to estimated numbers before the raking is performed. Nonetheless, the degree of variation in the raking factors suggests (though there are better ways to diagnose this) that there may be state effects not captured in the county model, which, in turn, could affect the behavior of the model in estimating the number of poor school-age children for counties within states. Preliminary work conducted by the panel suggests that such state effects may be present (see Chapter 5).
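
The raking step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Census Bureau's procedure: it assumes the county model yields predicted log numbers of poor school-age children, that these are converted to numbers by simple exponentiation (with no adjustment for the back-transformation), and that the state model supplies a single target total. The function name and all figures are hypothetical.

```python
import math

def rake_to_state_total(county_log_predictions, state_total):
    """Scale county estimates so that they sum to the state-model total.

    Returns the raking factor and the raked county estimates.
    """
    if state_total < 0:
        raise ValueError("state_total must be non-negative")
    # Transform predicted log values to estimated numbers before raking.
    county_numbers = [math.exp(v) for v in county_log_predictions]
    # The raking factor is the ratio of the state total to the county sum.
    raking_factor = state_total / sum(county_numbers)
    raked = [n * raking_factor for n in county_numbers]
    return raking_factor, raked

# Hypothetical state with three counties (NOT data from the report).
log_preds = [math.log(1000), math.log(2500), math.log(500)]
factor, raked = rake_to_state_total(log_preds, 3800.0)
print(round(factor, 3), [round(x) for x in raked])
```

A factor below 1 means the county estimates summed to more than the state-model total and were scaled down; a factor above 1 means the reverse. Every county in a state receives the same factor, which is why persistent variation in the factors across states may point to state effects missing from the county model.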
The panel urges the Census Bureau to estimate the variance of the state raking factors to determine if the variability that they exhibit for 1993 and 1995 is consistent with random error. If it is not, the panel urges the Census Bureau to further investigate the state raking factors, including consideration of whether there is any feature of the state model that might explain the variation. More generally, the Census Bureau should conduct research on how to account for state effects in the county model.
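
Returning to the model error variance issue discussed earlier in this chapter, the following sketch illustrates why an estimate of zero matters. It assumes the standard shrinkage form in which the weight on the direct estimate is the model error variance divided by the sum of the model error variance and the sampling variance; this ignores uncertainty in the regression coefficients, and the Census Bureau's actual calculation may differ. The California figures are taken from Table 2-4, but the non-zero model error variance of 1.0 is a hypothetical value chosen only for illustration, and the function name is likewise an assumption.

```python
def shrinkage_estimate(direct, model, sampling_var, model_error_var):
    """Combine a direct estimate and a model estimate by shrinkage.

    The weight on the direct estimate grows with the model error variance
    and shrinks with the sampling variance of the direct estimate.
    """
    if sampling_var <= 0 or model_error_var < 0:
        raise ValueError("sampling_var must be > 0 and model_error_var >= 0")
    weight_on_direct = model_error_var / (model_error_var + sampling_var)
    combined = weight_on_direct * direct + (1 - weight_on_direct) * model
    return weight_on_direct, combined

# California, 1995 (Table 2-4): direct 22.5, bounds 19.4 to 25.7, model 21.5.
# The bounds are plus or minus two standard errors, so SE = (upper - lower) / 4.
direct, lower, upper, model = 22.5, 19.4, 25.7, 21.5
sampling_var = ((upper - lower) / 4) ** 2

# 0.0 mirrors the estimate obtained for most years; 1.0 is HYPOTHETICAL.
for model_error_var in (0.0, 1.0):
    w, est = shrinkage_estimate(direct, model, sampling_var, model_error_var)
    print(model_error_var, round(w, 3), round(est, 2))
```

With a model error variance of zero the direct estimate receives no weight at all, however precise it is; for a state with a small sampling variance, even a modest non-zero value shifts a noticeable share of the weight back to the direct estimate.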
