
Small-Area Estimates of School-Age Children in Poverty: Evaluation of Current Methodology
5
Alternative County Models
The Census Bureau's procedure for developing updated county estimates of poor school-age children uses a county model, a separate state model, and county population estimates. All three components are important, but the heart of the estimation procedure is the county model. The task of developing good county estimates of poor school-age children is more difficult than the task of developing good state estimates of poor school-age children or good county estimates of total school-age children. Hence, the evaluation efforts of the panel and the Census Bureau focused mainly on the county model.
In selecting a specific model for developing small-area poverty estimates that are to be used for such an important public purpose as allocating funds, it is important to compare the selected model to competing models that may have specific advantages. When the original county estimates of poor school-age children in 1993 were released in early 1997, the Census Bureau had not had time to undertake a thorough assessment of the performance of the model used or to compare its performance to that of other models. Subsequently, the panel and the Census Bureau developed a range of alternative county models to evaluate. In a first round of evaluations, 12 models were examined. On the basis of the results of those evaluations, a second round of evaluations examined four models that appeared practicable to use in the SAIPE Program in the near term.
The basic features of the alternative models that were examined are summarized below. All of the models were estimated for 1993, and all except the bivariate models were estimated for 1989 to provide estimates for external evaluation by comparison with 1990 census estimates.¹
MODEL CHARACTERISTICS
The alternative county models that the Census Bureau and the panel examined are distinguished broadly by three characteristics: (1) treatment of information from the previous census, that is, whether the model includes a predictor variable from the previous census in a single equation or uses a bivariate formulation that links a census equation with a CPS equation; (2) the form of the variables, that is, whether they are numbers or proportions and whether they are transformed to logarithms or left untransformed; and (3) whether the model includes intercept terms for each state (i.e., fixed state effects).
Treatment of Information from the Previous Census. The revised county model that the Census Bureau used to produce estimates of poor school-age children in 1993 and 1995 is a single-equation model in which the dependent variable is from the CPS and one of the predictor variables is the estimated number of poor school-age children from the previous census. The inclusion of the census predictor variable is based on the assumption that poverty in a prior year is indicative of poverty in a later year.
The state model makes use of information from the previous census in a different way. The state model equation, in which the dependent variable is also from the CPS, includes a predictor variable that is the estimated residual from a similar regression for the previous census. The underlying assumption is that states that had more (less) poverty than predicted for the census year will continue to have more (less) poverty for a later year than the model would predict without the residual variable. This assumption was supported by an analysis that showed the residuals from a state model estimated for 1979 with 1980 census data to be correlated with the residuals from a state model estimated for 1989 with 1990 census data.
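The state-model mechanism just described, in which a census-year residual becomes a predictor in the later-year equation, can be sketched in a few lines. This is a minimal sketch, assuming ordinary least squares for the census-year fit; the Bureau's actual estimation method may differ, and the function name `add_census_residual` and the toy data are hypothetical.

```python
import numpy as np

def add_census_residual(X_census, y_census, X_cps):
    """Fit the census-year regression by ordinary least squares and append each
    state's census-year residual to the later-year predictor matrix.

    Rows of X_census, y_census, and X_cps must refer to the same states,
    in the same order. (Hypothetical helper, for illustration only.)
    """
    beta, *_ = np.linalg.lstsq(X_census, y_census, rcond=None)
    residual = y_census - X_census @ beta  # > 0: more poverty than predicted
    return np.column_stack([X_cps, residual]), residual

# Toy data for four states: an intercept column plus one predictor.
X_census = np.array([[1.0, 0.10], [1.0, 0.15], [1.0, 0.20], [1.0, 0.25]])
y_census = np.array([0.12, 0.14, 0.23, 0.26])
X_cps = np.array([[1.0, 0.11], [1.0, 0.16], [1.0, 0.19], [1.0, 0.27]])
X_aug, resid = add_census_residual(X_census, y_census, X_cps)
print(X_aug.shape)  # (4, 3): original predictors plus the census residual
```

A positive coefficient on the appended column in the later-year fit would correspond to the assumption in the text: states that were above their census-year prediction tend to stay above it.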
The possible advantage of having the county model include the estimated residual from an equation for the previous census could not be established, because the administrative data needed to estimate a county equation from the 1980 census (for 1979) are not available. As an alternative, the Census Bureau developed a bivariate formulation of the county model in order to make more complete use of information from the previous census in a manner analogous to the state model (Bell, 1997a). In the bivariate formulation for 1993, the county model jointly estimates two regression equations: one that produces 1993 county estimates based on March 1993-1995 CPS data and another that produces 1989 county estimates based on 1990 census data. This formulation incorporates information from the census by allowing the model errors in the two equations to be correlated (see below, “Bivariate Models”).
1
For technical information on the models included in the first round of evaluations, see Appendix A. The models specified do not exhaust the list of possibilities, but they are a reasonable range of alternatives to consider at the present time. See Chapter 9 and National Research Council (2000:Ch.3) for model formulations that could be considered as part of a longer-term research program for producing small-area poverty estimates.
Form of the Variables. In the revised county model, the dependent variable is the log number of poor school-age children, and the predictor variables are also numbers that are transformed to logarithms. The Census Bureau and the panel examined alternative county models in which the dependent variable is the proportion, or rate, of poor school-age children. For some of these rate models, the dependent and predictor variables are transformed to logarithms; for others, they are not transformed. Models in which the dependent and predictor variables are untransformed numbers were not considered: at the county level, the untransformed dependent and predictor variables span a very wide range and have asymmetric distributions, and the predictor variables do not have linear relationships with the dependent variable. Untransformed poverty rates share these problems to a lesser extent, although an untransformed formulation can produce negative predicted values.
Inclusion of Fixed State Effects. In the revised county model, there are no predictor variables that explicitly account for regional or state effects. After the county estimates are produced from the model, combined with the direct CPS estimates where applicable, and transformed to the numeric scale, they are raked for consistency with the estimates from the state model. Analysis of the size and variability of the raking factors (see Chapter 6) suggested that the county model may not adequately account for differences among states in the relationship of the predictor variables to the dependent variable and, consequently, that the county model may not adequately reflect the variation among counties within a state.
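Raking, as used here, rescales the county estimates within each state so that they add up to the state-model estimate. A minimal sketch, assuming a single proportional adjustment per state (with only one control total per state, raking reduces to this ratio adjustment); the function name `rake_to_states` and the numbers are hypothetical, not Census Bureau code or data.

```python
from collections import defaultdict

def rake_to_states(county_est, county_state, state_est):
    """Proportionally rescale county estimates so that, within each state,
    they sum to that state's model estimate (hypothetical helper)."""
    # Sum the county estimates within each state.
    totals = defaultdict(float)
    for county, value in county_est.items():
        totals[county_state[county]] += value
    # Raking factor per state: state estimate / sum of its county estimates.
    factors = {s: state_est[s] / totals[s] for s in totals}
    raked = {c: v * factors[county_state[c]] for c, v in county_est.items()}
    return raked, factors

# Made-up values for three counties in two states.
county_est = {"A1": 400.0, "A2": 600.0, "B1": 250.0}
county_state = {"A1": "A", "A2": "A", "B1": "B"}
state_est = {"A": 1100.0, "B": 200.0}
raked, factors = rake_to_states(county_est, county_state, state_est)
print(factors)  # {'A': 1.1, 'B': 0.8}
```

Factors far from 1, or factors that vary widely across states, are the kind of signal the text describes: the county model's totals disagree with the state model in a state-specific way.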
As a way to explore this problem, the Census Bureau developed a fixed state effects model by including an indicator, or dummy, variable for each state. The purpose of these state indicator variables is to enable the model to more accurately capture the variation among counties within each state by accounting for differences in the level of the dependent variable by state.
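The state indicator variables can be pictured as a 0/1 matrix with one column per state. A minimal sketch, assuming states are identified by postal codes; the helper name `state_indicators` is hypothetical.

```python
import numpy as np

def state_indicators(states):
    """Return (sorted state codes, n_counties x n_states 0/1 matrix) in which
    column j is 1 for counties in state j and 0 otherwise."""
    codes = sorted(set(states))
    index = {s: j for j, s in enumerate(codes)}
    D = np.zeros((len(states), len(codes)))
    for i, s in enumerate(states):
        D[i, index[s]] = 1.0
    return codes, D

codes, D = state_indicators(["NC", "NC", "VA", "MD", "VA"])
print(codes)          # ['MD', 'NC', 'VA']
print(D.sum(axis=1))  # each county is in exactly one state: [1. 1. 1. 1. 1.]
```

Each state's coefficient on its column then shifts the level of the dependent variable for all counties in that state.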
MODELS EXAMINED IN THE FIRST ROUND OF EVALUATIONS
Of the 12 models examined in the first round of evaluations, 6 were single-equation models and 6 were bivariate models. Nine of the 12 models transform the values of the dependent and predictor variables into logarithms. Because the logarithm of 0 is undefined, these models are estimated only for the counties with sampled households in the CPS that contain at least one poor school-age child: 1,184 of 3,143 counties for the 1993 models. The other three models, which do not transform the variables (all three are rate models), use data for all counties with sampled households in the CPS that contain at least one school-age child: 1,488 counties for the 1993 models. A topic for future work is how to use all counties with CPS-sampled households with school-age children in estimating a log-based model (see Chapter 9).
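The county selection rule above can be expressed as a simple filter. A minimal sketch; the field names `children_sampled` and `poor_children_sampled` and the helper `usable_counties` are hypothetical.

```python
def usable_counties(counties, log_model):
    """Return the counties that can enter the regression fit.

    Log-based models need a positive sample count of poor school-age
    children (the log of 0 is undefined); untransformed rate models need
    only a positive sample count of school-age children, the rate's
    denominator. Field names are hypothetical.
    """
    key = "poor_children_sampled" if log_model else "children_sampled"
    return [c for c in counties if c[key] > 0]

counties = [
    {"name": "A", "children_sampled": 40, "poor_children_sampled": 6},
    {"name": "B", "children_sampled": 12, "poor_children_sampled": 0},
    {"name": "C", "children_sampled": 0, "poor_children_sampled": 0},
]
print([c["name"] for c in usable_counties(counties, log_model=True)])   # ['A']
print([c["name"] for c in usable_counties(counties, log_model=False)])  # ['A', 'B']
```

County B illustrates the gap noted in the text: it has sampled school-age children, so a rate model can use it, but a log-based model cannot.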
Single-Equation Models
The basic form of a single-equation county model is
$$z_i = \beta_0 + \beta_1 w_{1i} + \beta_2 w_{2i} + \cdots + \beta_5 w_{5i} + v_i + a_i, \tag{1}$$
where:
$z_i$ = the dependent variable in county $i$ (log number or proportion of poor school-age children from 3 years of CPS data),
$w_{1i}, \ldots, w_{5i}$ = the predictor variables in county $i$,
$v_i$ = model error for county $i$, and
$a_i$ = sampling error of the dependent variable for county $i$.
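Because equation (1) separates model error $v_i$ from sampling error $a_i$, a model prediction can be combined with the direct CPS value for a sampled county by weighting each according to its error variance. A minimal sketch, assuming the standard Fay-Herriot empirical-Bayes weighting with both variances treated as known; this section does not spell out the weighting, and the coefficients, predictor values, variances, and the helper name `combine_with_direct` are all made up.

```python
import numpy as np

def combine_with_direct(z, w, beta, model_var, sampling_var):
    """Weighted combination of a county's direct CPS value z and its
    regression prediction w @ beta (hypothetical helper).

    Assumes the Fay-Herriot form z = w @ beta + v + a with Var(v) = model_var
    and Var(a) = sampling_var, both treated as known; w starts with a 1 for
    the intercept beta_0.
    """
    prediction = float(np.dot(w, beta))
    # Share given to the direct value; falls toward 0 as sampling error grows.
    gamma = model_var / (model_var + sampling_var)
    return gamma * z + (1.0 - gamma) * prediction, prediction, gamma

beta = np.array([0.5, 0.4, 0.1, 0.2, 0.1, 0.2])  # made-up beta_0 .. beta_5
w = np.array([1.0, 5.0, 6.0, 7.0, 6.5, 5.5])      # 1, then five log predictors
estimate, prediction, gamma = combine_with_direct(
    z=6.2, w=w, beta=beta, model_var=0.04, sampling_var=0.12)
print(round(prediction, 4), round(gamma, 4), round(estimate, 4))  # 6.25 0.25 6.2375
```

Here the sampling variance is three times the model error variance, so the direct value gets only a quarter of the weight. The result is still on the log scale; converting it back to a number with a plain exponential ignores any retransformation bias adjustment a production system might apply.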
The formulation with fixed state effects adds a set of indicator variables, one for each state. The indicator for a given state is 1 for all counties in that state and 0 otherwise. The intercept term, $\beta_0$, is dropped from the models with fixed state effects because the state indicators sum to 1 for every county and so already play the role of an intercept; keeping both would make the predictor variables perfectly collinear. The addition of a large number of indicator variables does not leave too few degrees of freedom, because more than 1,000 counties are used to fit the regression coefficients.
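The collinearity and degrees-of-freedom points can be checked numerically. A minimal sketch, assuming 51 state indicators (50 states plus the District of Columbia), a toy assignment of the 1,184 counties to states, and random placeholder values for the five predictors; none of these values are real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_counties = 1184    # counties used for the 1993 log models (from the text)
n_states = 51        # assumption: 50 states plus the District of Columbia
n_predictors = 5     # predictors in equation (1)

states = np.arange(n_counties) % n_states         # toy county-to-state assignment
D = np.eye(n_states)[states]                      # one 0/1 indicator column per state
W = rng.normal(size=(n_counties, n_predictors))   # placeholder predictor values

with_intercept = np.column_stack([np.ones(n_counties), D, W])
without_intercept = np.column_stack([D, W])

# The indicator columns sum to the constant column, so keeping the intercept
# leaves one redundant column: 57 columns but rank 56.
rank_with = np.linalg.matrix_rank(with_intercept)
rank_without = np.linalg.matrix_rank(without_intercept)
residual_df = n_counties - rank_without
print(with_intercept.shape[1], rank_with)        # 57 56
print(without_intercept.shape[1], rank_without)  # 56 56
print(residual_df)                               # 1128
```

Even with 56 coefficients, more than 1,100 residual degrees of freedom remain under these assumptions.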
Six single-equation models were evaluated in the first round (see Table 5-1):
(1) Log Number Model (Under 21). The dependent variable is the CPS estimate of the log number of poor school-age children, derived for each county by multiplying the 3-year weighted average poverty rate for related children aged 5-17 by the 3-year weighted average number of related children aged 5-17. The predictor variables, all of which are transformed to logarithms, are the number of child exemptions (assumed to be for children under age 21) reported by families in poverty on tax returns; the number of people receiving food stamps; the estimated population under age 21; the total number of child exemptions on tax returns; and the estimated number of poor school-age children in the 1990 census. For the 1993 model, the IRS and food stamp data pertain to 1993, and the population estimates pertain to 1994. This is the original model used by the Census Bureau to produce 1993 county estimates of poor school-age children.

TABLE 5-1 Single-Equation County Models: Dependent Variable and Predictor Variables

| Model | Dependent Variable, $z_i$ | Predictor Variables, $w_{1i}, \ldots, w_{5i}$ | Form of the Predictor Variables |
|---|---|---|---|
| (1) Log Number (Under 21) | Log 3-year weighted average number of poor school-age children | Number of child exemptions reported by families in poverty on tax returns; number of people receiving food stamps; population under 21; number of child exemptions on tax returns; number of poor school-age children in 1990 census | Transformed to logarithms |
| (2) Log Number (Under 18) | Log 3-year weighted average number of poor school-age children | Number of child exemptions reported by families in poverty on tax returns; number of people receiving food stamps; population under 18; number of child exemptions on tax returns; number of poor school-age children in 1990 census | Transformed to logarithms |
| (3) Log Number (Under 21) with Fixed State Effects | Same as Log Number Under 21 (1) | Same as Log Number Under 21 (1) with the addition of state indicator variables | Transformed to logarithms |
| (4) Log Rate | Log poverty ratio for school-age children (3-year sum of poor related children aged 5-17 divided by 3-year sum of total CPS children aged 5-17) | Ratio of number of child exemptions reported by families in poverty on tax returns to total number of child exemptions on tax returns; ratio of number of people receiving food stamps to total population; ratio of total number of child exemptions on tax returns to total population under 21; ratio of number of poor related children aged 5-17 to total number of related children aged 5-17 from the 1990 census | Transformed to logarithms |
| (5) Rate | Poverty ratio for school-age children (same as Log Rate (4), except untransformed) | Same as Log Rate (4), except untransformed | Untransformed |
| (6) Hybrid Rate-Number | Same as Log Rate (4) | Same as Log Number Under 21 (1) | Transformed to logarithms |

NOTE: The models are estimated for 1993 from 3 years of CPS data (March 1993, 1994, and 1995, covering income in 1992, 1993, and 1994).
(2) Log Number Model (Under 18). The dependent and predictor variables are the same as in (1), except that the estimated population under age 18 replaces the estimated population under age 21. This is the revised model used by the Census Bureau to produce the revised 1993 and the 1995 county estimates of poor school-age children (see Chapter 4). It was included in the first round of evaluations after it became apparent that the log number model (under 21) was not performing well for counties with large numbers of people under age 21 in group quarters (see Chapter 6).
(3) Log Number Model with Fixed State Effects. The dependent and predictor variables are the same as in (1), with the addition of state indicator variables.
(4) Log Rate Model (Under 21). The dependent variable is the CPS estimate of the log proportion poor, or log poverty rate, for school-age children. More precisely, it is a poverty ratio, similar to that in the state model, in which for each county the numerator is the sum over 3 years of the estimated number of poor related children aged 5-17 and the denominator is the sum over 3 years of the estimated total number of CPS children aged 5-17. The predictor variables are also ratios: the ratio of the number of child exemptions reported by families in poverty on tax returns to the total number of child exemptions on tax returns; the ratio of the number of people receiving food stamps to the total population (all ages); the ratio of the total number of child exemptions on tax returns to the total population under age 21;² and the ratio of the estimated number of poor related children aged 5-17 to the estimated total number of related children aged 5-17 from the 1990 census. All variables are transformed to logarithms.
(5) Rate Model. The dependent and predictor variables are the same as in (4), but all variables are untransformed ratios.
(6) Hybrid Log Rate-Number Model. The dependent variable is the CPS estimate of the poverty ratio for school-age children, as in (4); the predictor variables are the same as in (1), that is, they represent numbers, not ratios. All variables are transformed to logarithms.
2
In 292 counties, the ratio of total child exemptions on tax returns to the total noninstitutionalized population under age 21 in 1993 (the tax filer population ratio) is greater than 1, which means that the nonfiler ratio (1 minus the filer ratio) is negative. Because negative values cannot be transformed into logarithms, the log rate equation includes the filer ratio rather than the nonfiler ratio. Filer ratios may exceed 1 for several reasons: the address on a tax return is not always in the county of residence as defined for population estimates; tax filers may report exemptions for children who do not reside with them; and some child exemptions are for children aged 21 or older.
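The dependent variables of the log number and log rate models, and the footnote's point about filer ratios, can be sketched as follows. A minimal sketch, assuming the 3-year weighted averages for the log number model are supplied precomputed (the weights are not given here); the helper names and the county counts are hypothetical.

```python
import math

def log_number_dv(avg_poverty_rate, avg_related_children):
    """Dependent variable of models (1)-(3): log of the 3-year weighted average
    poverty rate for related children 5-17 times the 3-year weighted average
    number of related children 5-17 (both averages taken as given here)."""
    return math.log(avg_poverty_rate * avg_related_children)

def log_rate_dv(poor_by_year, children_by_year):
    """Dependent variable of model (4): log of (3-year sum of poor related
    children 5-17) / (3-year sum of CPS children 5-17)."""
    return math.log(sum(poor_by_year) / sum(children_by_year))

def log_ratio(numerator, denominator):
    """Log of a predictor ratio, e.g. total child exemptions / population under 21."""
    return math.log(numerator / denominator)

# Made-up sample counts for one county over three March CPS files.
dv = log_rate_dv([12, 9, 15], [60, 55, 65])  # log(36 / 180) = log(0.2)
filer_ratio = 10_500 / 10_000                 # can exceed 1 (footnote 2)
nonfiler_ratio = 1 - filer_ratio              # then negative: no logarithm
print(round(dv, 4))                                       # -1.6094
print(log_ratio(10_500, 10_000) > 0, nonfiler_ratio < 0)  # True True
```

Summing numerators and denominators over the 3 years before dividing, as the text describes for model (4), differs from averaging three annual ratios; the sketch follows the text.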

Each single-equation model was estimated for 1993 by averaging 3 years of CPS data (March 1993, 1994, and 1995, covering income years 1992, 1993, and 1994) to form the dependent variable. Each model was also estimated for 1989: for the dependent variable, by averaging 3 years of CPS data (March 1989, 1990, and 1991, covering income years 1988, 1989, and 1990); for the predictor variables, by using appropriate data from IRS and food stamp records for 1989, 1990 population estimates of school-age children, and 1980 census estimates of poor school-age children. The 1989 models were estimated to permit comparisons with 1990 census estimates of poor school-age children in 1989 for evaluation purposes (see Chapter 6). Finally, each single-equation model was also estimated for 1989 by using 1990 census data rather than CPS data to form the dependent variable. The census equation was used to estimate the model error variance of the 1993 and 1989 CPS equations (see Appendix A).
Bivariate Models
The bivariate formulation of the county model for 1993 estimates of poor school-age children involves the joint estimation of two equations: one for 1993, in which the dependent variable is formed by averaging 3 years of CPS data, and one for 1989, in which the dependent variable is formed by using 1990 census data. The bivariate formulation allows for a correlation between the model errors in the two equations ($v_{\mathrm{CPS},i}$ and $v_{\mathrm{CEN},i}$ in equations (2) and (3) below; see also Appendix A). It is through this mechanism that data from the previous census are incorporated in predicting the number of poor school-age children in 1993. Hence, the bivariate models do not include 1990 census estimates of poor school-age children as a predictor variable in the 1993 equation. The bivariate models were not estimated for 1989: a CPS equation for 1989 could have been estimated, but a census equation for 1979 could not, because of a lack of administrative records data to form the predictor variables.
The basic form of the CPS equation in the bivariate formulation is
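$$z_{\mathrm{CPS},i} = \beta_0 + \beta_1 w_{\mathrm{CPS},1i} + \beta_2 w_{\mathrm{CPS},2i} + \cdots + \beta_4 w_{\mathrm{CPS},4i} + v_{\mathrm{CPS},i} + a_{\mathrm{CPS},i}, \tag{2}$$
where:
$z_{\mathrm{CPS},i}$ = the dependent variable in county $i$ (log number or proportion of poor school-age children from 3 years of CPS data),
$w_{\mathrm{CPS},1i}, \ldots, w_{\mathrm{CPS},4i}$ = the predictor variables in county $i$,
$v_{\mathrm{CPS},i}$ = model error for county $i$, and
$a_{\mathrm{CPS},i}$ = sampling error of $z_{\mathrm{CPS},i}$ for county $i$.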
The basic form of the census equation in the bivariate formulation is

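$$z_{\mathrm{CEN},i} = \beta_0^* + \beta_1^* w_{\mathrm{CEN},1i} + \beta_2^* w_{\mathrm{CEN},2i} + \cdots + \beta_4^* w_{\mathrm{CEN},4i} + v_{\mathrm{CEN},i} + a_{\mathrm{CEN},i}, \tag{3}$$
where:
$z_{\mathrm{CEN},i}$ = the dependent variable in county $i$ (log number or proportion of poor school-age children from the 1990 census),
$w_{\mathrm{CEN},1i}, \ldots, w_{\mathrm{CEN},4i}$ = the predictor variables in county $i$,
$v_{\mathrm{CEN},i}$ = model error for county $i$, and
$a_{\mathrm{CEN},i}$ = sampling error of $z_{\mathrm{CEN},i}$ for county $i$.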
The formulation with fixed state effects adds an indicator variable for each state, which is 1 for all counties in the state and 0 otherwise.
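The way correlated model errors in equations (2) and (3) carry census information into the CPS equation can be illustrated with a conditional-mean calculation. A minimal sketch, assuming the two model errors are bivariate normal with known standard deviations and correlation, and that the census sampling error is independent of both; all parameter values and the helper name `predicted_cps_error` are hypothetical, and this is not the Bureau's estimation procedure.

```python
def predicted_cps_error(census_residual, sd_cps, sd_cen, rho, census_sampling_var=0.0):
    """Conditional mean of the CPS-equation model error v_CPS given the
    observed census-equation residual r = v_CEN + a_CEN.

    Assumes (v_CPS, v_CEN) bivariate normal with SDs sd_cps, sd_cen and
    correlation rho, and a_CEN independent with variance census_sampling_var.
    """
    cov = rho * sd_cps * sd_cen                # Cov(v_CPS, r) = Cov(v_CPS, v_CEN)
    var_r = sd_cen ** 2 + census_sampling_var  # Var(r)
    return cov / var_r * census_residual

# Hypothetical county 0.3 log units above its census-equation prediction.
shift = predicted_cps_error(0.3, sd_cps=0.2, sd_cen=0.25, rho=0.8,
                            census_sampling_var=0.0025)
print(round(shift, 4))  # 0.1846
```

With a correlation of 0 the shift is 0, and the census adds nothing beyond the predictor variables; the stronger the correlation, the more a county's census-year departure from its prediction carries over to the 1993 equation.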
Six bivariate models were evaluated in the first round (see Table 5-2):
(7) Bivariate Log Number Model. In the CPS equation for this bivariate model, the dependent variable is the same as in model (1), the single-equation log number model (under 21). The predictor variables are the same as in (1), except that the 1990 census estimated number of poor school-age children is dropped from the equation. In the census equation for this bivariate model, the dependent variable is the 1990 census estimate of the log number of poor school-age children in 1989; the predictor variables are the same as in the CPS equation, except that the IRS and food stamp data pertain to 1989 instead of 1993, and the population data are from the 1990 census rather than from the population estimates program. All variables are transformed to logarithms.
(8) Bivariate Log Rate Model. In the CPS equation, the dependent variable is the same as in model (4), the single-equation log rate model (under 21). The predictor variables are the same as in (4), except that the 1990 census estimated poverty rate for school-age children is dropped from the equation. In the 1990 census equation, the dependent variable is the estimated log poverty ratio for school-age children from the census; the predictor variables are the same as in the CPS equation, except that the IRS and food stamp data pertain to 1989 instead of 1993, and the population data are from the 1990 census rather than from the population estimates program. All variables are ratios, transformed to logarithms.
(9) Bivariate Rate Model. The dependent and predictor variables in the CPS and census equations are the same as in (8), but all variables are untransformed ratios.
(10) Bivariate Log Number Model with Fixed State Effects. The dependent and predictor variables in the CPS and census equations are the same as in (7), with the addition of state indicator variables in each equation. All variables are transformed to logarithms.

(11) Bivariate Log Rate Model with Fixed State Effects. The dependent and predictor variables in the CPS and census equations are the same as in (8), with the addition of state indicator variables in each equation. All variables are ratios, transformed to logarithms.
(12) Bivariate Rate Model with Fixed State Effects. The dependent and predictor variables in the CPS and census equations are the same as in (9), with the addition of state indicator variables in each equation. All variables are untransformed ratios.
MODELS EXAMINED IN THE SECOND ROUND OF EVALUATIONS
The first round of evaluations included an internal evaluation, in which the regression output for all 12 models was examined to assess the validity of the underlying assumptions (see Appendix B). It also included an external evaluation in which estimates of poor school-age children in 1989 from the six single-equation models were compared with 1990 census estimates (see Appendix C). The results of these evaluations led the Census Bureau and the panel to drop several models from further consideration in the near term.
The untransformed rate model (5) and the hybrid log rate-number model (6) were dropped from consideration because they performed somewhat worse, on balance, than the other models on both the internal and external evaluations. For example, in the comparisons of model estimates of poor school-age children in 1989 with 1990 census estimates, models (5) and (6) exhibited the largest overall absolute differences from the census estimates (see Table C-3). Also, the standardized residuals (for each observation, the difference between the reported value and the model prediction, divided by its estimated standard deviation) from the regression equations for models (5) and (6) were not normally distributed.
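The two kinds of evidence cited above, absolute differences from census estimates and the distribution of standardized residuals, can be computed as follows. A minimal sketch, assuming each residual is scaled by a supplied total error variance (in a model like equation (1), model error variance plus sampling variance) and using moment-based skewness as one rough normality check; the helper names and numbers are hypothetical.

```python
import numpy as np

def mean_absolute_difference(model_est, census_est):
    """Average absolute difference between model and census estimates."""
    diff = np.asarray(model_est, dtype=float) - np.asarray(census_est, dtype=float)
    return float(np.mean(np.abs(diff)))

def standardized_residuals(observed, predicted, error_var):
    """(observed - predicted) / sqrt(error_var), county by county."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return (observed - predicted) / np.sqrt(np.asarray(error_var, dtype=float))

def sample_skewness(x):
    """Moment-based skewness; values far from 0 argue against normality."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

# Made-up numbers for three counties.
mad = mean_absolute_difference([520, 1310, 95], [500, 1350, 90])  # about 21.67
r = standardized_residuals([6.3, 7.1, 4.4], [6.2, 7.3, 4.5], [0.04, 0.04, 0.09])
```

A single skewness value is only a rough screen; the evaluations in Appendix B examine the regression output in more detail.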
The bivariate formulation (models (7)-(12)) is promising in that it makes fuller use of the information from the previous census than the single-equation formulation does. However, there is less experience with bivariate models than with single-equation models for producing the kinds of estimates that are needed. More important, because the IRS and food stamp predictor variables at the county level were not available for 1979, it is not possible to evaluate bivariate models by comparison with estimates from the 1990 census. (Such a model would require joint estimation of a 1989 equation in which CPS data form the dependent variable and a 1979 equation in which 1980 census data form the dependent variable.) Hence, the bivariate formulation was not pursued for use in the short run. However, further development of bivariate and multivariate models, which might include CPS equations for more than 1 year as well as a census equation, is worth pursuing for the longer run (see Chapter 9).
Evaluation results indicated that the county model would likely benefit from taking account of state effects in some way. The addition of state indicator variables to either a single-equation or a bivariate model (models (3) and (10)-(12)) was promising in some respects, but a fixed state effects approach did not seem clearly superior to the other models that were examined. There was not time to investigate other approaches to accounting for state effects, although the panel believes that the county model might be improved in this regard with more research (see Chapter 9).

TABLE 5-2 Bivariate County Models: Dependent Variable, Predictor Variables, and Form of the Predictor Variables for the CPS Equation for 1993

| Model | Dependent Variable, $z_{\mathrm{CPS},i}$ | Predictor Variables, $w_{\mathrm{CPS},1i}, \ldots, w_{\mathrm{CPS},4i}$ | Form of the Predictor Variables |
|---|---|---|---|
| (7) Log Number (Under 21) | Log 3-year weighted average number of poor school-age children (same as single-equation Log Number Under 21 (1)) | Number of child exemptions reported by families in poverty on tax returns; number of people receiving food stamps; population under 21; number of child exemptions on tax returns (same as single-equation Log Number Under 21, except there is no previous census variable) | Transformed to logarithms |
| (8) Log Rate | Log poverty ratio for school-age children (3-year sum of poor related children aged 5-17 divided by 3-year sum of total CPS children aged 5-17) (same as single-equation Log Rate (4)) | Ratio of number of child exemptions reported by families in poverty on tax returns to total number of child exemptions on tax returns; ratio of number of people receiving food stamps to total population; ratio of total number of child exemptions on tax returns to total population under 21 (same as single-equation Log Rate, except there is no previous census variable) | Transformed to logarithms |
| (9) Rate | Poverty ratio for school-age children (same as Bivariate Log Rate (8), except untransformed) | Same as Bivariate Log Rate, except untransformed | Untransformed |
| (10) Log Number with Fixed State Effects | Same as Bivariate Log Number Under 21 (7) | Same as Bivariate Log Number Under 21 with the addition of state indicator variables | Transformed to logarithms |
| (11) Log Rate with Fixed State Effects | Same as Bivariate Log Rate (8) | Same as Bivariate Log Rate with the addition of state indicator variables | Transformed to logarithms |
| (12) Rate with Fixed State Effects | Poverty ratio for school-age children (same as Bivariate Log Rate (8), except untransformed) | Same as Bivariate Log Rate with the addition of state indicator variables, except untransformed | Untransformed |

NOTES: The models are estimated for 1993 from a CPS equation for 1993 and a 1990 census equation for 1989. The census equation for 1989 for each bivariate model is of the same form as the corresponding CPS equation for 1993. The 1989 equations use the number of poor school-age children or the poverty ratio for school-age children from the 1990 census as the dependent variable; the predictor variables are from IRS and food stamp records for 1989 and population estimates from the 1990 census.
At the conclusion of the first round of evaluations, the Census Bureau and the panel focused on four models that were considered serious candidates to produce revised 1993 county estimates of poor school-age children. These four candidate models were then evaluated on several criteria. All four models are of the single-equation form with variables transformed to logarithms and without fixed state effects:
(a) Log number model (under 21), model (1) above, used by the Census Bureau to produce the original 1993 county estimates of poor school-age children.
(b) Log number model (under 18), model (2) above. This model is the same as model (a) except that the population under age 18 replaces the population under age 21 as a predictor variable.
(c) Log rate model (under 21), model (4) above. The rate formulation is used in the Census Bureau's state model, and the panel believed that, in log form, it might improve the county model.
(d) Log rate model (under 18). This model is the same as model (c) except that the ratio of total child exemptions on tax returns to the total population under age 18 replaces the ratio of total child exemptions on tax returns to the total population under age 21 as a predictor variable. The panel wanted to determine whether this modification would improve the log rate model, since a similar modification had been found to improve the log number model. However, for reasons that are not clear, this modification worsened rather than improved the performance of the log rate model in several respects (see Chapter 6).
The model that the Census Bureau used to prepare the revised 1993 county estimates of poor school-age children is (b), the log number model (under 18), estimated by maximum likelihood. This model was also used to prepare the 1995 estimates. Chapter 6 describes the evaluations that were conducted of the four candidate models (a)-(d) and highlights key results. Appendix B analyzes the regression output for the 12 models included in the first round of evaluations and for model (d). Appendix C provides 1990 census evaluation results for the six single-equation models included in the first round of evaluations and the four candidate models evaluated in the second round. Appendix C also compares the four candidate models with four other procedures that rely more heavily on census data; these procedures are described and discussed in Chapter 6.