Small-Area Estimates of School-Age Children in Poverty - Interim Report 2: Evaluation of Revised 1993 County Estimates for Title I Allocations
3
Alternative County Models
The Census Bureau's procedure for developing updated county estimates of poor school-age children in 1993 uses a county model, a separate state model, and county population estimates. All three components are important, and the panel considers all three in this report. However, the heart of the estimation procedure is the county model. The task of developing good estimates of poor school-age children from a county model is more challenging than the task of developing good state estimates of poor school-age children or good county estimates of total school-age children. Hence, the panel focused its evaluation efforts mainly on the county model.
In selecting a specific model for developing small-area poverty estimates that are to be used for such an important public purpose as allocating funds, it is important to compare the selected model to alternative models that may have specific advantages or that appear to be equally good. When the original county estimates of poor school-age children in 1993 were provided to the panel, the Census Bureau had not had time to undertake a thorough assessment of the performance of that model or to compare it to other models. Subsequently, the panel and the Census Bureau developed a range of alternative county models to evaluate. In a first round of evaluations, 12 models were examined. On the basis of the results of those evaluations, a second round of evaluations examined four models that appeared practicable to use to provide revised county estimates of poor school-age children in 1993. The basic features of the models that were examined are summarized below.^{1}
^{1} For technical information on the models included in the first round of evaluations, see Appendix A. The models specified do not exhaust the list of possibilities, but they are a reasonable range of alternatives to consider at the present time. See Chapter 6 for model formulations that could be considered as part of a longer term research program for small-area estimation.
MODEL CHARACTERISTICS
The alternative county models that the Census Bureau and the panel examined are distinguished broadly by three characteristics: (1) treatment of information from the previous census—whether the model includes a predictor variable from the previous census in a single equation or uses a bivariate formulation that links a census equation with a CPS equation; (2) the form of the variables—whether they are numbers or proportions, transformed to logarithms or untransformed; and (3) whether the model includes intercept terms for each state (i.e., fixed state effects).
Treatment of Information from the Previous Census
The revised county model the Census Bureau used to produce estimates of poor school-age children in 1993 is a single-equation model in which the dependent variable is from the CPS and one of the predictor variables is the estimated number of poor school-age children from the previous census. The inclusion of the census predictor variable is based on the assumption that poverty in a prior year is indicative of poverty in a later year.
The state model makes use of information from the previous census in a different way. The state model equation, in which the dependent variable is also from the CPS, includes a predictor variable that is the estimated residual from a similar regression for the previous census. The underlying assumption is that states that had more (less) poverty than predicted for the census year will continue to have more (less) poverty for a later year than the model would predict without the residual variable. This assumption was supported by an analysis that showed the residuals to be correlated from a state model estimated with 1980 census data and a state model estimated with 1990 census data (see Chapter 2).
The possible advantage of having the county model include the estimated residual from an equation for the previous census could not be established because the necessary administrative data are not available with which to estimate a county equation from the 1980 census (for 1979). The Census Bureau developed a bivariate formulation of the county model as a way to make more complete use of information from the previous census in a manner analogous to the state model (Bell, 1997a). In the bivariate formulation, the 1993 county model jointly estimates two separate equations for March 1993-1995 CPS data and 1990 census data, respectively, in which the model errors of the two equations are allowed to be correlated (see below, "Bivariate Models").
Form of the Variables
In the revised county model, the dependent variable is the log number of poor school-age children, and the predictor variables are also numbers that are transformed to logarithms. The Census Bureau and the panel examined alternative county models in which the dependent variable is the proportion, or rate, of poor school-age children. For some of these rate models, the dependent and predictor variables are transformed to logarithms; for others, they are not transformed. Models for which the dependent and predictor variables are untransformed numbers were not considered because, when not transformed to logarithms, the distributions of the predictor variables at the county level have a wide range and are not symmetric; also, the predictor variables do not have linear relationships with the dependent variable. Untransformed poverty rates do not share these problems to the same extent, although it is possible to obtain predicted negative values from an untransformed formulation.
Inclusion of Fixed State Effects
In the revised county model, there are no predictor variables that explicitly account for regional or state effects. After the initial county estimates are produced from the model, they are raked for consistency with the estimates from the state model. Analysis of the size and variability of the raking factors (see Chapter 4) suggested that the county model may not adequately account for differences among states in the relationship of the predictor variables to the dependent variable and, consequently, that the county model may not adequately account for the variation among counties within a state.
As a way to explore this problem, the Census Bureau developed a fixed state effects model. This model includes a dummy variable for each state, which is 1 for all counties in the state and 0 otherwise. The purpose of these state indicator variables is to enable the model to more accurately capture the variation among counties within each state by accounting for differences in the level of the dependent variable by state.
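The raking step mentioned above can be illustrated with a minimal sketch. It assumes raking means a simple ratio adjustment within each state (the state estimate divided by the sum of that state's county estimates); the function name, county labels, and all numbers are hypothetical and are not taken from the report.

```python
from collections import defaultdict

def rake_to_state_totals(county_estimates, county_state, state_totals):
    """Scale county estimates so they sum to the state estimate in each state.

    Assumption: one multiplicative raking factor per state.
    """
    sums = defaultdict(float)
    for county, value in county_estimates.items():
        sums[county_state[county]] += value
    # Raking factor = state model estimate / sum of county model estimates
    factors = {state: state_totals[state] / sums[state] for state in sums}
    raked = {
        county: value * factors[county_state[county]]
        for county, value in county_estimates.items()
    }
    return raked, factors

# Hypothetical inputs for two states
county_estimates = {"A1": 400.0, "A2": 600.0, "B1": 300.0}
county_state = {"A1": "A", "A2": "A", "B1": "B"}
state_totals = {"A": 1100.0, "B": 270.0}

raked, factors = rake_to_state_totals(county_estimates, county_state, state_totals)
```

Under this reading, factors far from 1, or factors that vary widely across states, would indicate that the county model misses state-level differences, which is the concern raised in the text.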
MODELS EXAMINED IN FIRST ROUND OF EVALUATIONS
Of the 12 models examined in the first round of evaluations, 6 were single-equation models, and 6 were bivariate models. Nine of the 12 models transform the values of the dependent and predictor variables into logarithms. Because logarithms cannot be taken for values of 0, these models are estimated only for the counties with sampled households in the CPS that contain at least one poor school-age child: 1,184 of 3,143 counties for the 1993 models. The other three models, which do not transform the variables (all three are rate models), use data for all counties with sampled households in the CPS: 1,488 counties for the 1993 models. A topic for future work is how to use all counties with CPS sampled households in estimating a log-based model (see Chapter 6).
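The difference between the two estimation samples can be sketched as follows. This is only an illustration: the data structure (a CPS estimate of poor school-age children per county, with `None` marking counties that have no sampled households) and the values are hypothetical.

```python
import math

# Hypothetical CPS estimates of poor school-age children by county.
# None means the county has no CPS sampled households.
cps_poor = {"c1": 35.0, "c2": 0.0, "c3": None, "c4": 12.0, "c5": 0.0}

# Untransformed rate models can use every county with sampled households.
untransformed_sample = [c for c, v in cps_poor.items() if v is not None]

# Log models need a positive value, because log(0) is undefined.
log_sample = [c for c, v in cps_poor.items() if v is not None and v > 0]

# Logarithms are taken only for the counties retained in the log sample.
log_values = {c: math.log(cps_poor[c]) for c in log_sample}
```

In this toy input, the counties with sampled households but no sampled poor school-age children (`c2`, `c5`) drop out of the log-based sample, which mirrors the gap between 1,488 and 1,184 counties in the 1993 models.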
Single-Equation Models
The basic form of a single-equation county model is
$$y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_5 x_{5i} + u_i + e_i \qquad (1)$$
where:
$y_i$ = the dependent variable in county i (number or proportion of poor school-age children from 3 years of CPS data),
$x_{1i}, \ldots, x_{5i}$ = the predictor variables in county i,
$\alpha$ and $\beta_1, \ldots, \beta_5$ = the intercept term and the regression coefficients,
$u_i$ = model error for county i, and
$e_i$ = sampling error of the dependent variable for county i.
The formulation with fixed state effects adds a dummy variable for each state, which is 1 for all counties in the state and 0 otherwise. The intercept term, α, is dropped from the models with fixed state effects to avoid overidentification. (The addition of a large number of dummy variables does not result in too few degrees of freedom because more than 1,000 counties are used to fit the regression coefficients.)
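The reason the intercept is dropped can be shown with a small sketch. It builds state indicator variables for a handful of counties and checks that the indicators sum to a column of ones, which is exactly the intercept column, so keeping both would make the regressors linearly dependent. The state codes and the helper name are hypothetical.

```python
def state_indicator_columns(county_states):
    """Return one 0/1 indicator column per state, keyed by state code."""
    states = sorted(set(county_states))
    return {
        state: [1 if cs == state else 0 for cs in county_states]
        for state in states
    }

# Hypothetical state membership for six counties
county_states = ["AL", "AL", "GA", "GA", "GA", "TX"]
columns = state_indicator_columns(county_states)

# An intercept is a column of ones.
intercept = [1] * len(county_states)

# Each county belongs to exactly one state, so the indicators sum to one per row.
dummy_sum = [sum(col[i] for col in columns.values()) for i in range(len(county_states))]

# True here: the intercept duplicates information already in the indicators.
intercept_is_redundant = dummy_sum == intercept
```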
Six single-equation models were evaluated in the first round (see Table 3-1):
(1) Log Number Model (Under 21). The dependent variable is the CPS estimate of the log number of poor school-age children, derived by multiplying, for each county, the 3-year weighted average poverty rate for related children aged 5-17 by the 3-year weighted average of total related children aged 5-17. The predictor variables are the number of child exemptions (assumed to be under age 21) reported by families in poverty on tax returns; the number of people receiving food stamps; the estimated population under age 21; the total number of child exemptions on tax returns; and the estimated number of poor school-age children in the 1990 census. For the 1993 model, the IRS and food stamp data pertain to 1993; the population estimates data pertain to 1994. All variables are transformed to logarithms. This is the original model used by the Census Bureau to produce 1993 county estimates of poor school-age children.
(2) Log Number Model (Under 18). The dependent and predictor variables are the same as in (1), except that the estimated population under age 18 replaces the estimated population under age 21. This is the revised model used by the Census Bureau to produce 1993 county estimates of poor school-age children (see Chapter 2). It was included in the first round of evaluations after it became apparent that the log number model (under 21) was not performing well for counties with large numbers of people under age 21 in group quarters (see Chapter 4).
(3) Log Number Model with Fixed State Effects. The dependent and predictor variables are the same as in (1), with the addition of state indicator variables.
(4) Log Rate Model (Under 21). The dependent variable is the CPS estimate of the log proportion poor, or log poverty rate, for school-age children: more precisely, a poverty ratio (similar to the state model) in which for each county the numerator is the sum over 3 years of the estimated number of poor related children aged 5-17 and the denominator is the sum over 3 years of the estimated total number of CPS children aged 5-17. The predictor variables are also ratios: the ratio of the number of child exemptions reported by families in poverty on tax returns to the total number of child exemptions on tax returns; the ratio of the number of people receiving food stamps to the total population (all ages); the ratio of the total number of child exemptions on tax returns to the total population under age 21;^{2} and the ratio of the estimated number of poor related children aged 5-17 to the estimated total number of related children aged 5-17 from the 1990 census. All variables are transformed to logarithms.
(5) Rate Model. The dependent variable and predictor variables are the same as in (4), but all variables are ratios, untransformed.
(6) Hybrid Log Rate-Number Model. The dependent variable is the CPS estimate of the poverty ratio for poor school-age children as in (4); the predictor variables are the same as in (1), that is, they represent numbers, not ratios; and all variables are transformed to logarithms.
TABLE 3-1 Single-Equation County Models: Dependent Variable and Predictor Variables
| Model | Dependent Variable, $y_i$ | Predictor Variables, $x_{1i}, \ldots, x_{5i}$ | Form of the Predictor Variables |
|---|---|---|---|
| (1) Log Number (Under 21) | Log 3-year weighted average number of poor school-age children | (1) Number of child exemptions reported by families in poverty on tax returns; (2) Number of people receiving food stamps; (3) Population under 21; (4) Number of child exemptions on tax returns; (5) Number of poor school-age children in 1990 census | Transformed to logarithms |
| (2) Log Number (Under 18) | Log 3-year weighted average number of poor school-age children | (1) Number of child exemptions reported by families in poverty on tax returns; (2) Number of people receiving food stamps; (3) Population under 18; (4) Number of child exemptions on tax returns; (5) Number of poor school-age children in 1990 census | Transformed to logarithms |
| (3) Log Number (Under 21) with Fixed State Effects | Same as Log Number Under 21 (1) | Same as Log Number Under 21 (1) with the addition of state indicator variables | Transformed to logarithms |
| (4) Log Rate | Log poverty ratio for school-age children (3-year sum of poor related children 5-17 divided by 3-year sum of total CPS children 5-17) | (1) Ratio of number of child exemptions reported by families in poverty on tax returns to total number of child exemptions on tax returns; (2) Ratio of number of people receiving food stamps to total population; (3) Ratio of total number of child exemptions on tax returns to total population under 21; (4) Ratio of number of poor related children aged 5-17 to total number of related children aged 5-17 from the 1990 census | Transformed to logarithms |
| (5) Rate | Poverty ratio for school-age children (same as Log Rate (4), except untransformed) | Same as Log Rate (4), except untransformed | Untransformed |
| (6) Hybrid Rate-Number | Same as Log Rate (4) | Same as Log Number Under 21 (1) | Transformed to logarithms |
NOTE: The models are estimated for 1993 from 3 years of CPS data (March 1993, 1994, and 1995, covering income in 1992, 1993, and 1994).
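The two ways of forming the dependent variable, for the log number model (1) and the log rate model (4), can be sketched as below. This is an illustration under stated assumptions: the chapter does not say how the 3-year weights are chosen, so the `weights` argument is hypothetical (equal weights are used in the example), and all input values are made up.

```python
import math

def weighted_average(values, weights):
    """Weighted average of per-year values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def log_number_dependent(poverty_rates, total_children, weights):
    """Model (1) style: log of (average poverty rate x average total children)."""
    number_poor = weighted_average(poverty_rates, weights) * weighted_average(
        total_children, weights
    )
    return math.log(number_poor)

def log_rate_dependent(poor_children, total_children):
    """Model (4) style: log of (3-year sum of poor / 3-year sum of total)."""
    return math.log(sum(poor_children) / sum(total_children))

# Hypothetical CPS estimates for one county over three years
poor = [120.0, 150.0, 130.0]
total = [1000.0, 1000.0, 1000.0]
rates = [p / t for p, t in zip(poor, total)]
weights = [1.0, 1.0, 1.0]  # assumption: equal weights

y_number = log_number_dependent(rates, total, weights)
y_rate = log_rate_dependent(poor, total)
```

Both quantities are undefined when the county has no sampled poor school-age children, which is why the log-based models are fit on a reduced set of counties.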
Each single-equation model was estimated for 1993, by averaging 3 years of CPS data (March 1993, 1994, and 1995, covering income years 1992, 1993, and 1994) to form the dependent variable. Each model was also estimated for 1989: for the dependent variable, by averaging 3 years of CPS data (March 1989, 1990, and 1991, covering income years 1988, 1989, and 1990); for the predictor variables, by using appropriate data from IRS and food stamp records for 1989, 1990 population estimates of school-age children, and 1980 census estimates of poor school-age children. The 1989 models were estimated to permit comparisons with 1990 census estimates of poor school-age children in 1989 for evaluation purposes (see Chapter 4). Finally, each single-equation model was also estimated for 1989 by using 1990 census data rather than CPS data to form the dependent variable. The census equation was needed to determine how to distribute the total squared error of the CPS equation (1993 or 1989) into model error variance and sampling error variance (see Appendix A).
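The split of total error into its two components can be written compactly. The following is a sketch only: it assumes the model error and the sampling error are uncorrelated, and the symbols $\sigma_u^2$ and $\sigma_{e,i}^2$ are introduced here for illustration. The actual procedure is described in Appendix A.

```latex
\operatorname{Var}(u_i + e_i) = \sigma_u^2 + \sigma_{e,i}^2
```

Under this assumption, once one component is pinned down from outside information (here, with the help of the census equation), the other follows from the total squared error of the CPS equation.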
^{2} In 292 counties, the ratio of total child exemptions on tax returns to the total population under age 21 (the tax filer population ratio) is greater than 1, which means that the nonfiler ratio (1 minus the filer ratio) is negative. Because negative values cannot be transformed into logarithms, the log rate equation includes the filer ratio and not the nonfiler ratio. There are several reasons that filer ratios may be greater than 1: addresses on tax returns are not always the county of residence as defined for population estimates; tax filers may report exemptions for children who do not reside with them; and some child exemptions are for children aged 21 or older.
Bivariate Models
The bivariate formulation of the county model for 1993 estimates of poor school-age children involves the joint estimation of two equations: one for 1993, in which the dependent variable is formed by averaging 3 years of CPS data, and one for 1989, in which the dependent variable is formed by using 1990 census data. The bivariate formulation allows for a correlation between the model errors in the two equations, $u_{\mathrm{CPS}i}$ and $u_{\mathrm{CEN}i}$ in equations (2) and (3) below (see also Appendix A). It is through this mechanism that data from the previous census are incorporated in predicting the number of poor school-age children in 1993. Hence, the bivariate models do not include 1990 census estimates of poor school-age children as a predictor variable in the 1993 equation.
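The basic form of the CPS equation in the bivariate formulation is
$$y_{\mathrm{CPS}i} = \alpha_{\mathrm{CPS}} + \beta_{\mathrm{CPS}1} x_{\mathrm{CPS}1i} + \cdots + \beta_{\mathrm{CPS}4} x_{\mathrm{CPS}4i} + u_{\mathrm{CPS}i} + e_{\mathrm{CPS}i} \qquad (2)$$
where:
$y_{\mathrm{CPS}i}$ = the dependent variable in county i (number or proportion of poor school-age children from 3 years of CPS data),
$x_{\mathrm{CPS}1i}, \ldots, x_{\mathrm{CPS}4i}$ = the predictor variables in county i,
$u_{\mathrm{CPS}i}$ = model error for county i, and
$e_{\mathrm{CPS}i}$ = sampling error of $y_{\mathrm{CPS}i}$ for county i.
The basic form of the census equation in the bivariate formulation is
$$y_{\mathrm{CEN}i} = \alpha_{\mathrm{CEN}} + \beta_{\mathrm{CEN}1} x_{\mathrm{CEN}1i} + \cdots + \beta_{\mathrm{CEN}4} x_{\mathrm{CEN}4i} + u_{\mathrm{CEN}i} + e_{\mathrm{CEN}i} \qquad (3)$$
where:
$y_{\mathrm{CEN}i}$ = the dependent variable in county i (number or proportion of poor school-age children from the 1990 census),
$x_{\mathrm{CEN}1i}, \ldots, x_{\mathrm{CEN}4i}$ = the predictor variables in county i,
$u_{\mathrm{CEN}i}$ = model error for county i, and
$e_{\mathrm{CEN}i}$ = sampling error of $y_{\mathrm{CEN}i}$ for county i.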
The formulation with fixed state effects adds a dummy variable for each state, which is 1 for all counties in the state and 0 otherwise.
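The link between the CPS and census equations can be sketched as a covariance between their model errors. The parameterization below is an assumption made for illustration: the symbols $\rho$, $\sigma_{\mathrm{CPS}}$, and $\sigma_{\mathrm{CEN}}$ are not defined in this chapter, and Appendix A gives the formulation actually used.

```latex
\operatorname{Cov}\left(u_{\mathrm{CPS}i},\, u_{\mathrm{CEN}i}\right)
  = \rho \,\sigma_{\mathrm{CPS}}\,\sigma_{\mathrm{CEN}},
\qquad -1 \le \rho \le 1
```

With $\rho = 0$ the two equations would carry no information about each other; a nonzero $\rho$ is what lets the census equation inform the 1993 predictions without a census predictor variable in the CPS equation.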
Six bivariate models were evaluated in the first round (see Table 3-2):
(7) Bivariate Log Number Model. In the CPS equation for this bivariate model, the dependent variable is the same as in model (1), the single-equation log number model (under 21). The predictor variables are the same as in (1), except that the 1990 census estimated number of poor school-age children is dropped from the equation. In the census equation for this bivariate model, the dependent variable is the 1990 census estimate of the number of poor school-age children in 1989; the predictor variables are the same as in the CPS equation, except that the IRS and food stamp data pertain to 1989 instead of 1993, and the population data are from the 1990 census rather than from the population estimates program. All variables are transformed to logarithms.
(8) Bivariate Log Rate Model. In the CPS equation, the dependent variable is the same as in model (4), the single-equation log rate model (under 21). The predictor variables are the same as in (4), except that the 1990 census estimated poverty rate for school-age children is dropped from the equation. In the 1990 census equation, the dependent variable is the estimated log poverty ratio for school-age children from the census; the predictor variables are the same as in the CPS equation, except that the IRS and food stamp data pertain to 1989 instead of 1993 and the population data are from the 1990 census rather than from the population estimates program. All variables are ratios, transformed to logarithms.
(9) Bivariate Rate Model. The dependent and predictor variables in the CPS and census equations are the same as in (8), but all variables are ratios, untransformed.
(10) Bivariate Log Number Model with Fixed State Effects. The dependent and predictor variables in the CPS and census equations are the same as in (7), with the addition of state indicator variables in each equation. All variables are transformed to logarithms.
(11) Bivariate Log Rate Model with Fixed State Effects. The dependent and predictor variables in the CPS and census equations are the same as in (8), with the addition of state indicator variables in each equation. All variables are ratios, transformed to logarithms.
(12) Bivariate Rate Model with Fixed State Effects. The dependent and predictor variables in the CPS and census equations are the same as in (9), with the addition of state indicator variables in each equation. All variables are ratios, untransformed.
TABLE 3-2 Bivariate County Models: Dependent Variable, Predictor Variables, and Form of the Predictor Variables for the CPS Equation for 1993
| Model | Dependent Variable, $y_{\mathrm{CPS}i}$ | Predictor Variables, $x_{\mathrm{CPS}1i}, \ldots, x_{\mathrm{CPS}4i}$ | Form of the Predictor Variables |
|---|---|---|---|
| (7) Log Number (Under 21) | Log 3-year weighted average number of poor school-age children (same as single-equation Log Number Under 21 (1)) | (1) Number of child exemptions reported by families in poverty on tax returns; (2) Number of people receiving food stamps; (3) Population under 21; (4) Number of child exemptions on tax returns (same as single-equation Log Number Under 21, except there is no previous census variable) | Transformed to logarithms |
| (8) Log Rate | Log poverty ratio for school-age children (3-year sum of poor related children 5-17 divided by 3-year sum of total CPS children 5-17) (same as single-equation Log Rate (4)) | (1) Ratio of number of child exemptions reported by families in poverty on tax returns to total number of child exemptions on tax returns; (2) Ratio of number of people receiving food stamps to total population; (3) Ratio of total number of child exemptions on tax returns to total population under 21 (same as single-equation Log Rate, except there is no previous census variable) | Transformed to logarithms |
| (9) Rate | Poverty ratio for school-age children (same as Bivariate Log Rate (8), except untransformed) | Same as Bivariate Log Rate, except untransformed | Untransformed |
| (10) Log Number with Fixed State Effects | Same as Bivariate Log Number Under 21 (7) | Same as Bivariate Log Number Under 21 with the addition of state indicator variables | Transformed to logarithms |
| (11) Log Rate with Fixed State Effects | Same as Bivariate Log Rate (8) | Same as Bivariate Log Rate with the addition of state indicator variables | Transformed to logarithms |
| (12) Rate with Fixed State Effects | Poverty ratio for school-age children (same as Bivariate Log Rate (8), except untransformed) | Same as Bivariate Log Rate with the addition of state indicator variables, except untransformed | Untransformed |
NOTES: The models are estimated for 1993 from a CPS equation for 1993 and a 1990 census equation for 1989. The census equation for 1989 for each bivariate model is of the same form as the corresponding CPS equation for 1993. The 1989 equations use the number of poor school-age children or the poverty ratio for school-age children from the 1990 census as the dependent variable; the predictor variables are from IRS and food stamp records for 1989 and population estimates from the 1990 census.
MODELS EXAMINED IN SECOND ROUND OF EVALUATIONS
The first round of evaluations included an internal evaluation, in which the regression output for all 12 models was examined to assess the validity of the underlying assumptions (see Appendix C). It also included an external evaluation, in which estimates of poor school-age children in 1989 from the six single-equation models were compared with 1990 census estimates (see Appendix D). The results of these evaluations led the Census Bureau and the panel to drop several models from further consideration at this time.
The untransformed rate model (5) and the hybrid log rate-number model (6) were dropped from consideration because they performed somewhat worse, on balance, than the other models on both the internal and external evaluations. For example, in the comparisons of model estimates of poor school-age children in 1989 with 1990 census estimates, models (5) and (6) exhibited the largest overall absolute differences of their estimates from the census (see Table D-3). Also, the standardized residuals (differences between the model predictions and the reported values for each observation) from the regression equations for models (5) and (6) were not distributed normally.
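A comparison of this kind can be sketched with a simple summary measure. The measure below (sum of absolute county differences as a share of the census total) is an assumption chosen for illustration; the exact definition behind Table D-3 is not reproduced here, and the county values and model names are hypothetical.

```python
def overall_absolute_difference(model_estimates, census_estimates):
    """Sum of |model - census| over counties, as a share of the census total."""
    total_abs = sum(
        abs(model_estimates[county] - census_estimates[county])
        for county in census_estimates
    )
    return total_abs / sum(census_estimates.values())

# Hypothetical 1989 estimates of poor school-age children for three counties
census = {"A": 1000.0, "B": 500.0, "C": 250.0}
model_x = {"A": 1100.0, "B": 450.0, "C": 260.0}
model_y = {"A": 1300.0, "B": 380.0, "C": 200.0}

diff_x = overall_absolute_difference(model_x, census)
diff_y = overall_absolute_difference(model_y, census)

# The model with the smaller value tracks the census more closely on this measure.
closer_model = "model_x" if diff_x < diff_y else "model_y"
```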
The bivariate formulation (models 7-12) is promising in that it makes fuller use of the information from the previous census than the single-equation formulation. However, there is less experience with bivariate modeling than with modeling that uses a single equation for the kinds of estimates that are needed. More important, because the IRS and food stamp predictor variables at the county level are not available for 1979, it is not possible to evaluate bivariate models by comparison with estimates from the 1990 census. (Such a model would require joint estimation of a 1989 equation in which CPS data form the dependent variable and a 1979 equation in which 1980 census data form the dependent variable.) Hence, the bivariate formulation was not pursued for use at this time. However, further development of bivariate and multivariate models, which might include CPS equations for more than 1 year, as well as a census equation, is worth pursuing for the longer run (see Chapter 6).
Evaluation results indicated that the county model would likely benefit from taking account of state effects in some way. The addition of state indicator variables to either a single-equation or bivariate model (3, 10-12) was promising in some respects, but a fixed state effects approach did not seem clearly superior to other models that were examined. There was no time to investigate other approaches to account for state effects, although the panel believes that the county model could be improved in this regard in the near term with more research (see Chapter 6).
At the conclusion of the first round of evaluations, the Census Bureau and the panel focused on four models that were considered serious candidates to produce revised 1993 county estimates of poor school-age children. These four candidate models were then evaluated on several criteria. All four models are of the single-equation form with variables transformed to logarithms and without fixed state effects:
(a) Log number model (under 21), model (1) above, used by the Census Bureau to produce the original 1993 county estimates of poor school-age children.
(b) Log number model (under 18), model (2) above. This model is the same as model (a) except that the population under age 18 replaces the population under age 21 as a predictor variable.
(c) Log rate model (under 21), model (4) above. The rate formulation is used in the Census Bureau's state model, and the panel believed that, in log form, it could improve the county model.
(d) Log rate model (under 18). This model is the same as model (c) except that the ratio of total child exemptions on tax returns to the total population under 18 replaces the ratio of total child exemptions on tax returns to the total population under age 21 as a predictor variable. The panel wanted to determine if this modification would improve the log rate model, since a similar modification improved the log number model. However, for reasons that are not clear, this modification to the log rate model worsened rather than improved its performance in several respects (see Chapter 4).
The model that the Census Bureau used to prepare the revised 1993 county estimates of poor school-age children is (b)—log number model (under 18), estimated with maximum likelihood. Chapter 4 describes the evaluations that were conducted of the four candidate models (a-d) and highlights key results. Appendix C analyzes the regression output for the 12 models that were included in the first round of evaluations and model (d). Appendix D provides 1990 census evaluation results for the six single-equation models that were included in the first round of evaluations and the four candidate models that were evaluated in the second round.