4
Estimation Procedure for Counties
The task for the SAIPE Program of producing reasonably reliable and current county-level estimates of poor school-age children for Title I allocations is a challenging one. At present, no single administrative records or survey data source provides sufficient information with which to develop reliable direct county estimates of the numbers and proportions of poor school-age children that are more up to date than census estimates. The March Income Supplement to the Current Population Survey (CPS) can provide reasonably reliable annual direct estimates of such population characteristics as the number and proportion of poor children at the national level and possibly for the largest states. However, the CPS cannot provide direct estimates for the majority of counties because the sample does not include any households in them. And for almost all of the counties with households in the CPS sample (1,274 of a total of 3,142 counties in 1995), the estimates have a high degree of sampling variability.^{1} Nonetheless, the CPS data may serve as the basis for creating usable estimates for counties through the application of statistical estimation techniques to develop “model-based” or “indirect” estimates.
^{1 }
For a description of the March CPS and differences between income and poverty data from the CPS and the 1990 census long-form sample, see Chapter 3. The 1990 census sample includes households in all counties and covers 15.7 million households, roughly 300 times as many as the 50,000 households in the CPS; even the 1990 census estimates are highly variable for small counties (Table 3-1).
Model-based or indirect estimators use data from several areas, time periods, or data sources to “borrow strength” and improve the precision of estimates for small areas. A model-based approach is needed when there is no single data source for the area and time period in question that can provide direct estimates of sufficient reliability for the intended purpose. The Census Bureau has used this strategy to develop estimates of median family income for states (Fay, Nelson, and Litow, 1993). In the 1970s, it used model-based methods to improve 1970 census small-area income estimates for use in developing updated per capita income estimates for governmental jurisdictions (Fay and Herriot, 1979) and, in part, to develop population estimates for states and counties (see Spencer and Lee, 1980).
This chapter provides a summary description and evaluation of the model-based approach used by the Census Bureau to develop estimates by county of the numbers and proportions of school-age children in families in 1996 who were poor in 1995 (referred to as the 1995 county estimates). The estimation procedure involves the use of separate county and state regression models.^{2} The chapter also summarizes differences between the state and county models used to develop the 1995 county estimates and the models used to develop the original and revised 1993 county estimates. Additional detailed documentation for the 1993 state and county models (and alternative models) is provided in Appendix A; see also Bell et al. (2000). For the county model, see also Coder, Fisher, and Siegel (1996) and Fisher (1997); for the state model, see also Fay (1996) and Fay and Train (1997). The Census Bureau's web site (www.census.gov/hhes/www/saipe.html) provides an overview of the estimation procedures and contains a number of papers on the SAIPE methods.
When the Department of Education uses the Census Bureau's school district estimates of poor school-age children for direct allocation of Title I funds to districts, county estimates are not used directly in the allocation process. However, the county estimates are critical to the development of school district estimates. As a result of the lack of data at the school-district level, the Census Bureau is constrained to use for school districts a very simple model-based method referred to as a shares method. For the 1995 estimates, this method applied the shares or proportions of poor school-age children for the school districts in a county according to the 1990 census to the updated 1995 county estimates to obtain updated school district estimates (see Chapter 7). Therefore, in order to evaluate the 1995 school district estimates, it is essential to understand and evaluate the 1995 county estimates.
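The arithmetic of the shares method can be illustrated with a minimal Python sketch. The function name, district labels, and all numbers below are hypothetical and chosen only for illustration; they are not drawn from Census Bureau data or code.

```python
def shares_method(census_district_poor, county_estimate):
    """Allocate an updated county estimate to districts by census shares.

    census_district_poor: census counts of poor school-age children by
        district within one county (hypothetical inputs).
    county_estimate: updated model-based estimate for the whole county.
    """
    census_total = sum(census_district_poor.values())
    if census_total == 0:
        raise ValueError("census counts sum to zero; shares are undefined")
    return {
        district: county_estimate * count / census_total
        for district, count in census_district_poor.items()
    }


# Hypothetical county with three districts (illustrative numbers only).
census_1990 = {"District A": 600, "District B": 300, "District C": 100}
estimates_1995 = shares_method(census_1990, county_estimate=1200.0)
```

By construction the district estimates sum to the county estimate, so any error in the county estimate is passed through to every district in proportion to its census share.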
^{2 } 
The panel's final report provides a more mathematical presentation of the development of these models (National Research Council, 2000:Ch. 3). 
1995 ESTIMATION PROCEDURE
The Census Bureau's 1995 estimation procedure for counties includes the following steps:

1. Developing and applying the Census Bureau's county model to produce initial estimates of the numbers of poor school-age children. The county estimation process involves:

(a) obtaining data from administrative records and other sources that are available for all counties to use as predictor variables;

(b) specifying and estimating a regression equation that relates the predictor variables to a dependent variable, which is the estimated log number of poor school-age children from 3 years of the March CPS for counties with households with poor school-age children in the CPS sample; and

(c) using the estimated regression coefficients from the equation and the predictor variables to develop estimates of poor school-age children for all counties. For counties with households in the CPS sample, the predictions from the model are then combined by a “shrinkage” procedure with the CPS estimates for those counties.

2. Developing and applying the Census Bureau's state model to produce estimates of the numbers of poor school-age children by state. The state estimation process is similar to that for counties, although the state model differs from the county model in several respects.

3. Adjusting the initial estimates of poor school-age children from the county model (step 1) for consistency by state with the estimates from the state model (step 2) to produce final estimates of the numbers of related children aged 5-17 in poverty by county.
In addition, the Census Bureau produces various state and county population estimates, which are used in the estimation of poor school-age children (see Chapter 8).^{3} Finally, the Census Bureau produces separate estimates of poor school-age children for Puerto Rico, which is treated as a county and school district equivalent in the Title I allocation formulas (see Appendix E).
^{3 } 
The county population estimates of the total number of school-age children were used by the Department of Education to calculate estimated proportions of poor school-age children when it made Title I allocations to counties in the two-stage process that was used through the 1998-1999 school year (see Chapter 2).
Step 1: 1995 County Model
County Equation
The 1995 county equation uses as predictor variables county estimates from Internal Revenue Service (IRS) records for 1995, Food Stamp Program records for 1995,^{4} the 1990 census, and the Census Bureau's postcensal population estimates program for 1996. As the dependent or outcome variable, it uses county estimates of the number of poor school-age children averaged over 3 years of the March CPS (data from the March 1995, 1996, and 1997 CPS, covering income in 1994, 1995, and 1996). The equation takes the following form:
z_{i} = β_{0} + β_{1}w_{1i} + β_{2}w_{2i} + β_{3}w_{3i} + β_{4}w_{4i} + β_{5}w_{5i} + v_{i} + a_{i},   (1)
where:
z_{i} = log(3-year weighted average of number of poor school-age children in county i based on 3 years of March CPS data),
w_{1i} = log(number of child exemptions reported by families in poverty on tax returns in county i),
w_{2i} = log(number of people receiving food stamps in county i),
w_{3i} = log(estimated population under age 18 in county i),
w_{4i} = log(number of child exemptions on tax returns in county i),
w_{5i} = log(number of poor school-age children in county i in the previous census),
v_{i} = model error for county i, and
a_{i} = sampling error of the dependent variable for county i.
Dependent Variable The Census Bureau originally decided to model the number of poor school-age children, instead of the proportion, because of concern that the county population estimates of school-age children that would form the basis for converting the estimated proportions to estimated numbers were of uncertain quality. Hence, it would be difficult to construct estimates of the precision of the estimated numbers of poor school-age children at the county level, which played the most important role in the Title I allocation formula under the two-stage procedure.
^{4 }
The food stamp data for most counties pertain to July 1995; for other counties, they are an annual average of monthly counts for 1995. The county numbers are controlled to state food stamp estimates, which are 12-month averages centered on January 1996 (see Chapter 3).
The Census Bureau decided to estimate the number of poor school-age children at a particular time and not to estimate the change in the number since the 1990 census because it concluded that the available administrative data were likely to be measured more consistently across areas at a given time than they would be over time, given changes in tax and transfer programs.
The Census Bureau decided to combine 3 years of CPS data to form the dependent variable for the county model. The combination of years improves the precision of the dependent variable, although the dependent variable consequently pertains to the 3-year period rather than to the estimation year.
The weighted 3-year average of the number of poor school-age children in each county is computed as the product of the weighted 3-year average estimated CPS poverty rate for related children aged 5-17 and the weighted 3-year average estimated CPS number of related children aged 5-17 for that county. The weights for these averages are the fractions of the 3-year estimated total of CPS interviewed housing units in each county that contain children aged 5-17 in each year.
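The computation of the dependent variable can be sketched in Python as follows. This is an illustrative reading of the description above, not the Census Bureau's code; the function name and all input values are hypothetical.

```python
import math


def three_year_poor_average(poverty_rates, child_counts, housing_units):
    """Weighted 3-year average number of poor school-age children.

    poverty_rates: CPS poverty rate for related children aged 5-17, by year.
    child_counts: CPS number of related children aged 5-17, by year.
    housing_units: CPS interviewed housing units containing children
        aged 5-17, by year (used to form the weights).
    """
    total_units = sum(housing_units)
    if total_units == 0:
        raise ValueError("no interviewed housing units with children aged 5-17")
    # Each year's weight is its fraction of the 3-year total of housing units.
    weights = [units / total_units for units in housing_units]
    avg_rate = sum(w * r for w, r in zip(weights, poverty_rates))
    avg_children = sum(w * n for w, n in zip(weights, child_counts))
    # The average number poor is the product of the two weighted averages.
    return avg_rate * avg_children


# Hypothetical county (illustrative numbers only).
avg_poor = three_year_poor_average(
    poverty_rates=[0.20, 0.25, 0.15],
    child_counts=[1000, 1200, 800],
    housing_units=[10, 20, 10],
)
z_i = math.log(avg_poor)  # dependent variable on the log scale
```

Here the weights are 0.25, 0.5, and 0.25, so the average rate is 0.2125, the average number of children is 1,050, and their product is 223.125. A county whose average is 0 has no logarithm, which is why such counties drop out of the estimation.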
Because only a subset of counties have households in the March CPS sample, the relationships between the predictor variables and the dependent variable in the model are estimated solely from this subset of counties. This subset includes proportionately more large counties and proportionately fewer small counties than the distribution of all counties. Also, because the dependent variable is measured on a logarithmic scale for reasons given below and values of 0 cannot be transformed into logarithms, a number of counties whose sampled households contain no poor school-age children are excluded from the estimation. In all, 985 of 3,142 counties were included in the 1995 model estimation; the remainder were excluded because none of their CPS-sampled households had school-age children who were poor (262 counties), none of their CPS-sampled households had school-age children (27 counties), or they had no CPS-sampled households (1,868 counties). Corresponding figures for 1993 are as follows: 1,184 of 3,143 counties were included in the model estimation; 304 counties had CPS households with school-age children but none with school-age children who were poor, 41 counties had CPS-sampled households but none with school-age children, and 1,614 counties had no CPS-sampled households at all.^{5}
^{5 }
The 1993 model estimation included a higher proportion of counties than the 1995 model estimation because a redesign of the CPS sample was phased in between April 1994 and July 1995. Some counties were included in the old design but not the new and vice versa; the estimation included all counties that had at least 1 year of CPS data in the 3 years centered on the estimation year.
Predictor Variables The choice of predictor variables was governed by data availability and the assumed relationship of the variables to poverty. The number of child exemptions reported by families in poverty on tax returns and the number of food stamp recipients were included as variables that are indicative of poverty and available on a consistent basis (or reasonably consistent basis, in the case of food stamps) for all counties in the nation.^{6} The 1990 census estimate of poor school-age children was used in the 1995 model on the assumption that previous poverty is likely to be indicative of subsequent poverty. The total number of child exemptions on tax returns and the population estimate of the total number of children under 18 were included in order to cover children not reported on tax returns (i.e., in nonfiling families), who are assumed to be poorer on average than other children. (The estimated regression coefficients for the 1995 county model predictor variables are given in Table 6-2.)
Form of the Variables The dependent variable and all of the predictor variables are measured on a logarithmic scale. A reason to use logarithms is the wide variation in the CPS estimates of the dependent variable and the values of the predictor variables among counties when they are measured on the numeric scale: transforming the variables to logarithms made their distributions more symmetric and the relationships between some of them and the dependent variable more linear.
Estimation of Model and Sampling Error Variance The total squared error of the county estimates (the difference between the model estimates and the direct estimates from the CPS) has two sources: model error (v) and sampling error (a), which are the last two terms in the county equation.^{7} Model error is the difference between the value of the logarithm of the 3-year weighted average of the number of poor school-age children that would have been obtained had all the households in the county been included in the CPS sample and the model estimate of this quantity. Sampling error is the difference between the estimate of this quantity from the CPS sample and the value that would have been obtained had all households in the county been included in the CPS sample. The variance of the model error is assumed to be constant across counties (see below). The variance of the sampling error is not constant across counties; it is larger for counties that have fewer households included in the CPS sample.
^{6 }
Poverty status for families on tax returns is determined by comparing the adjusted gross income on each return to the average poverty threshold for the total number of exemptions on the form. Although there are differences between the CPS and IRS definitions of income and family composition, they are not critical for purposes of developing a predictive model.
^{7 }
As used in statistics, “error” is the inevitable discrepancy between the truth and an estimate due to variability in measurements and the fact that model predictions are imperfect.
Because a procedure to estimate the sampling error variance directly for the March CPS has not yet been developed (see Chapter 9), the variances of the model error and sampling error terms in the 1995 county equation are estimated in a multiple-step process that involves several assumptions. First, equation (1) is estimated for 1989, using the 1990 census estimate of poor school-age children as the dependent variable and 1989 IRS and food stamp data, 1990 census population data, and 1980 census poverty data as the predictor variables. The estimation procedure includes almost all counties, excluding only those few that had no poor school-age children in the long-form sample in 1980 or 1990 or that did not exist in both 1980 and 1990. A generalized variance function is used to estimate the sampling variances of the census estimates, which are often relatively small because of the large size of the census long-form sample. Then, by estimating equation (1) using weighted least squares in an iterative process, in which a starting value is specified for the model error variance and the sampling error variances are known, maximum likelihood estimates of the regression coefficients and the model error variance are obtained. Most of the mean square error in the census equation (about 90%) is derived from model error variance.
It is assumed that the model error variance for the CPS equation for 1995 is the same as that for the 1990 census equation and that it has the same value for each county. Then, the CPS equation is estimated by iteratively weighted least squares, which produces maximum likelihood estimates of the regression coefficients and the total sampling error variance, which is distributed among the counties as an inverse function of their sample size.^{8} Most of the CPS mean square error (about 90%) is derived from sampling error variances.
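The weighting used in the least squares step (see footnote 8) can be illustrated with a deliberately simplified Python sketch. It fits a single predictor in closed form with the variances treated as known, whereas the actual procedure uses five predictors and iterates to maximum likelihood estimates; the function name and all numbers are hypothetical.

```python
def weighted_least_squares(x, z, sampling_vars, model_var):
    """One weighted least squares pass for z = b0 + b1 * x.

    Each county's weight is the reciprocal of the sum of its sampling
    error variance and the (constant) model error variance.
    Simplification: one predictor, variances taken as given.
    """
    weights = [1.0 / (s + model_var) for s in sampling_vars]
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    z_bar = sum(w * zi for w, zi in zip(weights, z)) / w_sum
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxz = sum(w * (xi - x_bar) * (zi - z_bar)
              for w, xi, zi in zip(weights, x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return intercept, slope


# Hypothetical log-scale data lying exactly on z = 2 + 0.5 * x.
x_vals = [1.0, 2.0, 3.0, 4.0]
z_vals = [2.0 + 0.5 * xi for xi in x_vals]
b0, b1 = weighted_least_squares(
    x_vals, z_vals, sampling_vars=[0.1, 0.4, 0.2, 0.8], model_var=0.05
)
```

Counties with small CPS samples have large sampling variances and therefore small weights, so they influence the fitted coefficients less than well-sampled counties do.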
The resulting estimates of model error variance and sampling error variance are used to determine the weights to give to the model prediction from the maximum likelihood procedure and to the CPS direct estimate in developing estimates of poor school-age children for counties with sampled households with poor school-age children in the CPS.
Combining the County Equation and CPS Estimates
By calculating the relationships among the predictor variables and the CPS estimates of school-age children in poverty for the subset of counties that have households with poor school-age children in the March CPS sample, it is possible to obtain a good estimate of a regression equation for predicting the number of poor school-age children in a county, even though the CPS estimates for many small counties have large levels of uncertainty. The regression equation can then be used to predict the number of school-age children in poverty from the food stamp, IRS, population estimates, and previous census predictor variables for each county, whether or not the county is in the March CPS sample.
^{8 }
The weights used are the reciprocal of the sum of the estimated sampling variance of the estimate of the log number of poor school-age children in a given county plus the estimated model error variance, assumed to be constant across counties; see Appendix A.
For counties that have households with poor school-age children in the March CPS sample, a weighted average of the model prediction and the estimate based on data from the sampled households (the direct estimate) is used to produce an estimate for that county using empirical Bayes (“shrinkage”) procedures for combining estimates (see Fay and Herriot, 1979; Ghosh and Rao, 1994; Platek et al., 1987; Rao, 1999). The weights that are given to the model prediction and the direct estimate depend on their relative precision (see discussion above of how model error variance and sampling error variance are estimated). For a county with very few sample households in the CPS and hence a high level of sampling variability in the direct estimate, most of the weight will be given to the model prediction and little to the direct estimate. For a county with a larger number of sampled households in the CPS, more weight will be given to the direct estimate and less to the model prediction. For almost all counties that have households with poor school-age children in the CPS, most of the weight is given to the model prediction; for the 1993 estimates the weight for the model prediction was less than 0.5 for only 2 counties; it was less than 0.75 for only 13 counties. For counties that lack households with poor school-age children in the CPS sample, the prediction from the model is the estimate. After shrinkage, the initial county estimates are obtained by transforming the shrunk values from the logarithmic to the numeric scale.
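A minimal Python sketch of the shrinkage step follows. It assumes the standard Fay-Herriot form, in which the weight on the model prediction is the sampling error variance divided by the sum of the sampling and model error variances; the text states only that the weights depend on relative precision, so this formula, the function name, and all numbers are assumptions for illustration.

```python
import math


def shrink(direct_log, model_log, sampling_var, model_var):
    """Combine a direct estimate and a model prediction on the log scale.

    Assumed weighting (standard Fay-Herriot form): the noisier the
    direct estimate, the more weight goes to the model prediction.
    """
    model_weight = sampling_var / (sampling_var + model_var)
    shrunk = model_weight * model_log + (1.0 - model_weight) * direct_log
    return shrunk, model_weight


# Hypothetical county with a small CPS sample (illustrative numbers only).
shrunk_log, model_weight = shrink(
    direct_log=math.log(150.0),
    model_log=math.log(100.0),
    sampling_var=0.9,
    model_var=0.1,
)
# Naive back-transformation to the numeric scale; this sketch omits the
# correction for transformation bias that the actual procedure applies.
initial_estimate = math.exp(shrunk_log)
```

With these inputs the model prediction receives a weight of 0.9, and the shrunk estimate (about 104) lies much closer to the model prediction of 100 than to the direct estimate of 150.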
Step 2: 1995 State Model
State Equation
The state model equation takes the following form:
y_{j} = α_{0} + α_{1}x_{1j} + α_{2}x_{2j} + α_{3}x_{3j} + α_{4}x_{4j} + u_{j} + e_{j},   (2)
where:
y_{j} = estimated proportion of school-age children in state j who are in poverty based on the March CPS that collects income data pertaining to the estimation year,^{9}
x_{1j} = proportion of child exemptions reported by families in poverty on tax returns in state j,
x_{2j} = proportion of people receiving food stamps in state j,
x_{3j} = proportion of people under age 65 not included on an income tax return in state j,^{10}
x_{4j} = residual for state j from a regression of the proportion of poor school-age children from the prior decennial census on the other three predictor variables (x_{1j}, x_{2j}, x_{3j}),
u_{j} = model error for state j, and
e_{j} = sampling error of the dependent variable for state j.
^{9 }
The numerator is the estimated number of poor related children aged 5-17 from the CPS; the denominator is the estimated total population of children aged 5-17, whether or not they are related to a family, from the CPS. (See text for the reason to include unrelated children in the denominator; that denominator, however, excludes the institutionalized, who are not sampled.)
^{10 }
This percentage is obtained by subtracting the estimated number of exemptions on income tax returns for people under age 65 from the estimated total population under age 65 derived from the Census Bureau's population estimates program; see Chapter 8.
Differences from the County Equation
The Census Bureau's state model for estimates of poverty among school-age children is similar to the county model. However, it differs in a number of respects:
Dependent Variable The state model uses the proportion of school-age children in poverty in each state as the dependent variable: that is, the dependent variable is a poverty ratio rather than the number of poor school-age children, as in the county model.^{11} The numerator for the ratio is the CPS estimate of poor school-age children in a state (i.e., the estimate of the number of poor related children aged 5-17); the denominator is the CPS estimate of the total number of children aged 5-17 in the state. A different denominator (total CPS school-age children, rather than the slightly smaller universe of related school-age children) is used for consistency with the population estimates that are available to convert the estimated poverty ratios to estimated numbers of poor school-age children.
In addition, the dependent variable in the state model is derived from 1 year of CPS data (the March 1996 CPS for the 1995 model), rather than a 3-year average as in the county model. This decision was made because the sample sizes for states are all reasonably large for the purpose of fitting the regression model.
^{11 }
The dependent variable is termed a ratio because the denominator is not exactly the same as that for the official published poverty rates.
Predictor Variables As can be seen above, the state model uses a somewhat different set of predictor variables than the county model. (The estimated regression coefficients for the state model predictor variables are given in Table 6-7.) The state model includes a predictor variable that is the residual from a regression of the proportion of poor school-age children from the prior decennial census on the other three predictor variables. During the development of the state model, the Census Bureau determined that there was a correlation between the residuals from estimating the model for 1979 with 1980 census data and the residuals from estimating the model for 1989 with 1990 census data. In other words, states that had more poverty than predicted by the cross-sectional model for 1979 also tended to have more poverty than predicted by the cross-sectional model for 1989. This result was used to improve the model predictions by including the residual from a regression for the prior census as one of the predictor variables.
Form of the Variables The variables in the state model are proportions rather than numbers and are not transformed to a logarithmic scale as is done in the county model.^{12} A log-based model was examined, but the Census Bureau decided not to transform the variables because, unlike the situation with the county model, the state-level distributions of the estimated proportions for the predictor variables are reasonably symmetric, and the relationships of the state-level estimated proportions with the dependent variable are approximately linear.
Combining the State Equation and CPS Estimates
All states have sampled households in the CPS; however, the variability associated with estimates from the CPS is large for some states. As is done for the initial county estimates, the predictions from the state model and the CPS estimates are weighted according to their relative precision to produce estimates of the proportion of poor school-age children in each state. To produce estimates of the number of poor school-age children in each state, the estimates of the proportion poor are multiplied by estimates of the total number of noninstitutionalized school-age children from the Census Bureau's population estimates program. (The estimates of noninstitutionalized school-age children, which include some adjustments for residents of military group quarters and college dormitories, are the closest approximation available to the CPS estimates of school-age children.) Finally, the state estimates of the number of poor school-age children are adjusted to sum to the CPS national estimate of related school-age children in poverty. This adjustment is a minor one; for 1995 it changed the state estimates by less than 0.5 percent; for 1993, the adjustment changed the state estimates by less than 1 percent.
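The conversion from proportions to numbers and the adjustment to the national total can be sketched in Python as follows. The function name, state labels, and all numbers are hypothetical; the sketch takes the shrunk proportions as given.

```python
def state_poor_counts(shrunk_proportions, populations, national_cps_total):
    """Convert state poverty proportions to counts, then control them
    to a national total.

    shrunk_proportions: estimated proportion poor by state (after
        combining model predictions and CPS estimates).
    populations: noninstitutionalized school-age population by state.
    national_cps_total: CPS national estimate of poor school-age children.
    """
    counts = {
        state: shrunk_proportions[state] * populations[state]
        for state in shrunk_proportions
    }
    # A single ratio adjustment makes the state counts sum to the
    # national estimate.
    factor = national_cps_total / sum(counts.values())
    adjusted = {state: count * factor for state, count in counts.items()}
    return adjusted, factor


# Hypothetical two-state example (illustrative numbers only).
adjusted_counts, national_factor = state_poor_counts(
    shrunk_proportions={"State X": 0.20, "State Y": 0.10},
    populations={"State X": 1_000_000, "State Y": 500_000},
    national_cps_total=251_000,
)
```

In this example the unadjusted counts sum to 250,000, so the adjustment factor is 1.004, a change of 0.4 percent applied uniformly to every state.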
Step 3: Combining the County and State Estimates
^{12 }
The estimates that are transformed into logarithms in the county model are numbers, not proportions. However, evaluation determined that, if the county model were to estimate proportions, a logarithmic transformation of the dependent and predictor variables would be helpful in that case as well (see Chapter 5).
The final step in developing estimates of numbers of poor school-age children by county is to adjust the initial estimates from the county model (after shrinkage and transformation to the numeric scale with a correction for transformation bias,^{13} step 1) for consistency with the estimates from the state model (after shrinkage, step 2). The estimate for each state from the state model is divided by the sum of the estimates for each county in that state to form a state raking factor. Each of the county estimates in a state is multiplied by the state raking factor so that the sum of the adjusted county estimates equals the state estimate. For the final county estimates of poor school-age children in 1995, the average state raking factor was 0.97; two-thirds of the factors were between 0.88 and 1.06. For the final, revised county estimates of poor school-age children in 1993, the average state raking factor was 1.065; two-thirds of the factors were between 0.975 and 1.154.
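The raking step can be sketched in a few lines of Python. The function name, county labels, and numbers are hypothetical and serve only to show the arithmetic.

```python
def rake_counties(county_estimates, state_estimate):
    """Scale a state's county estimates so they sum to the state estimate.

    county_estimates: initial county model estimates for one state
        (numeric scale, after shrinkage).
    state_estimate: state model estimate for the same state.
    """
    # State raking factor: state estimate over the sum of county estimates.
    factor = state_estimate / sum(county_estimates.values())
    raked = {county: value * factor
             for county, value in county_estimates.items()}
    return raked, factor


# Hypothetical state with three counties (illustrative numbers only).
raked_estimates, raking_factor = rake_counties(
    {"County A": 500.0, "County B": 300.0, "County C": 200.0},
    state_estimate=970.0,
)
```

Here the county estimates sum to 1,000 and the state estimate is 970, so the raking factor is 0.97 and every county estimate is reduced by 3 percent. Raking changes the level of the county estimates within a state but not their relative sizes.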
DIFFERENCES BETWEEN 1995 AND 1993 ESTIMATION PROCEDURES
The procedure summarized above to produce the 1995 county estimates that the Census Bureau released in early 1999 differs in a few respects from the procedure that was used to produce the revised 1993 estimates that the Bureau released in early 1998. The changes involved the input data for the state and county models:

An error in processing the 1989 IRS data was discovered and corrected. The corrected data were used to reestimate the decennial census equation that provides the residual predictor variable in the 1995 state model (x_{4j} in equation (2)). The corrected data were also used to reestimate the 1989 state and county models for evaluation purposes (see Chapter 6).

Several changes were made to the food stamp data for input to the state model: instead of using data for July of the estimation year, the number of food stamp recipients was changed to a 12-month average centered on January 1 of the following year; counts by state of the numbers of people who received food stamps due to specific natural disasters were obtained from the Department of Agriculture and subtracted from the counts of the total number of recipients; time-series analysis of monthly state food stamp data from October 1979 through September 1997 was used to smooth outliers; and food stamp recipient data for Alaska and Hawaii were adjusted downward to reflect the higher eligibility thresholds for those states.
^{13 }
Transformation bias occurs when a regression model estimates an expected value for the dependent variable that is on a different scale from that for which estimates are needed. In this instance, the county model predicts poor school-age children on the log scale; when the predictions on the log scale are exponentiated back to the original numeric scale, the result is the exponential of the expected value of the dependent variable on the log scale, which is not the same as the expected value of the dependent variable on the original scale. This difference is referred to as transformation bias, for which a correction is made.

The food stamp numbers for the county model were raked to the adjusted state food stamp numbers.

In both the state and county models, child exemptions reported by families on tax returns were redefined to include children away from home in addition to children at home. This change may increase the number of IRS poor child exemptions in households with children away from home both because of the additional children and because poverty thresholds are higher for larger size families.
DIFFERENCES BETWEEN ORIGINAL AND REVISED 1993 ESTIMATION PROCEDURES
The procedure used to produce the revised 1993 county estimates that the Census Bureau released in early 1998 differed in some respects from the procedure used to produce the original 1993 estimates that the Bureau released in early 1997. The principal difference involved a change in one of the predictor variables in the county regression model.
The changes listed below in producing the revised 1993 estimates were retained in producing the 1995 estimates. Specifically:

The revised county model includes the population under 18 as a predictor variable; the original county model included the population under 21 as a predictor variable. The purpose of this variable (whether for the population under 18 or under 21) is to estimate, in conjunction with the variable measuring total child exemptions on IRS tax returns, the number of children in families that did not file a tax return. Evaluation determined that the original estimation procedure was not working well for counties with large numbers of people under age 21 in group quarters, primarily college students and military personnel. Specifically, the model was overpredicting the number of poor school-age children for those counties. Limiting the predictor variable to the population under 18 reduced the bias in the model predictions for counties classified by percent group quarters residents and improved the model predictions in other respects (see Chapter 6).

Examination of the pattern of residuals (differences between the model predictions and the direct estimates) for counties with sampled households in the March CPS indicated that the original method for estimating model error variance and sampling error variance (described above) was not working as well as it should. The variability of the standardized residuals increased with the number of CPS sample cases rather than remaining constant, and this pattern was common to a variety of alternative models that were examined. The revised 1993 county model includes a slight revision to the procedure for estimating the sampling error variance, which moderated but did not eliminate the anomalous pattern. Further work will be required to reduce the problem (see Chapter 9).
However, improving the estimation of the model error and sampling error variances will probably have only a limited effect on the county estimates. The main use of these variance estimates is to determine the weights to be given to the model predictions and to the CPS direct estimates in forming estimates for counties that have sampled households with poor school-age children in the CPS. Since the model predictions are the dominant component of the county estimates in most cases, changing the weights will not have a substantial impact.

The original model was estimated using a method-of-moments procedure; for the revised model, it was decided that maximum likelihood estimation would be used. This change had only a small effect on the estimated regression coefficients for the predictor variables. The main effect of the change was to increase the estimated sampling error variance. Hence, in comparison with the original 1993 estimates, the revised 1993 model predictions are given somewhat more weight and the CPS direct estimates are given somewhat less weight when weighted estimates are formed for counties that have sampled households with poor school-age children in the CPS. However, this difference had relatively little effect on the county estimates.