County Estimates
Reliance on the most recent decennial census to allocate federal funds to
counties and other small areas has primarily reflected the absence of alternative
data sources with comparable or superior reliability. Mindful of the need for
small-area estimates that are more up to date than census estimates, the Census
Bureau organized the Small Area Income and Poverty Estimates (SAIPE) Pro-
gram to develop methods for producing postcensal income and poverty estimates
for states and counties by using multiple data sources and innovative statistical
methods. The program began in late 1992 with financial support from a consor-
tium of five federal agencies. Congress made this work more urgent by passing
legislation in 1994 that charged the Census Bureau to produce updated estimates
of poor school-age children for counties and school districts every 2 years, to
begin in 1996 with estimates for counties, discussed in this chapter, and in 1998
with estimates for school districts, discussed in Chapter 3.
The SAIPE Program faces a challenging task in producing county-level estimates.
For Title I allocations, there is no single administrative or survey data
source that provides sufficient information with which to develop reliable direct
estimates of the number and proportion of school-age children in families in
poverty by county. The March Income Supplement to the Current Population
Survey (CPS) can provide reasonably reliable annual direct estimates of such
population characteristics as the number and proportion of poor children at the
national level and possibly for the largest states. However, the CPS cannot
provide direct estimates for the majority of counties because the sample does not
include any households in them. And for almost all of the counties with house-
holds in the CPS sample (about 1,250 of a total of 3,143 counties in 1995), the
SMALL-AREA ESTIMATES OF SCHOOL-AGE CHILDREN IN POVERTY
estimates have a high degree of sampling variability.1 Nonetheless, the CPS data
may serve as the basis for creating usable estimates for counties through the
application of statistical estimation techniques to develop "model-based" or "in-
direct" estimates.
Model-based or indirect estimators use data from several areas, time periods,
or data sources (which could include the previous census) to "borrow strength"
and improve the precision of estimates for small areas. A model-based approach
is needed when there is no single data source for the area and time period in
question that can provide direct estimates that are sufficiently reliable for the
intended purpose. The Census Bureau has used this strategy to develop estimates
of median family income for states (Fay et al., 1993) and, in part, to develop
population estimates for states and counties (see Spencer and Lee, 1980).
This chapter provides a summary description and evaluation of the model-
based approach used by the Census Bureau to develop estimates by county of the
number and proportion of school-age children in families in 1996 who were poor
in 1995 (referred to as the 1995 county estimates). A document prepared by the
Census Bureau describes the estimation procedure and evaluations of the 1995
estimates in detail (Bureau of the Census, 1998; see also National Research
Council, 1998:Chs. 3, 4, Apps. C, D on the evaluations of the 1993 estimates).
If the Department of Education uses the Census Bureau's 1995 school dis-
trict estimates of poor school-age children for direct allocation of Title I funds to
districts, the 1995 county estimates will not be used directly. However, the 1995
county estimates are critical to the development of 1995 school district estimates.
As a result of the lack of data at the school-district level, the Census Bureau has
been constrained to use for school districts a very simple model-based method
referred to as synthetic estimation, which applies the shares of poor school-age
children for the school districts in a county according to the 1990 census to the
updated 1995 county estimates to obtain updated school district estimates (see
Chapter 3).2 Therefore, in order to evaluate the 1995 school district estimates, it
is essential to understand and evaluate the 1995 county estimates.
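
The shares procedure described above can be illustrated with a minimal sketch. The function name, district labels, and counts are hypothetical and are not taken from the Census Bureau's implementation; the sketch only shows census shares within a county being applied to an updated county estimate.

```python
def synthetic_district_estimates(census_district_counts, updated_county_estimate):
    """Apply each district's census share of the county's poor children
    to an updated county total (hypothetical helper, for illustration)."""
    census_county_total = sum(census_district_counts.values())
    if census_county_total <= 0:
        raise ValueError("census county total must be positive to form shares")
    return {
        district: updated_county_estimate * count / census_county_total
        for district, count in census_district_counts.items()
    }


# Hypothetical county with three school districts (illustrative numbers only).
census_counts = {"district_a": 600, "district_b": 300, "district_c": 100}
estimates = synthetic_district_estimates(census_counts, updated_county_estimate=1200.0)
```

Because the shares are fixed at their census values, any error in the updated county estimate passes proportionally to every district in that county, which is why the county estimates need to be evaluated first.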
1For a description of the March CPS and differences between income and poverty data from the
CPS and the 1990 census long-form sample, see National Research Council (1997:Ch. 2; App. B).
The 1990 census sample includes households in all counties and covers 15 million households, 300
times more than the 50,000 households in the CPS, yet even the 1990 census estimates are relatively
variable for some small counties (National Research Council, 1997:Table 2-1).
2We use the term "synthetic estimation" for the Census Bureau's shares procedure for school
district estimates and distinguish it from the statistical regression modeling that was done for the
state and county estimates. However, synthetic estimation is sometimes used more broadly in the
small-area literature.
ESTIMATION PROCEDURE
The Census Bureau's estimation procedure for counties uses two regression
models that predict poor school-age children: a county model and a separate
state model. The estimation procedure was first used to develop the 1993 county
estimates. It comprises the following steps: (1) a regression
model is developed to provide initial estimates of the number of poor school-age
children at the county level; (2) a state model is developed to produce estimates
of the number of poor school-age children by state; and (3) the initial county-
level estimates are adjusted so that the final estimates for counties within each
state sum to the state-level estimates. In addition, the Census Bureau produces
county population estimates of the total number of school-age children, which the
Department of Education has used to calculate estimated proportions of poor
school-age children for counties. Finally, the Census Bureau produces separate
estimates of poor school-age children for Puerto Rico.
Step 1: County Model
The first step in the estimation process is to develop and apply the Census
Bureau's county model to produce initial estimates of the numbers of poor
school-age children. This step involves:
· obtaining data from the March CPS for three consecutive years to construct
a dependent variable in a county model regression equation that is the
estimated log number of poor school-age children for counties with households in
the CPS sample;
· obtaining data from administrative records and other sources that are available
for all counties to construct predictor variables for the regression equation;
· specifying and estimating the regression equation to relate the predictor
variables to the dependent variable; and
· using the estimated regression coefficients from the equation and the predictor
variables to develop estimates of poor school-age children for all counties.
For counties with households in the CPS sample, the predictions from the model
are then combined by a "shrinkage" procedure with the CPS direct estimates (on
a logarithmic scale) for those counties. (The shrinkage procedure weights the
two sets of estimates according to their relative precision; see Fay and Herriot
[1979], Ghosh and Rao [1994], and Platek et al. [1987] on shrinkage methods.)
The initial county estimates are then obtained by transforming the predictions
from the logarithmic to the numeric scale.
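
The shrinkage step can be sketched as follows. This is a minimal illustration that assumes the standard precision-weighted form found in the shrinkage literature cited above; the function name and the variance values are hypothetical, and the Census Bureau's actual procedure may differ in detail.

```python
import math


def shrinkage_estimate(log_direct, log_model, model_error_var, sampling_var):
    """Precision-weighted combination of a direct estimate and a model
    prediction, both on the log scale (assumed form, for illustration)."""
    # The weight on the direct estimate grows as its sampling variance shrinks.
    weight_direct = model_error_var / (model_error_var + sampling_var)
    return weight_direct * log_direct + (1.0 - weight_direct) * log_model


# Hypothetical county: direct CPS estimate of 500, model prediction of 800.
combined_log = shrinkage_estimate(
    log_direct=math.log(500.0),
    log_model=math.log(800.0),
    model_error_var=0.05,  # assumed value
    sampling_var=0.45,     # assumed value
)
# Simple back-transformation to the numeric scale; any adjustment for
# retransformation bias is outside the scope of this sketch.
initial_estimate = math.exp(combined_log)
```

With these assumed variances the direct estimate receives a weight of 0.1, so the combined estimate lies much closer to the model prediction than to the noisy direct estimate.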
The county model equation takes the following form:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + u_i + e_i \qquad (1)$$

where:
$y_i$ = log(3-year weighted average of poor school-age children in county i),
$x_{1i}$ = log(number of child exemptions reported by families in poverty on tax returns in county i),
$x_{2i}$ = log(number of people receiving food stamps in county i),
$x_{3i}$ = log(estimated population under age 18 in county i),
$x_{4i}$ = log(number of child exemptions on tax returns in county i),
$x_{5i}$ = log(number of poor school-age children in county i in the previous census),
$u_i$ = model error for county i, and
$e_i$ = sampling error of the dependent variable for county i.
The predictor variables in the county equation for the 1995 estimates are
based on data from Internal Revenue Service (IRS) records for 1995 (x1i, x4i),
Food Stamp Program records for 1995 (x2i), the Census Bureau's population
estimates program for 1996 (x3i), and the 1990 census (x5i).3 As the dependent or
outcome variable, the county equation uses county estimates of the number of
poor school-age children averaged over 3 years of the March CPS (data from the
March 1995, 1996, and 1997 CPS, covering income in 1994, 1995, and 1996).4
The relationships between the predictor variables and the dependent variable
in equation (1) are estimated solely from the subset of counties that have house-
holds in the March CPS sample. This subset includes proportionately more large
counties and proportionately fewer small counties than the distribution of all
counties. Because values of zero cannot be transformed into logarithms, a num-
ber of counties whose sampled households contain no poor school-age children
are excluded from the estimation. In all, 985 of the country's 3,143 counties were
included in the 1995 model estimation.
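
The exclusion of zero-count counties follows directly from the log transformation, as the sketch below shows. The function name and county labels are hypothetical, and the sketch covers only the construction of the dependent variable, not the model fitting itself.

```python
import math


def build_log_dependent(cps_estimates):
    """Split counties into those usable on the log scale and those dropped
    because their CPS estimate of poor school-age children is zero."""
    usable = {}
    dropped = []
    for county, estimate in cps_estimates.items():
        if estimate > 0:
            usable[county] = math.log(estimate)
        else:
            # log(0) is undefined, so the county cannot enter the regression.
            dropped.append(county)
    return usable, dropped


# Hypothetical 3-year weighted CPS estimates for four sampled counties.
cps = {"county_a": 1200.0, "county_b": 0.0, "county_c": 85.0, "county_d": 0.0}
usable, dropped = build_log_dependent(cps)
```

The dropped counties still receive model predictions, because the estimated coefficients are applied to the predictor variables that are available for every county.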
Step 2: State Model
The second step in the estimation process is to develop and apply the Census
Bureau's state model to produce estimates of the number of poor school-age
children by state. The state estimation is similar to that for counties, although the
state model differs from the county model in several respects.5
3Variables x3i and x4i are included in the model in order to cover children not reported on tax
returns (i.e., in nonfiling families), who are assumed to be poorer on average than other children.
4See Bureau of the Census (1998) and National Research Council (1998:Ch. 2) for the derivation
of the 3-year weighted average of poor school-age children from the CPS and of the last two terms in
the equation (ui and ei).
5See National Research Council (1998:Ch. 2) for a detailed review of the forms of the state and
county models and the differences between them.
The state model equation takes the following form:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + u_i + e_i \qquad (2)$$

where:
$y_i$ = proportion of school-age children in state i that are poor, estimated
from one year of the CPS (March 1996 CPS for the 1995 model),
$x_{1i}$ = proportion of child exemptions reported by families in poverty on tax
returns in state i,
$x_{2i}$ = proportion of people receiving food stamps in state i,
$x_{3i}$ = proportion of people under age 65 who did not file an income tax
return in state i,6
$x_{4i}$ = residual for state i from a regression of the proportion of poor
school-age children from the most recent decennial census on the other three predictor
variables,7
$u_i$ = model error for state i, and
$e_i$ = sampling error of the dependent variable for state i.
All states have sampled households with poor school-age children in the
CPS; however, the variability associated with estimates from the CPS is large for
some states. As is done for the initial county estimates, the predictions from the
state model and the CPS direct estimates are combined in a shrinkage procedure
to produce estimates of the proportion of poor school-age children in each state.
To produce estimates of the number of poor school-age children in each state, the
estimates of the proportion poor are multiplied by estimates of the total number
of noninstitutionalized school-age children from the Census Bureau's program of
population estimates. Finally, the state estimates of the numbers of poor school-
age children are adjusted to sum to the CPS national estimate of related school-
age children in poverty. This adjustment is a minor one; for 1995 it changed the
state estimates by less than one-half of 1 percent.
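
The conversion from proportions to numbers and the adjustment to the national total can be sketched as below. The state labels, populations, and national total are hypothetical values chosen only to give a small adjustment; the function is an illustration, not the Census Bureau's code.

```python
def state_counts_adjusted_to_national(proportion_poor, school_age_population, national_total):
    """Convert state poverty proportions to counts, then scale the counts
    proportionally so that they sum to a national control total."""
    counts = {
        state: proportion_poor[state] * school_age_population[state]
        for state in proportion_poor
    }
    factor = national_total / sum(counts.values())
    adjusted = {state: count * factor for state, count in counts.items()}
    return adjusted, factor


# Hypothetical two-state example (illustrative numbers only).
proportions = {"state_1": 0.20, "state_2": 0.10}
populations = {"state_1": 1_000_000, "state_2": 2_000_000}
adjusted, factor = state_counts_adjusted_to_national(
    proportions, populations, national_total=401_000
)
```

Here the factor is 1.0025, an adjustment of one-quarter of 1 percent, which is of the same small order as the adjustment reported for 1995.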
Step 3: Combining the County and State Estimates
The last step in the estimation process is to adjust the initial estimates of poor
school-age children from the county model (step 1) for consistency by state with
the estimates from the state model (step 2) to produce final estimates of the
6This percentage is obtained by subtracting the estimated number of exemptions on income tax
returns for people under age 65 from the estimated total population under age 65 that is derived from
demographic analysis (see National Research Council, 1998:App. B).
7For the 1995 state model, x4i is the residual from a regression of poor school-age children from
the 1990 census on the other three predictor variables for 1989.
numbers of related children aged 5-17 in poverty by county. The estimate for
each state from the state model is divided by the sum of the estimates for each
county in that state to form a state raking factor. Each of the county estimates in
a state is multiplied by the state raking factor so that the sum of the adjusted
county estimates equals the state estimate. For the final county estimates of poor
school-age children in 1995, the average state raking factor was 0.97; two-thirds
of the factors were between 0.88 and 1.06.
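
The raking step described above amounts to a single proportional adjustment within each state. In the minimal sketch below, the function name, county labels, and estimates are hypothetical.

```python
def rake_counties_to_state(initial_county_estimates, state_estimate):
    """Scale initial county estimates so that they sum to the state estimate."""
    county_sum = sum(initial_county_estimates.values())
    raking_factor = state_estimate / county_sum
    raked = {
        county: estimate * raking_factor
        for county, estimate in initial_county_estimates.items()
    }
    return raked, raking_factor


# Hypothetical state with three counties (illustrative numbers only).
initial = {"county_a": 4000.0, "county_b": 5000.0, "county_c": 1000.0}
raked, raking_factor = rake_counties_to_state(initial, state_estimate=9700.0)
```

In this example the raking factor is 0.97: every county estimate in the state is reduced by 3 percent, so the relative sizes of the county estimates within the state are unchanged.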
Differences Between 1995 and 1993 Estimation Procedures
The procedure summarized above to produce the 1995 county estimates
differs in a few respects from the procedure that was used to produce the revised
1993 estimates described in the panel's second interim report (National Research
Council, 1998). The changes involved the input data for the state and county
models:
· An error in processing the 1989 IRS data was discovered and corrected.
The corrected data were used to reestimate the decennial census equation that
provides the residual predictor variable in the 1995 state model (x4i in equation
(2)). The corrected data were also used to reestimate the 1989 state and county
models for evaluation purposes.
· Several changes were made to the food stamp data for input to the state
model: instead of using data for July, the number of food stamp recipients was
changed to a 12-month average centered on January 1 of the following year;
counts by state of the numbers of people who received food stamps due to
specific natural disasters were obtained from the Department of Agriculture and
subtracted from the counts of the total number of recipients; time-series analysis
of monthly state food stamp data from October 1979 through September 1997
was used to smooth outliers; and food stamp recipient data for Alaska and Hawaii
were adjusted downward to reflect the higher eligibility thresholds for those
states.
· The food stamp numbers for the county model were raked to the adjusted
state food stamp numbers.
· In both the state and county models, child exemptions reported by fami-
lies on tax returns were redefined to include children away from home in addition
to children at home. This change may increase the number of IRS poor child
exemptions in households with children away from home both because of the
additional children and because poverty thresholds are higher for larger size
families.
Population Estimates
To accompany county estimates of school-age children in 1996 who were in
poor families in 1995, the Census Bureau produced county-level estimates of the
total number of children aged 5-17 for 1996 from its demographic population
estimates program. The estimates from step 3 above and the population estimates
can then be used to calculate estimated proportions of poor school-age children
for counties. The Census Bureau also produced county-level estimates of total
population for 1996. The population estimates pertain to July of the year follow-
ing the one for which poverty status is estimated. A detailed description and
evaluation of the Census Bureau's population estimates procedures for counties
is provided in National Research Council (1998:App. B).
Puerto Rico
Estimates of poor school-age children for Puerto Rico, which is treated as a
county equivalent in the allocation formula, are developed separately. The county
model cannot be used for them because there are no precise equivalents for
Puerto Rico of tax return and food stamp data to form predictor variables for the
model.
The original estimates for Puerto Rico of school-age children in 1994 who
were poor in 1993 were developed with data from an experimental March 1995
income survey modeled after the CPS March Income Supplement, together with
data from the decennial census and updated population estimates. These data
sources required a number of adjustments for several reasons: (1) the March
1995 experimental survey did not collect information on the ages of family
members under 16 (so that related children aged 5-17 could not be identified
among those aged under 18); (2) the updated Puerto Rico population estimates
were for all children in the resident population, not for related children only; and
(3) the survey, which was conducted in 1995, obtained information on 1994, not
1993, income. In making the adjustments, the Census Bureau assumed that
certain relationships observed in 1990 census data still applied and that the change
in the number of Puerto Rico school-age children in poverty between 1989 and
1994 was linear.
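
The linearity assumption can be illustrated with a short sketch. The counts used here are hypothetical placeholders, not Puerto Rico figures, and the function is a generic linear interpolation rather than the Census Bureau's actual adjustment.

```python
def linear_interpolate(year, year_start, value_start, year_end, value_end):
    """Value at `year` on the straight line through two dated observations."""
    fraction = (year - year_start) / (year_end - year_start)
    return value_start + fraction * (value_end - value_start)


# Hypothetical counts of poor school-age children in 1989 and 1994.
estimate_1993 = linear_interpolate(
    1993, year_start=1989, value_start=500_000.0, year_end=1994, value_end=450_000.0
)
```

Under this assumption, four-fifths of the 1989 to 1994 change is attributed to the period ending in 1993.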
The sample size of the experimental survey of about 3,200 households ap-
peared large enough to provide a direct estimate of the number of poor school-age
children with adequate precision. However, only limited information was avail-
able about other key aspects of data quality, including household response rates
on the income questions and the editing or imputation procedures used. Hence, it
was difficult to evaluate the quality of the 1993 estimates for Puerto Rico, al-
though the estimation procedures seemed appropriate given the data available.
The Puerto Rican Family Income Survey is now an ongoing survey, con-
ducted at 2-year intervals. The Census Bureau used income data from the 1996
survey, in which about 2,300 households were interviewed in February-March
1997, together with decennial census data and updated population estimates for
Puerto Rico, to construct estimates of school-age children in 1996 who were poor
in 1995. The three adjustments that were made for the 1993 estimates were also
required. The change in the number of children in poor families between 1994
and 1996 was assumed to be linear. Additional information was obtained from
Puerto Rico about the quality of the income survey, which, in general, supported
the use of the survey data to develop estimates of poor school-age children in the
commonwealth (see Santos and Waddington, 1999).
EVALUATION
The development of model-based estimates for small areas is a major, con-
tinuing research and development effort for which extensive evaluation is re-
quired. For updated estimates of poor school-age children for counties, a thor-
ough assessment of all aspects of the estimation procedure is necessary to have
confidence in the estimates, whether the estimates are used by the Department
of Education to allocate Title I funds to counties (as has been the practice up to
now) or to develop estimates for school districts.
Since there are no absolute criteria for what are acceptable evaluation results,
one method for determining if the performance of a model can be improved is to
examine alternative models. Such comparisons may indicate changes that would
be helpful for a model; they may also suggest that an alternative model is prefer-
able. As summarized above, the Census Bureau's county estimates of poor
school-age children are produced by using a county regression model, a state
regression model, and county population estimates developed with demographic
analysis techniques. A comprehensive evaluation for each of these components
of the estimation procedure should include "internal" and "external" evaluations.
An internal evaluation is primarily an investigation of the validity of the
underlying assumptions and features of a model. For a regression model, an
internal validation is typically based on an examination of the residuals from the
regression, that is, the differences between the predicted and reported values of the
dependent variable for each observation. In an external evaluation, the estimates
from a model are compared with target or "true" values that were not used to
develop the model. Ideally, an internal evaluation of regression model output
should precede external evaluation. Changes made to the model to address
concerns raised by the internal evaluation would likely improve its performance
in the external evaluation. Both internal and external evaluations should be
carried out for alternative models.
In its second interim report, the panel reviewed a series of internal and
external evaluations that were conducted for the revised 1993 county estimates of
poor school-age children (National Research Council, 1998:Ch. 4, Apps. B, C,
D). The state model and the county population estimates were examined as well,
both directly and as they contributed to the county estimates of poor school-age
children. The evaluation determined that the revised procedure for developing
updated county estimates, which principally involved a change in one of the
predictor variables in the original county model,8 produced estimates for 1993
that were appropriate for use in allocating Title I funds to counties.
Because the 1995 county estimates were developed by using a procedure
similar to that used to develop the revised 1993 county estimates, the focus of the
evaluation effort for the 1995 estimates shifted to how the state and county
models behave over several time periods, and specifically, to determining whether
there are persistent biases or other problems. The evaluations of the 1995 county
estimates, which are described in this chapter, included:
(1) internal evaluation of the regression output for the 1995 county model
estimated for 1995, 1993, and 1989 (using uncorrected and corrected tax return
data);
(2) comparison of estimates of poor school-age children that were developed
from the 1995 form of the county model for 1995, 1993, and 1989 with CPS
estimates for groups of counties, a form of external evaluation; and
(3) evaluation of the state model, including examination of regression output
for 1996, 1995, 1993, 1992, 1991, 1990, and 1989 and consideration of the state
raking factors by which county model estimates are adjusted to make them con-
sistent with the state model estimates.
County Model Internal Evaluations
The first test of a regression model is that it perform well when evaluated
internally, that is, for the set of observations for which it is estimated. The
evaluation of the county regression output pertains to the regression model itself,
that is, before the predictions are combined with the direct CPS estimates in a
shrinkage procedure or raked to the estimates from the state model. The regres-
sion output comprises the model predictions for counties that have at least one
household with poor school-age children in the CPS sample. We first summarize
the evaluation work done on the 1993 county model predictions and then detail
the work on the 1995 county model predictions.
1993 Evaluation
As part of the evaluation of the revised 1993 county estimates (National
Research Council, 1998:Ch. 4 and App. C), the panel and the Census Bureau
examined the underlying assumptions of 13 alternative county models through
8The predictor variable x3i in equation (1) was changed from the estimated population under age
21 to the estimated population under age 18. This change improved the model predictions, particu-
larly for groups of counties classified by the percentage of group quarters residents (see National
Research Council, 1998:Ch. 2).
evaluation of the regression model output for 1989 and 1993. The models varied
on three dimensions: treatment of information from the previous census (bivari-
ate or single-equation), form of the variables (poverty rates or numbers, trans-
formed to logarithms or not transformed), and whether the model included fixed
state effects. Although an evaluation of the regression output would not likely
provide conclusive evidence with which to rank the performance of alternative
models, particularly when they use different transformations of the dependent
variable, such an examination could help determine which models perform rea-
sonably well.
The assumptions examined included:
· linearity of the relationships between the dependent variable and the
predictor variables, assessed by examining a variety of graphical plots;
· constancy of the assumed linear relationship over different time periods,
assessed through comparison of the regression coefficients on the predictor vari-
ables for the years for which the model was estimated;
· whether any of the included predictor variables are not needed in the
model, evaluated by looking for insignificant t-statistics for the estimated values
of individual regression coefficients, and, conversely, whether other potential
predictor variables are needed in the model, evaluated by looking for nonrandom
patterns, indicative of possible model bias, in the distributions of standardized
residuals displayed for categories of counties;9
· normality (primarily symmetry and moderate tail length) of the distribu-
tion of the standardized residuals;
· whether the standardized residuals have homogeneous variances, that is,
whether the variability of the standardized residuals is constant across counties
and does not depend on the values of the predictor variables; and
· absence of outliers.
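
The standardized residuals used in several of these checks can be sketched as follows. The chapter states only that the predicted standard error of a residual is a function of the estimated model error variance and the estimated sampling error variance; this sketch assumes the simplest such function, the square root of their sum, and all names and values are hypothetical.

```python
import math


def standardized_residuals(residuals, model_error_var, sampling_vars):
    """Divide each residual by an assumed predicted standard error,
    sqrt(model error variance + county sampling error variance)."""
    return [
        residual / math.sqrt(model_error_var + sampling_var)
        for residual, sampling_var in zip(residuals, sampling_vars)
    ]


# Two hypothetical counties: log-scale residuals and sampling variances.
std_resid = standardized_residuals(
    residuals=[0.5, -1.0],
    model_error_var=0.09,        # assumed value
    sampling_vars=[0.16, 0.91],  # assumed values
)
```

If the variance assumptions hold, the standardized residuals should have roughly constant spread across counties, which is what the homogeneity check in the list examines.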
The analysis for the most part supported the assumptions for the 13 models
that were examined; it did not strongly support one model over another. A few
problems characterized all or most of the models. First, most models tended to
9The standardization of the residuals involved estimating the predicted standard errors of the
residuals, given the predictor variables, and dividing the observed residuals by the predicted standard
errors. The predicted standard error of the residual for a county is a function of the estimated model
error variance and the estimated sampling error variance (see Belsley et al., 1980).
The categories of counties were specified in terms of: census region, census geographic division,
metropolitan status of county, population size in 1990, population growth from 1980 to 1990, per-
centage of poor school-age children in 1980, percentage of Hispanic population in 1990, percentage
of black population in 1990, persistent poverty from 1960 to 1990 for rural counties, economic type
for rural counties, percentage of group quarters residents in 1990, and number of households in the
CPS sample.
overpredict the number of poor school-age children in larger urban counties,
especially those with large percentages of Hispanics. Second, all models showed
evidence of some variance heterogeneity, particularly with respect to CPS sample
size and often with respect to the predicted value (number or proportion of poor
school-age children). Some of the models exhibited more problems with skew-
ness and outliers than others. Finally, according to the internal evaluation, none
of the other models was clearly superior to the revised Census Bureau 1993
county model.
1995 Evaluation
The internal evaluation for the 1995 county model focused on comparisons
of the properties of the model when estimated for different time periods. The
analysis looked in particular at three characteristics: the constancy of the regres-
sion coefficients on the predictor variables over time; distributions (box plots) of
the standardized residuals for categories of counties to determine if there were
any nonrandom patterns that persisted over time; and the phenomenon observed
in the 1993 evaluations by which the variance of the standardized residuals was
related to CPS sample size and the predicted value of the dependent variable
(variance heterogeneity).
Constancy of the Regression Coefficients. Because the county model is
refitted for each prediction year, constancy of the regression coefficients for the
predictor variables over time is not as important as it would be if the estimated
regression coefficients from the model were used for predictions for subsequent
years. Also, major changes in economic conditions would be expected to cause
some changes in the coefficients. Nonetheless, it is desirable for the coefficients
to be in the same direction and not fluctuate wildly in size over time.
Table 2-1 shows the regression coefficients for the predictor variables for the
1995 county model estimated for 1995 and 1993 and for 1989 with corrected IRS
data and with original (uncorrected) IRS data.10 The coefficients for the three
"poverty level" predictor variables, namely child exemptions reported by families in
poverty on tax returns (column 1), food stamp recipients (column 2), and poor
school-age children from the previous census (column 5), are fairly similar in
the equations for all three time periods. There are more substantial differences
across the three time periods in the size of the estimated coefficients for the other
two variables: population under age 18 (column 3) and total number of child
exemptions on tax returns (column 4). However, the sum of these two coeffi
10The regressions for 1995 and for 1989 with corrected IRS data also used modified food stamp
data (i.e., the county food stamp data were raked to the adjusted state food stamp data, as described
above).
As a type of external validation by which the issue of persistent bias could be
examined, the panel and the Census Bureau compared estimates of poor school-
age children from the 1995 county model for categories of counties for 1989,
1993, and 1995 with CPS direct estimates for those categories for the three
periods. Three years of CPS data were used to form the weighted estimates in
each case in order to reduce the sampling variability.15
Table 2-2 shows the difference in the number of poor school-age children
from the county model, estimated for 1989 (using corrected IRS data), 1993, and
1995, and the weighted 3-year CPS direct estimates centered on those years for
categories of counties. The measure shown is the algebraic difference by cat-
egory, which is the sum for all counties in a category of the algebraic (signed)
difference between the model estimate of poor school-age children and the
weighted CPS direct estimate, divided by the sum of the weighted CPS direct
estimates for the category.
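
The algebraic difference measure defined above can be computed as in the following sketch. The county labels and estimates are hypothetical, and the result is expressed in percent to match the table.

```python
def algebraic_difference_percent(model_estimates, cps_estimates):
    """Sum of signed (model minus CPS) differences over the counties in a
    category, divided by the sum of the CPS estimates, in percent."""
    signed_sum = sum(
        model_estimates[county] - cps_estimates[county] for county in cps_estimates
    )
    return 100.0 * signed_sum / sum(cps_estimates.values())


# Hypothetical category containing two counties.
model = {"county_a": 1100.0, "county_b": 950.0}
cps = {"county_a": 1000.0, "county_b": 1000.0}
difference = algebraic_difference_percent(model, cps)
```

Because the differences are signed, overprediction in one county offsets underprediction in another; the measure therefore indicates net bias for a category rather than typical county-level error.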
Comparisons with weighted CPS direct estimates have the advantage over
comparisons with the census that they can be performed for multiple years. They
have the disadvantage that the sample sizes for CPS estimates, even aggregated
for 3 years, are small for many categories of counties, thus making the compari-
sons much more uncertain than the 1990 census comparisons because of the
much greater variability in the standard of comparison. Also, in analyzing the
CPS comparisons, one must keep in mind that the model estimates are raked to
the state estimates, which are developed from a single year of the CPS.
The model-CPS aggregate differences in Table 2-2 differ widely among
categories of counties, in large part because of the small sample sizes for the CPS
estimates, even when aggregated for 3 years. Some of the differences are very
large, larger than any of the differences seen in the model-1990 census comparisons
(see National Research Council, 1998:Table 4-3, column b). Generally, the
larger model-CPS aggregate differences are for categories of counties with smaller
numbers of CPS sample households. For example, the model-CPS aggregate
differences often exceed 5 percent for counties grouped into the nine geographic
divisions, but they are all less than 5 percent for counties grouped into the four
geographic regions.
In addition, the model-CPS aggregate differences for 1989 frequently differ
from the model-1990 census differences. This finding is expected, given that the
measurement of poverty differs between the census and the CPS because of the
many differences in data collection procedures.
¹⁵ This analysis is not the same as the analysis of regression output described above, in which the
standardized residuals from the model for counties with sampled households in the CPS (representing
the standardized differences between the model estimates and the direct estimates on the log
scale) were examined for categories of counties.
¹⁶ For future evaluations of this type, the Census Bureau should develop estimates of the standard
errors of the differences so that significant differences between the model estimates and the CPS
3-year aggregate estimates can be identified.
TABLE 2-2 Comparison of County Model Estimates with CPS Aggregate
Estimates of the Number of Poor School-Age Children, 1995, 1993, and 1989:
Algebraic Difference by Category of County (in percent)

                                         No. of Model-CPS, Model-CPS, Model-CPS, Sample Size,
                                      Countiesᵃ      1995ᵇ      1993ᵇ      1989ᵇ    CPS 1996ᶜ
Category                                    (1)        (2)        (3)        (4)          (5)

Census Regionᵈ
  Northeast                                 217      -2.87       0.81      -4.36       10,708
  Midwest                                 1,055      -0.49       0.61      -4.31       11,393
  South                                   1,425       4.05      -0.13       4.48       15,440
  West                                      444      -4.16      -0.95      -0.43       12,141
Census Divisionᵈ
  New England                                67     -13.51       1.87      27.07        3,696
  Middle Atlantic                           150       0.05       0.54      -9.79        7,012
  East North Central                        437      -6.10      -0.64      -3.04        6,841
  West North Central                        618      18.31       4.25      -7.44        4,552
  South Atlantic                            591       1.82       0.83       4.12        8,150
  East South Central                        364      -5.53      -5.85       9.32        2,529
  West South Central                        470      12.00       1.90       2.44        4,761
  Mountain                                  281      -3.91      19.87       0.84        5,543
  Pacific                                   163      -4.24      -6.48      -0.92        6,598
Metropolitan Status
  Central county of metropolitan area       493      -2.75      -0.91      -3.53       34,343
  Other metropolitan                        254      53.75      -3.64       8.44        2,801
  Nonmetropolitan                         2,394       1.24       3.50       8.32       12,538
1990 Population Size
  Under 7,500                               525     -17.21      57.03       0.74          933
  7,500-14,999                              630      19.82     -23.67      -0.19        1,550
  15,000-24,999                             524       2.94       6.24      17.02        2,289
  25,000-49,999                             620      30.46      -0.23      -4.46        4,204
  50,000-99,999                             384      -2.52       4.99      22.47        5,979
  100,000-249,999                           259      17.27      12.12      -3.88        8,263
  250,000 or more                           199      -7.24      -2.49      -3.10       26,464
1980 to 1990 Population Growth
  Decrease of more than 10.0%               444      -2.71     -22.03      -4.29        2,170
  Decrease of 0.1-10.0%                     972      -4.31       2.44      -1.32       10,655
  0.0-4.9%                                  547       6.04       3.41       3.18        8,015
  5.0-14.9%                                 620       1.12       5.97       4.61       11,590
  15.0-24.9%                                260      -0.07      -4.11     -10.44        9,305
  25.0% or more                             292      -0.52      -2.27      10.31        7,947
continued
TABLE 2-2 Continued

                                         No. of Model-CPS, Model-CPS, Model-CPS, Sample Size,
                                      Countiesᵃ      1995ᵇ      1993ᵇ      1989ᵇ    CPS 1996ᶜ
Category                                    (1)        (2)        (3)        (4)          (5)

Percentage of Poor School-Age Children, 1980
  Less than 9.4%                            516       2.74       7.22      -1.07       14,980
  9.4-11.6%                                 524       1.39       5.28       4.35       12,291
  11.7-14.1%                                530     -10.01      -6.49      -6.72        9,837
  14.2-17.2%                                523       1.28      -5.82       0.44        5,217
  17.3-22.3%                                519       9.32      17.41       0.23        4,623
  22.4-53.0%                                523       1.05     -14.81       4.11        2,734
Percentage Hispanic, 1990
  0.0-0.9%                                1,770       1.26      -0.75       3.13       12,848
  1.0-4.9%                                  847       9.33       1.45       4.32       16,966
  5.0-9.9%                                  193      -2.81      17.24       6.38        6,999
  10.0-24.9%                                181      -4.02      -5.14      -8.29        7,236
  25.0-98.0%                                150      -7.90      -3.29      -5.26        5,633
Percentage Black, 1990
  0.0-0.9%                                1,446       8.32       8.02       5.09       10,929
  1.0-4.9%                                  615       7.41       1.04      -1.83       10,630
  5.0-9.9%                                  294       5.41      -2.07       0.95        8,646
  10.0-24.9%                                381      -4.89      -0.75       3.51       13,437
  25.0-87.0%                                405      -6.85      -2.82      -6.30        6,040
Persistent Rural Poverty, 1960-1990ᵉ
  Rural, not poor                         1,740      -2.62       1.53       5.47        9,734
  Rural, poor                               535      22.45      -0.15      14.81        1,698
  Not classified                            866      -1.28      -0.28      -2.68       38,250
Economic Type, Rural Countiesᵉ
  Farming                                   556     -24.56     -29.31     -12.41        1,634
  Mining                                    146      46.97      27.59      40.67          901
  Manufacturing                             506      -7.10      -3.58      -1.51        2,369
  Government                                243     120.13      27.59      59.39        1,661
  Services                                  323     -12.18     -12.42     -11.86        2,760
  Nonspecialized                            484       6.99      18.35      23.89        2,018
  Not classified                            883      -1.18      -0.20      -2.59       38,339
Percentage of Group Quarters Residents, 1990
  Less than 1.0%                            545       3.32      22.03      16.60        3,494
  1.0-4.9%                                2,187      -1.58      -1.27      -1.84       41,648
  5.0-9.9%                                  299      11.90      -1.22       4.51        3,980
  10.0-41.0%                                110      49.44      -6.28      17.02          560
TABLE 2-2 Continued

                                         No. of Model-CPS, Model-CPS, Model-CPS, Sample Size,
                                      Countiesᵃ      1995ᵇ      1993ᵇ      1989ᵇ    CPS 1996ᶜ
Category                                    (1)        (2)        (3)        (4)          (5)

Change in Poverty Rate for School-Age Children, 1980-1990
  Decrease of more than 3.0%                536      -3.88     -11.16     -10.04        4,038
  Decrease of 0.1-3.0%                      649      -4.57       2.63       4.44       12,658
  0.0-0.9%                                  272       2.16      -2.75       9.66        5,102
  1.0-3.4%                                  621      -1.07       0.11      -5.06       14,660
  3.5-6.4%                                  532       9.09      -2.60      -0.66        7,507
  6.5-38.0%                                 523      -1.07       5.17       3.98        5,719
ᵃ 3,141 counties are assigned to a category for most characteristics; 3,135 counties are assigned to
a category for 1980-1990 population growth and 1980 percentage of poor school-age children; 3,133
counties are assigned to a category for 1980-1990 percent change in poverty rate for school-age
children.
ᵇ The formula, where there are n counties (i) in category (j), Y_model is the estimated number of
poor school-age children from the county model, and Y_CPS is the estimated number of poor
school-age children from a 3-year weighted average of the CPS, is
$$\frac{\sum_{i=1}^{n} \left( Y_{\mathrm{model},ij} - Y_{\mathrm{CPS},ij} \right)}{\sum_{i=1}^{n} Y_{\mathrm{CPS},ij}}$$
ᶜ Number of households (unweighted) in the sample for the March 1996 CPS is shown to give an
idea of the relative sample sizes for each category. The 3-year weighted averages are based on 3
years' worth of sample, although some sample cases are the same for 2 years because of the rota-
tional design.
ᵈ Census region and division states:
Northeast
  New England: Maine, New Hampshire, Vermont, Massachusetts, Rhode Island,
    Connecticut
  Middle Atlantic: New York, New Jersey, Pennsylvania
Midwest
  East North Central: Ohio, Indiana, Illinois, Michigan, Wisconsin
  West North Central: Missouri, Minnesota, Iowa, North Dakota, South Dakota,
    Nebraska, Kansas
South
  South Atlantic: Delaware, Maryland, District of Columbia, Virginia, West Virginia,
    North Carolina, South Carolina, Georgia, Florida
  East South Central: Kentucky, Tennessee, Alabama, Mississippi
  West South Central: Arkansas, Louisiana, Oklahoma, Texas
West
  Mountain: Montana, Idaho, Wyoming, Colorado, New Mexico, Arizona, Utah,
    Nevada
  Pacific: Washington, Oregon, California, Alaska, Hawaii
ᵉ The Economic Research Service, U.S. Department of Agriculture, classifies rural counties by
1960-1990 poverty status and economic type. Counties not classified are urban counties and rural
counties for which a classification could not be made.
SOURCE: Data from Bureau of the Census.
Despite the sample size limitations, Table 2-2 can inform an assessment of
the performance of the county model if the results are used with caution. Of
particular interest are instances in which the model-CPS aggregate differences
are both large and in the same direction (plus or minus) for all 3 years for which
the county model is estimated. Such findings suggest a possible systematic bias
in the model that should be investigated to determine the nature of the bias and
what steps could be taken to eliminate or reduce it (e.g., by adding a predictor
variable to the model). Several persistent patterns are evident in the model-CPS
aggregate differences:
· The model shows a tendency to underpredict the number of poor school-
age children in the largest counties, those with 250,000 or more population. This
finding is consistent with the results from analyzing the distribution of the stan-
dardized residuals from the regression output. The extent of the underprediction
is not large, but it appears to be significant given the large number of CPS
households in the largest counties.
· The model shows a tendency to underpredict the number of poor school-
age children in counties with large percentages of Hispanic residents (10% or
more). There is a similar, although less pronounced, tendency for the model to
underpredict the number of poor school-age children in counties with large per-
centages of blacks. It is likely that counties with large percentages of Hispanics
or blacks are not homogeneous (e.g., large-percentage black counties include
both inner-city and rural areas). Hence, further research is needed to determine
whether the underprediction is more or less pronounced for particular subgroups
of these counties and, consequently, what steps are appropriate to ameliorate the
bias in the model.
· The model estimates are consistently very different from the weighted
CPS estimates for some categories of rural counties classified by economic type.
In particular, the model estimates for rural counties characterized as government
are much higher than the corresponding weighted CPS estimates. Although the
comparisons by economic type are based on small CPS sample sizes, it seems
worthwhile to examine some of these counties to see if a reason for these large
differences can be found.
· Finally, the model shows a tendency to underpredict the number of poor
school-age children in counties that experienced the largest declines in the pov-
erty rate for school-age children from 1980 to 1990. As was noted above, this
finding is consistent with the knowledge that any regression model can only
partially predict which cases will have the most extreme values of the outcome
variable.
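The screening rule described above (differences in the same direction for all 3 years) can be sketched as follows. The rows are copied from Table 2-2 in the order 1995, 1993, 1989; the function name and the optional `min_abs` threshold are hypothetical and are not part of the panel's procedure.

```python
# Selected rows of Table 2-2: model-CPS differences in percent for
# (1995, 1993, 1989). Positive values mean the model is above the CPS.
table_2_2 = {
    "250,000 or more population": (-7.24, -2.49, -3.10),
    "10.0-24.9% Hispanic": (-4.02, -5.14, -8.29),
    "25.0-98.0% Hispanic": (-7.90, -3.29, -5.26),
    "Rural government counties": (120.13, 27.59, 59.39),
    "South region": (4.05, -0.13, 4.48),
    "Poverty rate decrease of more than 3.0%": (-3.88, -11.16, -10.04),
}


def persistent_direction(diffs, min_abs=0.0):
    """Return 'over' or 'under' if every year's difference has the same sign
    and exceeds min_abs in absolute value; otherwise return None."""
    if all(d > min_abs for d in diffs):
        return "over"
    if all(d < -min_abs for d in diffs):
        return "under"
    return None


flags = {name: persistent_direction(d) for name, d in table_2_2.items()}
```

A rule like this only identifies candidates for further investigation. It takes no account of the sampling variability of the CPS estimates, which is why the text stresses that the table must be read with caution.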
Summary
Considering both the external evaluations of alternative models that were
conducted for the revised 1993 county model and the external evaluations of 3
years of estimates that were conducted for the 1995 county model, the panel
concludes that the county model is working reasonably well. However, further
investigation is needed of categories of counties for which the model appears to
overpredict or underpredict the number of poor school-age children, particularly
when that phenomenon is evident for several periods.
State Model Evaluation
The state model plays an important role in the production of county estimates
of poor school-age children. Evaluations conducted of the state model for the
assessment of the revised 1993 county estimates included an internal evaluation
of the regression output for 1989 and 1993 and an external evaluation that com-
pared 1989 estimates from the model with 1990 census estimates of proportions
of poor school-age children. The results in each case supported the use of the
model. However, the state model evaluations were more limited than the county
model evaluations, as alternative state model formulations were not evaluated
explicitly.
For the assessment of the 1995 county estimates, further evaluations were
conducted of the state model. In particular, the model was estimated for 7
years (1989, 1990, 1991, 1992, 1993, 1995, and 1996), and the regression output
for those years was examined to determine if there were any systematic biases
in the model estimates. (The model was not estimated for 1994 because the
redesign of the CPS sample, consequent to the 1990 census, was partly but not
completely phased in for the March 1995 CPS.) Also, there was an evaluation of
the state raking factors for 1993 and 1995.
State Model Regression Output
The state regression model is a poverty rate model with the variables not
transformed (see equation (2)). The analysis of the regression output for the state
model for 1989-1993 and 1995-1996 examined the same assumptions that were
examined for the 1995 county model estimated for 1989, 1993, and 1995. The
analysis is somewhat less informative for the state model than for the county
model because there are about 1,000 counties with poor school-age children in
the CPS, but only 51 states (including the District of Columbia), and states are
collectively much more homogeneous than counties with respect to poverty rates
and other characteristics. In addition, with respect to both internal and external
evaluation, some categories of states do not contain enough states for analysis,
thereby reducing the utility of evaluation.
Nonetheless, examination of the regression output for the state model helps
assess the validity of its assumptions. With a few exceptions, the analysis supports
the assumptions underlying the state model (see below); there is little evidence
of significant problems with the model formulation (although there may be
other models that fit just as well).
Linearity. Plots of standardized residuals against the four predictor variables
in the state model (the proportion of child exemptions reported by families
in poverty on tax returns, the proportion of people receiving food stamps, the
proportion of people under age 65 who did not file a tax return, and a residual
from the analogous regression equation using the previous census estimate as the
dependent variable) support the assumption of linearity. Furthermore, the
standardized residuals, when plotted against the model's predicted values, provide no
evidence of the need for any transformation of the variables. This result helps
justify the decision not to use the log transformation of the proportion poor as the
dependent variable.
Constancy over Time. Table 2-3 shows the regression coefficients for the
predictor variables for the state model for each of the years from 1989 to 1996,
excluding 1994. The coefficients for all four poverty-rate predictor variables are
positive in all 7 years and generally similar across all years. All of the coeffi-
cients are significant at the 5 percent level except that the coefficient of the
proportion of people under age 65 who did not file a tax return (column 3) is not
significant in 1989.
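The significance statement can be checked against Table 2-3 by dividing each coefficient by its standard error. The sketch below is only an illustration: the values are transcribed from Table 2-3, and the 1.96 cutoff is an assumption (a two-sided normal approximation at the 5 percent level), not necessarily the test the Census Bureau applied.

```python
# (coefficient, standard error) for predictor variables (1)-(4), by year,
# transcribed from Table 2-3.
table_2_3 = {
    1989: [(0.52, 0.09), (0.71, 0.20), (0.23, 0.13), (0.71, 0.34)],
    1990: [(0.46, 0.09), (0.65, 0.20), (0.42, 0.15), (1.07, 0.36)],
    1991: [(0.46, 0.10), (0.52, 0.21), (0.59, 0.14), (0.84, 0.37)],
    1992: [(0.41, 0.10), (0.71, 0.21), (0.42, 0.13), (1.38, 0.37)],
    1993: [(0.28, 0.12), (1.14, 0.25), (0.51, 0.14), (1.24, 0.39)],
    1995: [(0.57, 0.12), (0.79, 0.25), (0.32, 0.13), (1.54, 0.36)],
    1996: [(0.37, 0.12), (0.97, 0.26), (0.59, 0.14), (1.02, 0.36)],
}

CRITICAL = 1.96  # assumed cutoff: two-sided 5 percent level, normal approximation

# Collect (year, column) pairs whose coefficient-to-standard-error ratio
# falls below the cutoff.
not_significant = [
    (year, column)
    for year, pairs in table_2_3.items()
    for column, (coef, se) in enumerate(pairs, start=1)
    if abs(coef / se) < CRITICAL
]
```

Under that assumed cutoff, the only pair flagged is column 3 in 1989, which agrees with the statement in the text.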
Inclusion or Exclusion of Predictor Variables. The standardized residuals
for the state regression model were grouped into four categories for each of the
following characteristics: census region, population size in 1990, 1980 to 1990
population growth, percentage of black population in 1990, percentage of His-
panic population in 1990, percentage of group quarters residents in 1990, and
percentage of poor school-age children in 1979 (from the 1980 census). The
distributions of the standardized residuals for each category were then displayed
using box plots. For none of these box plots is there an obvious pattern to the
standardized residuals across categories, with one exception: in 1989, 1990,
1991, and 1993 the model underpredicts the proportion poor of school-age chil-
dren in the West Region (i.e., the model estimates are lower than the CPS direct
estimates for this group of states). The Census Bureau experimented with adding
a West Region indicator predictor variable to the model. The coefficient of this
variable has a negative sign for all 7 years; however, it is significant for only
1991, 1992, and 1993. For those 3 years, the model with the West Region vari-
able performs better for states in the West Region. A further examination of the
residuals from the state model without the West Region predictor variable for
individual Western states reveals that the model fairly consistently under-
predicts the proportion poor of school-age children in some Western states but
just as consistently overpredicts the proportion poor of school-age children in
other Western states. Further investigation is needed to explain these patterns.
TABLE 2-3 Estimates of Regression Coefficients for the
1995 State Model, Estimated for 1989-1993, and 1995-1996

        Predictor Variablesᵃ
Year    (1)           (2)           (3)           (4)
1989    0.52 (.09)    0.71 (.20)    0.23 (.13)    0.71 (.34)
1990    0.46 (.09)    0.65 (.20)    0.42 (.15)    1.07 (.36)
1991    0.46 (.10)    0.52 (.21)    0.59 (.14)    0.84 (.37)
1992    0.41 (.10)    0.71 (.21)    0.42 (.13)    1.38 (.37)
1993    0.28 (.12)    1.14 (.25)    0.51 (.14)    1.24 (.39)
1995    0.57 (.12)    0.79 (.25)    0.32 (.13)    1.54 (.36)
1996    0.37 (.12)    0.97 (.26)    0.59 (.14)    1.02 (.36)

NOTES: All predictor variables are in terms of rates. Standard errors of the
estimated regression coefficients are in parentheses.
ᵃ Predictor variables: (1) ratio of child exemptions reported by families in
poverty on tax returns to total child exemptions; (2) ratio of people receiving
food stamps to total population; (3) ratio of people under age 65 who did not
file an income tax return to total population under age 65; (4) residual from a
regression of poverty rates for school-age children from the prior decennial
census (1980 or 1990) on the other three predictor variables.
Normality, Homogeneous Variances, and Outliers. The distribution of the
standardized residuals from the state regression model shows some small degree
of skewness, especially in the 1992 equation. However, the skewness does not
appear sufficiently marked to be a problem. Also, the residual plots and the box
plots of the distributions of the standardized residuals against the categories of
states show little evidence of any heterogeneous variance. Finally, there is no
evidence of outliers from examination of the residual plots or displays of the
distributions of the standardized residuals from the state regression model.
Model Error Variance. One problem in the state model concerns the variance
of the model error (u_i in equation (2)). In the state model, the variances of
the sampling errors (e_i in equation (2)) are estimated directly from the CPS data
using a generalized variance function. The total model error variance is calcu-
lated using maximum likelihood estimation. The result of this calculation is an
estimate of zero for the model error variance in the equation for every year except
1993. This result, which implies (absent sampling variability) that the model
gives perfect predictions of state poverty rates for school-age children, is not
credible. It produces zero weight for the direct estimates even when those esti-
mates are quite precise, as is the case for several large states in the CPS sample.
Even a small model error variance can substantially change the weight on the
relatively high-precision direct estimates when they are combined in a shrinkage
procedure with the model estimates.
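The effect of a zero model error variance on the weight given to the direct estimate can be illustrated with the standard shrinkage form, in which the weight on the direct estimate is the model error variance divided by the sum of the model error variance and the sampling variance. That the state procedure takes exactly this form is an assumption here, since equation (2) is not reproduced in this section. The California figures come from Table 2-4, with the sampling standard error backed out of the plus-or-minus two standard error bounds; the nonzero model error variances are hypothetical.

```python
def shrinkage_weight(model_var, sampling_var):
    """Weight placed on the direct estimate in the standard shrinkage form."""
    return model_var / (model_var + sampling_var)


def shrinkage_estimate(direct, model, model_var, sampling_var):
    """Weighted combination of the direct and model (regression) estimates."""
    w = shrinkage_weight(model_var, sampling_var)
    return w * direct + (1.0 - w) * model


# California, 1995 (Table 2-4): direct estimate 22.5, model estimate 21.5,
# confidence bounds 19.4 to 25.7, i.e. plus or minus two standard errors.
direct, model = 22.5, 21.5
sampling_se = (25.7 - 19.4) / 4.0
sampling_var = sampling_se ** 2

# Hypothetical model error variances (in squared percentage points).
results = {
    model_var: (
        shrinkage_weight(model_var, sampling_var),
        shrinkage_estimate(direct, model, model_var, sampling_var),
    )
    for model_var in (0.0, 1.0, 4.0)
}
```

With a model error variance of zero, the weight on the direct estimate is zero and the combined estimate equals the model estimate, however precise the direct estimate is. Even the small hypothetical variance of 1.0 moves more than a quarter of the weight onto the direct estimate for a state with California's sampling precision.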
To evaluate the effects of using zero model error variance in the estimation,
the panel examined tables that compared the model estimates of the proportion
poor of school-age children to the CPS direct estimates by state for 1989-1993
and 1995-1996; as an illustration, Table 2-4 shows this comparison for 1995.
This examination demonstrated two important points. First, there are some ap-
preciable differences between the model estimates and the direct estimates. For
example, for Mississippi in 1995, the difference is over 7 percentage points.
Therefore, if a non-zero estimate for model error variance is produced, it might
have important consequences for the state estimates of poor school-age children.
Second, while there are some appreciable differences, the model estimates were
within two standard errors of the direct estimates for almost all states in each
year. The range of model estimates that exceeded that limit in either a positive or
negative direction was from one state in 1992 to six states in 1996. (Mississippi's
difference in 1995 was not statistically significant at the 5 percent level.) For no
single state did the model estimates exceed two standard errors of the direct
estimates for more than 3 of the 7 years for which the state model was estimated.
(And this analysis ignores the variance of the model estimates, which means that
a yet smaller number of differences are statistically significant.) These results
suggest that the state model is performing reasonably well: differences between
model and direct estimates are neither unusually large nor strongly persistent.
However, more work should be conducted to evaluate the current procedures for
estimating the sampling error variance of the state model and the effects on the
model estimates (see Chapter 5).
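The two-standard-error comparison can be sketched with a handful of states from Table 2-4. Only a subset of the 1995 rows is included, so the result is an illustration rather than a count for the full table, and the function name is hypothetical. As in the text, the check ignores the variance of the model estimates.

```python
# Selected 1995 rows from Table 2-4:
# state: (direct estimate, lower bound, upper bound, model estimate).
# The bounds are the direct estimate plus or minus two standard errors.
table_2_4 = {
    "California": (22.5, 19.4, 25.7, 21.5),
    "Georgia": (14.8, 8.2, 21.3, 21.4),
    "Mississippi": (34.9, 25.6, 44.3, 27.4),
    "Missouri": (9.4, 3.5, 15.2, 17.0),
    "New Jersey": (9.3, 6.5, 12.0, 12.3),
    "Texas": (22.4, 19.3, 25.5, 24.3),
}


def outside_two_se(lower, upper, model):
    """True if the model estimate lies outside the direct estimate's bounds."""
    return model < lower or model > upper


flagged = sorted(
    state
    for state, (direct, lower, upper, model) in table_2_4.items()
    if outside_two_se(lower, upper, model)
)
```

Mississippi is not flagged even though its difference is the largest in absolute terms among these rows, because its direct estimate has wide bounds. This matches the statement that Mississippi's 1995 difference is not statistically significant.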
State Raking Factors
The final stage in producing updated estimates of the number of poor school-
age children for counties is to rake the estimates from the county model for
consistency with the estimates from the state model. The model-1990 census
comparisons found that the raking procedure was beneficial to the county estimates.
The raking factors vary considerably across states. For 1995, the raking
factors range from 0.71 to 1.14 (two-thirds fall between 0.88 and 1.06); for 1993,
the raking factors range from 0.91 to 1.31 (two-thirds fall between 0.98 and 1.16).

TABLE 2-4 CPS Direct Estimate and Regression Model Estimate of
Percentage of School-Age Children in Poverty by State, 1995

                                       Lower       Upper       State       Regression
                             CPS  Confidence  Confidence       Model   Estimate Minus
                          Direct    Bound on    Bound on  Regression  Direct Estimate
                        Estimate      Direct      Direct    Estimate        (4) - (1)
                                    Estimate    Estimate
State                        (1)         (2)         (3)         (4)              (5)
Alabama                     22.2        16.5        27.9        23.4              1.2
Alaska                       6.3         1.6        11.1        10.9              4.5
Arizona                     23.0        16.8        29.2        21.1             -1.9
Arkansas                    21.4        14.0        28.7        24.0              2.6
California                  22.5        19.4        25.7        21.5             -1.0
Colorado                     9.4         5.1        13.8        11.8              2.3
Connecticut                 15.6         7.3        24.0        12.6             -3.0
Delaware                    15.6         8.3        23.0        12.8             -2.8
District of Columbia        30.2        17.9        42.4        33.8              3.7
Florida                     21.1        16.8        25.4        20.7             -0.4
Georgia                     14.8         8.2        21.3        21.4              6.7
Hawaii                      14.1         7.9        20.3        11.9             -2.2
Idaho                       15.4         9.9        20.9        12.7             -2.7
Illinois                    19.4        14.6        24.2        15.7             -3.7
Indiana                     12.9         9.0        16.8        12.6             -0.4
Iowa                        15.2         8.9        21.4        11.2             -3.9
Kansas                      10.6         4.8        16.4        12.7              2.1
Kentucky                    18.9        13.4        24.4        22.9              4.0
Louisiana                   24.2        15.6        32.9        28.0              3.8
Maine                       10.7         4.1        17.4        13.8              3.1
Maryland                    12.8         5.0        20.5        11.5             -1.3
Massachusetts               16.5        11.5        21.5        13.3             -3.2
Michigan                    14.2        10.0        18.3        17.2              3.0
Minnesota                    9.5         5.5        13.4        10.0              0.6
Mississippi                 34.9        25.6        44.3        27.4             -7.6
Missouri                     9.4         3.5        15.2        17.0              7.7
Montana                     17.4         9.4        25.3        18.4              1.0
Nebraska                    11.4         7.1        15.7        10.0             -1.4
Nevada                       9.8         4.0        15.6        11.8              2.0
New Hampshire                4.2         0.6         7.8         6.5              2.3
New Jersey                   9.3         6.5        12.0        12.3              3.0
New Mexico                  34.0        27.8        40.3        28.6             -5.5
New York                    22.7        19.1        26.3        23.1              0.4
North Carolina              19.7        13.8        25.5        17.1             -2.6
North Dakota                10.3         5.3        15.2        14.1              3.8
Ohio                        16.6        11.1        22.2        15.1             -1.5
Oklahoma                    22.6        13.1        32.1        22.5             -0.1
Oregon                      12.5         7.1        17.9        12.4             -0.1
Pennsylvania                16.1        12.5        19.7        15.3             -0.9
Rhode Island                16.4        10.7        22.2        15.1             -1.3
South Carolina              30.8        21.9        39.7        21.9             -8.9
South Dakota                16.7         8.7        24.8        17.3              0.6
Tennessee                   18.4         9.1        27.7        18.7              0.3
Texas                       22.4        19.3        25.5        24.3              1.9
Utah                         7.3         3.9        10.8         7.5              0.2
Vermont                     11.3         3.2        19.4        11.6              0.3
Virginia                    14.3         7.6        21.1        14.5              0.1
Washington                  15.8         7.9        23.7        12.4             -3.4
West Virginia               23.0        13.2        32.9        25.7              2.7
Wisconsin                   11.1         4.0        18.1        12.2              1.2
Wyoming                     10.5         6.3        14.7        12.2              1.7

NOTE: Confidence bounds are plus or minus two standard errors on the direct estimate (95%
confidence interval, obtained using direct estimates of the CPS standard errors).
SOURCE: Data from Bureau of the Census.

The Census Bureau determined that the correlation between the raking fac-
tors for states in 1993 and 1995 is low, which implies that there is little systematic
variation by state across these years. Also, some variation in the raking factors is
expected given the form of the county model and the need to transform the
predicted log values of poor school-age children to estimated numbers before the
raking is performed. Nonetheless, the degree of variation in the raking factors
suggests (though there are better ways to diagnose this) that there may be state
effects not captured in the county model, which, in turn, could affect the behavior
of the model in estimating the number of poor school-age children for counties
within states. Preliminary work conducted by the panel suggests that such state
effects may be present (see Chapter 5).
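The raking step discussed in this section can be sketched as follows. The county and state names and all numbers are hypothetical, and converting the predicted log values with a plain exponential is a simplifying assumption; the actual back-transformation used by the Census Bureau is not described here.

```python
import math


def rake_counties(log_predictions, state_of, state_estimates):
    """Scale county estimates so that they sum to each state's estimate.

    log_predictions: {county: predicted log number of poor school-age children}
    state_of:        {county: state}
    state_estimates: {state: state-model estimate of the number poor}
    Returns (raked county estimates, raking factor per state).
    """
    # Assumption: simple exponentiation of the predicted log values.
    counts = {c: math.exp(v) for c, v in log_predictions.items()}

    # Sum the unraked county estimates within each state.
    totals = {}
    for county, n in counts.items():
        state = state_of[county]
        totals[state] = totals.get(state, 0.0) + n

    # Raking factor: state estimate divided by the sum of county estimates.
    factors = {s: state_estimates[s] / totals[s] for s in totals}
    raked = {c: n * factors[state_of[c]] for c, n in counts.items()}
    return raked, factors


# Hypothetical inputs, invented for illustration only.
log_pred = {"c1": math.log(400.0), "c2": math.log(600.0), "c3": math.log(250.0)}
state_of = {"c1": "X", "c2": "X", "c3": "Y"}
state_est = {"X": 900.0, "Y": 300.0}
raked, factors = rake_counties(log_pred, state_of, state_est)
```

In this toy example the factor for state X is about 0.9 and the factor for state Y is about 1.2. Every county in a state is scaled by the same factor, so a factor far from 1 shifts all of that state's county estimates together, which is one reason large variation in the factors raises the question of uncaptured state effects.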
The panel urges the Census Bureau to estimate the variance of the state
raking factors to determine if the variability that they exhibit for 1993 and 1995 is
consistent with random error. If it is not, the panel urges the Census Bureau to
further investigate the state raking factors, including consideration of whether
there is any feature of the state model that might explain the variation. More
generally, the Census Bureau should conduct research on how to account for state
effects in the county model.