VALIDATION OF ATLANTA, GEORGIA, REGIONAL COMMISSION POPULATION SYNTHESIZER 61
cise, regardless of version, in both the base year and the back-cast.

Race and Hispanic Categories

Although there are no controls related to race and Hispanic categories, in the base year, these are synthesized accurately and with reasonable precision at the PUMA level for Hispanic, white, black, and Asian categories but inaccurately and imprecisely at the tract level. For the other smaller racial categories, the results are imprecise and inaccurate at all levels. The race data definitions changed from the 1990 census to the 2000 census, making it difficult to interpret the validation results, although it appears that the accuracy and precision of the back-cast population are much worse than in the base year.

School Enrollment

School enrollment in two categories--nursery to grade 12 and postsecondary--although inaccurate and imprecise at the tract level, is reasonably accurate and precise at the PUMA level in the base year. In the back-cast, school enrollment is quite inaccurate and imprecise at all levels of geographic aggregation but perhaps usable at the PUMA level.

ADDITIONAL VALIDATION RESULTS

IPF Stopping Criterion

The preceding validation results come from test runs in which the IPF convergence criterion was set at 5%. A change in the criterion to 1% in back-cast runs causes only a slight improvement in the mean percentage difference (e.g., mean percentage difference improves from 4% to 3.9%), on average, across the usable variables at both the PUMA and tract levels of analysis.

Forecast Seed Matrix

When synthesizing a forecast-year synthetic population, PopSyn uses, as its starting matrix for IPF, the balanced matrix from the base-year synthetic population. Its starting distribution for each TAZ is a combination of the TAZ-, tract-, and PUMA-level distributions. The exact combination depends on the sizes of the TAZ and tract relative to user-assigned parameters. If the TAZ (or tract) is smaller than the user-specified minimum, then it is not trusted to provide the starting distribution; if it is larger than the user-specified maximum, then it is trusted completely. In between, its distribution is blended with those of the larger geographies. The issue at hand is whether small neighborhood peculiarities persist over time. If they do, then it would be better to use the base-year TAZ-level distribution, even for small TAZ, preserving the details supplied by the base-year census tables; if they don't, then it would be better to use the distribution from the tract or PUMA.

To test this, Version 316 back-casts were run with a variety of minimum and maximum size criteria. The quality of the validation results was then compared by averaging the absolute mean percentage difference and the standard deviation percentage difference across all usable variables and comparing them across runs. The results indicated that the back-cast population matched the back-cast validation values best when the minimum size was between 10 and 100 and the maximum size was between 100 and 500. The results were worst when the size parameters were set so high that the PUMA distributions were used exclusively. However, except for this extreme case, differences were minor compared with the levels of inaccuracy and imprecision in the best forecasts.

SUMMARY OF VALIDATION RESULTS

The following summary conclusions might be drawn from the above analysis about the preferred versions to use for base-year and forecast analysis:

In the base year, the use of census data to control for more variables in Version 316 yields a clearly superior synthetic population, especially for tract level evaluation in controlled categories. So, for base-year analysis and short-term forecasts using the base-year population, Version 316, or perhaps even a more complex version, should be used.

For the forecast year, the additional controls of Versions 128 and 316 provide little value and can potentially make the population worse. The reason for this situation lies primarily in the reliance on averages that are translated into category distributions naively from base-year distributions rather than attempting to make informed forecasts of the distributions themselves. It probably also lies in relying on regional forecasts rather than forecasts carrying information at some smaller level of geographic aggregation.

Table 2 provides a summary of the aggregate level at which various categories of variables would be reasonably precise and accurate in the synthetic population, assuming Version 316 for base-year analysis and Version 52 for forecast-year analysis.

These preliminary results demonstrate several important aspects of PopSyns. First, the accuracy of synthesized characteristics depends heavily on the control variables used for population synthesis; uncontrolled
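
The seed blending described under Forecast Seed Matrix and the convergence threshold described under IPF Stopping Criterion can be illustrated with a minimal Python sketch. It is not PopSyn's implementation. The text says only that a zone's distribution is "blended" between the minimum and maximum sizes, so the linear ramp in `trust_weight` is an assumption, as are all function and parameter names, the two-dimensional table, and the choice of maximum relative row-margin error as the stopping measure.

```python
import numpy as np

def trust_weight(size, min_size, max_size):
    """Weight in [0, 1] given to a zone's own distribution.

    Assumption: a linear ramp between the user-specified minimum and
    maximum sizes; the source only says the distributions are blended.
    """
    if size <= min_size:
        return 0.0  # too small: not trusted at all
    if size >= max_size:
        return 1.0  # large enough: trusted completely
    return (size - min_size) / (max_size - min_size)

def blend_seed(zone_dist, parent_dist, zone_size, min_size, max_size):
    """Blend a zone's distribution with that of its larger geography."""
    w = trust_weight(zone_size, min_size, max_size)
    zone = np.asarray(zone_dist, dtype=float)
    parent = np.asarray(parent_dist, dtype=float)
    return w * zone + (1.0 - w) * parent

def ipf(seed, row_targets, col_targets, tol=0.05, max_iter=1000):
    """Two-dimensional iterative proportional fitting.

    Stops when the largest relative row-margin error is at most `tol`
    (0.05 for a 5% criterion, 0.01 for 1%). Assumes a strictly positive
    seed and positive targets with equal totals.
    """
    table = np.asarray(seed, dtype=float).copy()
    rows = np.asarray(row_targets, dtype=float)
    cols = np.asarray(col_targets, dtype=float)
    for iteration in range(1, max_iter + 1):
        table *= (rows / table.sum(axis=1))[:, None]  # fit row margins
        table *= (cols / table.sum(axis=0))[None, :]  # fit column margins
        # Columns now match exactly, so only the rows can still be off.
        row_error = np.max(np.abs(table.sum(axis=1) - rows) / rows)
        if row_error <= tol:
            return table, iteration
    return table, max_iter

if __name__ == "__main__":
    # Hypothetical numbers: a zone of 55 households, minimum 10, maximum 100.
    seed_row = blend_seed([0.7, 0.3], [0.5, 0.5], 55, 10, 100)
    print("blended seed distribution:", seed_row)
    seed = [[1.0, 2.0], [3.0, 4.0]]
    for tol in (0.05, 0.01):
        fitted, n_iter = ipf(seed, [40.0, 60.0], [50.0, 50.0], tol=tol)
        print(f"tol={tol}: stopped after {n_iter} iteration(s)")
```

For the three-level case in the text, the same blend could be applied twice under this assumption: first the tract against the PUMA, then the TAZ against the blended tract result. A tighter tolerance can never stop earlier than a looser one on the same inputs, because both runs follow the same sequence of tables.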