The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
VALIDATION OF ATLANTA, GEORGIA, REGIONAL COMMISSION POPULATION SYNTHESIZER

cise, regardless of version, in both the base year and the back-cast.

Race and Hispanic Categories

Although there are no controls related to race and Hispanic categories, in the base year, these are synthesized accurately and with reasonable precision at the PUMA level for Hispanic, white, black, and Asian categories but inaccurately and imprecisely at the tract level. For the other smaller racial categories, the results are imprecise and inaccurate at all levels. The race data definitions changed from the 1990 census to the 2000 census, making it difficult to interpret the validation results, although it appears that the accuracy and precision of the back-cast population are much worse than in the base year.

School Enrollment

School enrollment in two categories--nursery to grade 12 and postsecondary--although inaccurate and imprecise at the tract level, is reasonably accurate and precise at the PUMA level in the base year. In the back-cast, school enrollment is quite inaccurate and imprecise at all levels of geographic aggregation but perhaps usable at the PUMA level.

ADDITIONAL VALIDATION RESULTS

IPF Stopping Criterion

The preceding validation results come from test runs in which the IPF convergence criterion was set at 5%. A change in the criterion to 1% in back-cast runs causes only a slight improvement in the mean percentage difference (e.g., mean percentage difference improves from 4% to 3.9%), on average, across the usable variables at both the PUMA and tract levels of analysis.

Forecast Seed Matrix

When synthesizing a forecast-year synthetic population, PopSyn uses, as its starting matrix for IPF, the balanced matrix from the base-year synthetic population. Its starting distribution for each TAZ is a combination of the TAZ-, tract-, and PUMA-level distributions. The exact combination depends on the sizes of the TAZ and tract relative to user-assigned parameters. If the TAZ (or tract) is smaller than the user-specified minimum, then it is not trusted to provide the starting distribution; if it is larger than the user-specified maximum, then it is trusted completely. In between, its distribution is blended with those of the larger geographies. The issue at hand is whether small neighborhood peculiarities persist over time. If they do, then it would be better to use the base-year TAZ-level distribution, even for small TAZ, preserving the details supplied by the base-year census tables; if they don't, then it would be better to use the distribution from the tract or PUMA.

To test this, Version 316 back-casts were run with a variety of minimum and maximum size criteria. The quality of the validation results was then compared by averaging the absolute mean percentage difference and the standard deviation percentage difference across all usable variables and comparing them across runs. The results indicated that the back-cast population matched the back-cast validation values best when the minimum size was between 10 and 100 and the maximum size was between 100 and 500. The results were worst when the size parameters were set so high that the PUMA distributions were used exclusively. However, except for this extreme case, differences were minor compared with the levels of inaccuracy and imprecision in the best forecasts.

SUMMARY OF VALIDATION RESULTS

The following summary conclusions might be drawn from the above analysis about the preferred versions to use for base-year and forecast analysis:

In the base year, the use of census data to control for more variables in Version 316 yields a clearly superior synthetic population, especially for tract level evaluation in controlled categories. So, for base-year analysis and short-term forecasts using the base-year population, Version 316, or perhaps even a more complex version, should be used.

For the forecast year, the additional controls of Versions 128 and 316 provide little value and can potentially make the population worse. The reason for this situation lies primarily in the reliance on averages that are translated into category distributions naively from base-year distributions rather than attempting to make informed forecasts of the distributions themselves. It probably also lies in relying on regional forecasts rather than forecasts carrying information at some smaller level of geographic aggregation.

Table 2 provides a summary of the aggregate level at which various categories of variables would be reasonably precise and accurate in the synthetic population, assuming Version 316 for base-year analysis and Version 52 for forecast-year analysis.

These preliminary results demonstrate several important aspects of PopSyns. First, the accuracy of synthesized characteristics depends heavily on the control variables used for population synthesis; uncontrolled