cise, regardless of version, in both the base year and the back-cast.

Race and Hispanic Categories

Although there are no controls related to race and Hispanic categories, in the base year these are synthesized accurately and with reasonable precision at the PUMA level for Hispanic, white, black, and Asian categories but inaccurately and imprecisely at the tract level. For the other, smaller racial categories, the results are imprecise and inaccurate at all levels. The race data definitions changed from the 1990 census to the 2000 census, making it difficult to interpret the validation results, although it appears that the accuracy and precision of the back-cast population are much worse than in the base year.

School Enrollment

School enrollment in two categories (nursery to grade 12 and postsecondary), although inaccurate and imprecise at the tract level, is reasonably accurate and precise at the PUMA level in the base year. In the back-cast, school enrollment is quite inaccurate and imprecise at all levels of geographic aggregation but perhaps usable at the PUMA level.

ADDITIONAL VALIDATION RESULTS

IPF Stopping Criterion

The preceding validation results come from test runs in which the IPF convergence criterion was set at 5%. A change in the criterion to 1% in back-cast runs causes only a slight improvement in the mean percentage difference (e.g., mean percentage difference improves from 4% to 3.9%), on average, across the usable variables at both the PUMA and tract levels of analysis.

Forecast Seed Matrix

When synthesizing a forecast-year synthetic population, PopSyn uses, as its starting matrix for IPF, the balanced matrix from the base-year synthetic population. Its starting distribution for each TAZ is a combination of the TAZ-, tract-, and PUMA-level distributions. The exact combination depends on the sizes of the TAZ and tract relative to user-assigned parameters.
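As a rough illustration of the two mechanisms just described, the following Python sketch shows a two-dimensional IPF loop with a relative stopping tolerance and a size-based blend of TAZ-, tract-, and PUMA-level distributions. It is not PopSyn's code: the function names, the linear trust weight between the minimum and maximum sizes, the nested order of blending, and the definition of convergence as the largest relative gap between fitted and target row totals are all assumptions made for the example. It also assumes strictly positive seed cells and targets.

```python
def trust_weight(size, min_size, max_size):
    """Hypothetical trust weight: 0 at or below min_size, 1 at or above
    max_size, and linear in between (the linear form is an assumption)."""
    if size <= min_size:
        return 0.0
    if size >= max_size:
        return 1.0
    return (size - min_size) / (max_size - min_size)


def blend_seed(taz, tract, puma, taz_size, tract_size, min_size, max_size):
    """Blend three normalized distributions (lists of equal length).

    The tract is first blended with the PUMA according to the tract's size;
    the TAZ is then blended with that result according to the TAZ's size.
    """
    w_taz = trust_weight(taz_size, min_size, max_size)
    w_tract = trust_weight(tract_size, min_size, max_size)
    larger = [w_tract * t + (1.0 - w_tract) * p for t, p in zip(tract, puma)]
    return [w_taz * z + (1.0 - w_taz) * g for z, g in zip(taz, larger)]


def ipf(seed, row_targets, col_targets, tol=0.05, max_iter=1000):
    """Two-dimensional iterative proportional fitting.

    Stops when the largest relative gap between fitted and target row totals
    drops below tol (an assumed reading of the "convergence criterion").
    Returns the fitted table and the number of iterations performed.
    """
    table = [[float(x) for x in row] for row in seed]
    for iteration in range(1, max_iter + 1):
        # Row step: scale each row to match its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [x * target / total for x in table[i]]
        # Column step: scale each column to match its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
        # Columns now match exactly, so only the row totals need checking.
        gap = max(abs(sum(row) - t) / t for row, t in zip(table, row_targets))
        if gap < tol:
            return table, iteration
    return table, max_iter


# Example: the same seed fitted with a 5% and a 1% tolerance.
seed = [[1.0, 2.0], [3.0, 4.0]]
loose_table, loose_iters = ipf(seed, [40.0, 60.0], [50.0, 50.0], tol=0.05)
tight_table, tight_iters = ipf(seed, [40.0, 60.0], [50.0, 50.0], tol=0.01)
```

Under these assumptions, tightening `tol` from 0.05 to 0.01 never reduces the iteration count, and a zone at or below `min_size` takes its seed entirely from the larger geographies.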
If the TAZ (or tract) is smaller than the user-specified minimum, then it is not trusted to provide the starting distribution; if it is larger than the user-specified maximum, then it is trusted completely. In between, its distribution is blended with those of the larger geographies. The issue at hand is whether small neighborhood peculiarities persist over time. If they do, then it would be better to use the base-year TAZ-level distribution, even for small TAZ, preserving the details supplied by the base-year census tables; if they don't, then it would be better to use the distribution from the tract or PUMA.

To test this, Version 316 back-casts were run with a variety of minimum and maximum size criteria. The quality of the validation results was then compared by averaging the absolute mean percentage difference and the standard deviation percentage difference across all usable variables and comparing them across runs. The results indicated that the back-cast population matched the back-cast validation values best when the minimum size was between 10 and 100 and the maximum size was between 100 and 500. The results were worst when the size parameters were set so high that the PUMA distributions were used exclusively. However, except for this extreme case, differences were minor compared with the levels of inaccuracy and imprecision in the best forecasts.

SUMMARY OF VALIDATION RESULTS

The following summary conclusions might be drawn from the above analysis about the preferred versions to use for base-year and forecast analysis:

In the base year, the use of census data to control for more variables in Version 316 yields a clearly superior synthetic population, especially for tract-level evaluation in controlled categories. So, for base-year analysis and short-term forecasts using the base-year population, Version 316, or perhaps even a more complex version, should be used.
For the forecast year, the additional controls of Versions 128 and 316 provide little value and can potentially make the population worse. The reason for this situation lies primarily in the reliance on averages that are translated into category distributions naively from base-year distributions rather than attempting to make informed forecasts of the distributions themselves. It probably also lies in relying on regional forecasts rather than forecasts carrying information at some smaller level of geographic aggregation.

Table 2 provides a summary of the aggregate level at which various categories of variables would be reasonably precise and accurate in the synthetic population, assuming Version 316 for base-year analysis and Version 52 for forecast-year analysis.

These preliminary results demonstrate several important aspects of PopSyns. First, the accuracy of synthesized characteristics depends heavily on the control variables used for population synthesis; uncontrolled

VALIDATION OF ATLANTA, GEORGIA, REGIONAL COMMISSION POPULATION SYNTHESIZER