with the same version, allowing stochastic variation. The results indicate that stochastic variation is probably not a problem, but this variation has not been analyzed in detail, so no such analysis is reported here.

COMPUTATIONAL PERFORMANCE

The table below shows computational performance for base-year synthesis with the three versions, each synthesizing 3.6 million persons in 1.35 million households.

    Household categories             52      128      316
    Balancer IPF iterations           7       11       14
    Total running time (minutes)    9.9     11.9     17.4

Computational performance for forecast-year synthesis is similar. The performance tests were run on a Pentium 4 computer with a 3-GHz processor and 2 GB of memory. Regardless of version, 3 min of overhead are required to set up for synthesis, more than 1 min to produce the validation statistics (if desired), and more than 2½ min to save the synthetic population. However, Version 316 requires much more time for other parts of the process, especially the IPF procedure, so its overall run time is nearly twice that of Version 52.

The results in the table above come from runs in which the IPF stopped when all cells changed less than 5%. Reducing the stop criterion to 0.5% doubled the required iterations but increased the total run time by less than 5%.

VALIDATION RESULTS

This section examines the precision and accuracy of the household and person variables included in the synthetic population for both the base year and the backcast. As used here, the word “accuracy” refers to statistical bias: a variable with a nonzero mean percentage difference between the synthetic population and the census validation value is considered inaccurate. The “percentage difference” is the difference between the synthetic value and the census value for a single geographic unit (tract, PUMA, county, or supercounty). The “mean percentage difference” is the average of this difference across all geographic units in the region.
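To make these definitions concrete, the following is a minimal Python sketch of the two validation statistics for one variable at one level of aggregation. It rests on assumptions the text does not state: the census value is used as the denominator of the percentage difference, units with a zero census count are skipped, and precision is measured as the population variance of the same percentage differences. The function names and the example numbers are hypothetical and are not the study's data.

```python
from statistics import mean, pvariance


def percentage_differences(synthetic, census):
    """Return 100 * (synthetic - census) / census for each geographic unit.

    Both arguments map a hypothetical unit id (e.g. a tract code) to a count
    at a single level of aggregation. ASSUMPTION: the census value is the
    denominator; units with a zero census count are skipped because the
    percentage is undefined there; a unit missing from `synthetic` counts as 0.
    """
    diffs = []
    for unit, census_value in census.items():
        if census_value == 0:
            continue
        diffs.append(100.0 * (synthetic.get(unit, 0) - census_value) / census_value)
    return diffs


def accuracy_and_precision(synthetic, census):
    """Return (mean percentage difference, variance of percentage differences).

    A mean far from zero signals bias (inaccuracy); a large variance signals
    imprecision. ASSUMPTION: population variance of the percentage differences.
    """
    diffs = percentage_differences(synthetic, census)
    if not diffs:
        raise ValueError("no geographic units with a nonzero census value")
    return mean(diffs), pvariance(diffs)


# Example: three tracts for one variable (illustrative numbers, not study data).
census = {"t1": 100, "t2": 200, "t3": 50}
synthetic = {"t1": 110, "t2": 190, "t3": 50}
print(percentage_differences(synthetic, census))  # [10.0, -5.0, 0.0]
bias, variance = accuracy_and_precision(synthetic, census)
print(round(bias, 2), round(variance, 2))  # 1.67 38.89
```

Running the same functions separately at each level of aggregation (tract, PUMA, county, supercounty) would give one bias and variance pair per level, which is how a variable can be imprecise at the tract level yet acceptable at the PUMA level.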
“Precision” refers to statistical variance: a variable with a large variance in the difference between the synthetic population and the census validation value is considered imprecise. The order in which the variables are discussed below corresponds roughly to the decreasing level of detail at which forecast controls are applied. Figure 1 provides a graphical presentation of selected variables relevant to the discussion.

Income

Because household income is controlled at the TAZ level in four categories in both the base year and the forecast year, for all three versions, it should be the most precise and accurate of all the variables, and indeed it is. Precision is slightly higher in the base year. Version 128 oversynthesizes low-income households; this probably indicates a minor bug in the setup inputs that should be found and corrected if that version is chosen for use. The precision and accuracy of the uncontrolled income subcategories are noticeably worse but could be judged good at the PUMA level of aggregation. The backcast results for the uncontrolled subcategories cannot be evaluated correctly because the subcategory definitions are inconsistent between the 1990 and 2000 censuses.

The census PUMS data also include a person variable that compares personal income with the official poverty level. The percentage of persons below the poverty level is synthesized imprecisely at the tract level but is otherwise reasonably accurate and precise in the base year. The results cannot be validated in the forecast year because the poverty-level definitions and dollar values changed between census years.

Household Size

Household size is controlled at the TAZ level. In the base year, it is controlled in five categories for Versions 316 and 128 and in four categories for Version 52. Household size is also controlled at the TAZ level in the forecast year, but only average household size is available.
Furthermore, the base-year distribution is used to translate this average into the controlled categories. In the base year, the controlled sizes are extremely precise and accurate; the uncontrolled household Size 4 in Version 52 is noticeably less precise but quite accurate. The uncontrolled size categories with very few households, such as Size 6, are much less accurate and precise, although accuracy is better in the versions that control five categories.

The backcast validation procedure yields important results. First, using average household size to generate the forecast control introduces noticeable inaccuracy. Second, for Version 52, the precision and accuracy of the uncontrolled household Size 4 category are not noticeably worse than those of the four controlled sizes. Third, the precision and accuracy of Version 52 are no worse than those of Versions 128 and 316. So, given that the forecasts are available only as averages, controlling five size categories instead of four yields little or no improvement in the

56 INNOVATIONS IN TRAVEL DEMAND MODELING, VOLUME 2