INNOVATIONS IN TRAVEL DEMAND MODELING, VOLUME 2

with the same version, allowing stochastic variation. The results indicate that stochastic variation is probably not a problem, but detailed analysis of this variation has not been conducted and is therefore not reported here.

COMPUTATIONAL PERFORMANCE

The table below shows computational performance for base-year synthesis with the three versions, each synthesizing 3.6 million persons in 1.35 million households.

Household categories             52      128     316
Balancer IPF iterations          7       11      14
Total running time (minutes)     9.9     11.9    17.4

Computational performance for forecast-year synthesis is similar. The performance tests were run on a Pentium 4 computer with a 3-GHz processor and 2 GB of memory. Regardless of version, 3 min of overhead are required to set up for synthesis: it takes more than a minute to produce the validation statistics (if desired), and more than 2½ min are required to save the synthetic population. However, Version 316 requires much more time for other parts of the process, especially the IPF procedure, so that overall run time of Version 316 is nearly twice that of Version 52.

The results in the table above come from runs in which the IPF stopped when all cells changed less than 5%. Reducing the stop criterion to 0.5% doubled the required iterations but increased the total run time by less than 5%.

VALIDATION RESULTS

This section examines the precision and accuracy of household and person variables included in the synthetic population for both the base year and the back-cast. As used here, the word "accuracy" refers to statistical bias; a variable with a nonzero mean percentage difference between the synthetic population and the census validation value is considered inaccurate. The "percentage difference" is that between synthetic value and census value for a single geographic unit (tract, PUMA, county, or supercounty). The "mean percentage difference" is the average of this difference across all geographic units in the region. "Precision" refers to statistical variance; a variable with a large variance in the difference between the synthetic population and the census validation value is considered imprecise. The order in which variables are discussed below corresponds roughly to the decreasing level of detail in which forecast controls are applied. Figure 1 provides a graphical presentation of selected variables relevant to the text discussion.

Income

Because household income is controlled at the TAZ level in four categories for the base year and the forecast year, for all three versions, it should be the most precise and accurate of all the variables, and indeed it is. Precision is slightly higher in the base year. Version 128 is oversynthesizing low-income households; this probably indicates a minor bug in the setup inputs that should be found and corrected if that version is chosen for use. The precision and accuracy of uncontrolled income subcategories are noticeably worse but could be judged as good at the PUMA level of aggregation. The back-cast results in the uncontrolled subcategories cannot be correctly evaluated because of inconsistencies in the subcategory definitions between the 1990 and 2000 census years.

The census PUMS data also include a personal variable that compares personal income with the official poverty level. The percentage of persons below the poverty level is synthesized imprecisely at the tract level but is otherwise reasonably accurate and precise in the base year. The results cannot be validated in the forecast year because of changing poverty level definitions and dollar values between census years.

Household Size

Household size is controlled at the TAZ level. In the base year, it is controlled in five categories for Versions 316 and 128 and in four categories for Version 52. Household size is controlled at the TAZ level in the forecast year, but only average household size is available. Furthermore, the base-year distribution is used to translate this into the controlled categories. In the base year, the controlled sizes are extremely precise and accurate; the uncontrolled household Size 4 in Version 52 is noticeably less precise but quite accurate. The uncontrolled size categories with very few households, such as Size 6, achieve much less accuracy and precision, although accuracy is better in the versions that control five categories.

The back-cast validation procedure yields important results. First, noticeable inaccuracy arises from the use of average household size to generate the forecast control. Second, for Version 52, the precision and accuracy of the uncontrolled household Size 4 category are not noticeably worse than the four controlled sizes. Third, the precision and accuracy of Version 52 are not worse than for Versions 128 and 316. So, given that the forecasts are available only as averages, controlling five size categories instead of four yields little or no improvement in the
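
The accuracy and precision measures defined under VALIDATION RESULTS can be illustrated with a minimal Python sketch. It is not the authors' validation code. It assumes the percentage difference is computed as 100 × (synthetic − census) / census and that precision is summarized by the standard deviation of those differences; the function names, the dictionary layout, and the tract values are all hypothetical.

```python
from statistics import mean, pstdev


def percentage_differences(synthetic, census):
    """Percentage difference per geographic unit.

    Assumed form: 100 * (synthetic - census) / census.
    Units with a zero census value are skipped to avoid division by zero.
    """
    return {
        unit: 100.0 * (synthetic[unit] - census[unit]) / census[unit]
        for unit in census
        if census[unit] != 0
    }


def validation_summary(synthetic, census):
    """Summarize accuracy (bias) and precision (spread) for one variable."""
    diffs = list(percentage_differences(synthetic, census).values())
    return {
        # Accuracy: a mean percentage difference far from zero indicates bias.
        "mean_pct_diff": mean(diffs),
        # Precision: a large spread of the differences indicates imprecision.
        "std_pct_diff": pstdev(diffs),
    }


# Made-up household counts for three hypothetical tracts.
census = {"tract_A": 200, "tract_B": 400, "tract_C": 100}
synthetic = {"tract_A": 210, "tract_B": 380, "tract_C": 100}

summary = validation_summary(synthetic, census)
print(summary)
```

In this made-up case the per-tract differences are +5%, −5%, and 0%, so the mean percentage difference is zero while the spread is not. That is the pattern the text describes as accurate but imprecise.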