INNOVATIONS IN TRAVEL DEMAND MODELING, VOLUME 2
with the same version, allowing stochastic variation. The results indicate that stochastic variation is probably not a problem, but detailed analysis of this variation has not been conducted and is therefore not reported here.

COMPUTATIONAL PERFORMANCE

The table below shows computational performance for base-year synthesis with the three versions, each synthesizing 3.6 million persons in 1.35 million households.

Household categories             52     128     316
Balancer IPF iterations           7      11      14
Total running time (minutes)    9.9    11.9    17.4

Computational performance for forecast-year synthesis is similar. The performance tests were run on a Pentium 4 computer with a 3-GHz processor and 2 GB of memory. Regardless of version, 3 min of overhead are required to set up for synthesis: it takes more than a minute to produce the validation statistics (if desired), and more than 2½ min are required to save the synthetic population. However, Version 316 requires much more time for other parts of the process, especially the IPF procedure, so that overall run time of Version 316 is nearly twice that of Version 52.

The results in the table above come from runs in which the IPF stopped when all cells changed less than 5%. Reducing the stop criterion to 0.5% doubled the required iterations but increased the total run time by less than 5%.

VALIDATION RESULTS

This section examines the precision and accuracy of household and person variables included in the synthetic population for both the base year and the back-cast. As used here, the word "accuracy" refers to statistical bias; a variable with a nonzero mean percentage difference between the synthetic population and the census validation value is considered inaccurate. The "percentage difference" is that between synthetic value and census value for a single geographic unit (tract, PUMA, county, or supercounty). The "mean percentage difference" is the average of this difference across all geographic units in the region. "Precision" refers to statistical variance; a variable with a large variance in the difference between the synthetic population and the census validation value is considered imprecise. The order in which variables are discussed below corresponds roughly to the decreasing level of detail in which forecast controls are applied. Figure 1 provides a graphical presentation of selected variables relevant to the text discussion.

Income

Because household income is controlled at the TAZ level in four categories for the base year and the forecast year, for all three versions, it should be the most precise and accurate of all the variables, and indeed it is. Precision is slightly higher in the base year. Version 128 is oversynthesizing low-income households; this probably indicates a minor bug in the setup inputs that should be found and corrected if that version is chosen for use. The precision and accuracy of uncontrolled income subcategories are noticeably worse but could be judged as good at the PUMA level of aggregation. The back-cast results in the uncontrolled subcategories cannot be correctly evaluated because of inconsistencies in the subcategory definitions between the 1990 and 2000 census years.

The census PUMS data also include a personal variable that compares personal income with the official poverty level. The percentage of persons below the poverty level is synthesized imprecisely at the tract level but is otherwise reasonably accurate and precise in the base year. The results cannot be validated in the forecast year because of changing poverty level definitions and dollar values between census years.

Household Size

Household size is controlled at the TAZ level. In the base year, it is controlled in five categories for Versions 316 and 128 and in four categories for Version 52. Household size is controlled at the TAZ level in the forecast year, but only average household size is available. Furthermore, the base-year distribution is used to translate this into the controlled categories. In the base year, the controlled sizes are extremely precise and accurate; the uncontrolled household Size 4 in Version 52 is noticeably less precise but quite accurate. The uncontrolled size categories with very few households, such as Size 6, achieve much less accuracy and precision, although accuracy is better in the versions that control five categories.

The back-cast validation procedure yields important results. First, noticeable inaccuracy arises from the use of average household size to generate the forecast control. Second, for Version 52, the precision and accuracy of the uncontrolled household Size 4 category are not noticeably worse than the four controlled sizes. Third, the precision and accuracy of Version 52 are not worse than for Versions 128 and 316. So, given that the forecasts are available only as averages, controlling five size categories instead of four yields little or no improvement in the