Using ACS Data 65
Table 4.20. Travel time to work.
Travel Time                                                 Pooled Sample
Percent of commuters with short commutes (< 20 minutes)     1.0207
Percent of commuters with long commutes (> 20 minutes)      0.986
Table 4.21. Median income.
Median Income MSA/CMSA 5 Million
Decennial 2000 0.9396 0.955 0.9648
Decennial 1999 0.971 0.9869 0.997
4.6 Implications of ACS Data Release Frequency
4.6.1 Frequency of Data Releases
Annual estimates will be released for areas with population greater than 65,000, starting in
year 2006. Three-year moving average estimates will be released for areas with population
greater than 20,000 starting in year 2008. Five-year moving averages will be released for all areas
starting in year 2010. Table 4.22 illustrates the data release schedule.
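The release rules just described can be written down as a small lookup. The following is a minimal sketch, not a Census Bureau tool: the function name `available_estimates` is hypothetical, and treating the thresholds as strict "greater than" comparisons is an assumption taken from the wording of the paragraph above.

```python
def available_estimates(population, release_year):
    """Return the ACS estimate types released for an area in a given year.

    Thresholds and start years follow the release schedule described in
    the text; strict inequalities are an assumption.
    """
    estimates = []
    if population > 65_000 and release_year >= 2006:
        estimates.append("annual")
    if population > 20_000 and release_year >= 2008:
        estimates.append("three-year average")
    if release_year >= 2010:
        # Five-year averages are released for all areas.
        estimates.append("five-year average")
    return estimates


print(available_estimates(population=150_000, release_year=2008))
print(available_estimates(population=30_000, release_year=2010))
print(available_estimates(population=5_000, release_year=2009))
```

An empty list, as in the last call, means that no ACS estimate of any type has yet been released for an area of that size in that year.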
The main advantage of ACS in this respect is the timeliness of the data. This is especially impor-
tant in mid-decade or during the years prior to the decennial census, where the census data from
the previous decennial census would have become relatively outdated. Moreover, the availability
of the ACS data on an annual basis, especially for large areas where the estimates are more reli-
able, enhances the ability to do trend analysis and use other time series analysis methods.
The availability of continuously updated data, however, might create burdens for analysts and
data keepers. Transportation analysts should determine the frequency of updating their travel
surveys (development and expansion), market analyses (e.g., environmental justice analysis),
and travel demand models. This will depend on the particular analysis performed, area size, cost
of the update, and utility obtained from updating the analysis. For example, travel demand mod-
els might not need to be updated annually; a five-year modeling cycle might be sufficient.
Moreover, users should determine which type of ACS estimate to use when there is more than
one type available for a given area. For example, for areas with population greater than 65,000,
annual estimates, three- and five-year moving averages are released. For areas with population
between 20,000 and 65,000, three- and five-year moving averages are released. The type of esti-
mate to use will depend on the purpose of the analysis, as follows:
· Consistency--If the characteristics of two populations in areas of similar geographic scales
(e.g., populations of two counties or two states) are compared, it is important to use the same
Table 4.22. ACS data release schedule.
                                                   Data for the Previous Year Released In...
Type of Data             Population/Size of Area   2006  2007  2008  2009  2010  2011  2012
1. Annual Estimates      65,000+                    X     X     X     X     X     X     X
2. Three-Year Averages   20,000+                                X     X     X     X     X
3. Five-Year Averages    Tract/Block Group                                  X     X     X
66 A Guidebook for Using American Community Survey Data for Transportation Planning
type of estimate to ensure consistency. For example, if County A has a 65,000+ population and
County B has a population less than 65,000, then it is recommended that the multiyear or
cumulative average estimate from County A (rather than the single-year estimate, which is
available) be used to compare it to the moving average estimate from County B (where annual
estimates are unavailable).
· Reduction in Lag Time--If the timeliness of the data is important for the analysis, and if the
single-year estimates are deemed reliable (e.g., with reasonable standard errors and without
too many fluctuations), the analyst could use the single-year estimates rather than the mov-
ing average estimates to reduce the lag time between the analysis year and data collection year.
· Greater Reliability--If the analysis focuses on a certain subpopulation for which three- and
five-year moving averages are available, and if greater reliability is desired, the five-year mov-
ing averages would be more stable to use.
· Reducing Correlations--Moving averages that include overlapping years are correlated (see
the discussion below). Therefore, when testing for the significance of an annual rate of change,
it is recommended that annual estimates be used rather than moving average estimates that
include overlapping years.
4.6.2 Measuring ACS Changes Across Years
The improved frequency of data allows users to better analyze changes within prescribed geo-
graphic areas. A new data product, the multiyear profile, provided by the Census Bureau summa-
rizes the year-to-year changes in ACS estimates and identifies statistically significant differences.
The computational techniques used by the Census Bureau, as well as those that can be used
by ACS analysts, for comparing estimates across years are summarized in a Census Bureau data
accuracy memorandum entitled 2002 and beyond Change Profile Accuracy.42
This document provides two useful example calculations that are summarized below. These
examples show how to
· Determine the statistical significance of differences in percent distributions; and
· Determine the statistical significance of other differences.
Example Calculation 1 Determine if a year-to-year difference in an ACS percentage is sta-
tistically significant.
Problem The 2001 ACS for Bronx County, New York, estimates the number of women, aged
15 and over, to be 533,280, with lower and upper bounds of 533,062 and 533,498, respectively.
The estimated number of these women who have never married is 213,545, with a lower bound
of 208,349 and an upper bound of 218,741.
In 2002, ACS estimated the number of women age 15 and over to be 538,338, with lower and
upper bounds of 537,558 and 539,118, and the number of these women who have never married
to be 220,675, with lower and upper bounds of 214,146 and 227,204.
Did the percentage of women who have never been married increase significantly between the
years?
Relevant Equations
Standard error = 90 percent confidence margin of error / 1.65
Margin of error = max(upper bound - estimate, estimate - lower bound)
42 See www.census.gov/acs/www/Downloads/ACS/accuracy2002change.pdf.
Note: many, but not all, ACS intervals are symmetrical around the reported estimate, so
choosing the maximum interval is the conservative approach to establishing the margin of
error.
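Standard error of a proportion:
\[ \mathrm{SE}(\hat{p}) = \frac{1}{\hat{Y}} \sqrt{[\mathrm{SE}(\hat{X})]^2 - \frac{\hat{X}^2}{\hat{Y}^2}\,[\mathrm{SE}(\hat{Y})]^2} \]
Note: this approximation is valid for proportions of two estimates where the numerator (X)
is a subset of the denominator (Y).
Difference:
\[ \mathrm{DIFF} = 100 \times (\hat{P}_{\text{Final}} - \hat{P}_{\text{Initial}}) \]
Standard error of the difference:
\[ \mathrm{SE}(\mathrm{DIFF}) = \sqrt{[\mathrm{SE}(\hat{P}_{\text{Final}})]^2 + [\mathrm{SE}(\hat{P}_{\text{Initial}})]^2} \]
Margin of error of the difference:
\[ \mathrm{ME}(\mathrm{DIFF}) = 1.65 \times \mathrm{SE}(\mathrm{DIFF}) \]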
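Calculations
Year 2001
\[ \mathrm{SE}(\hat{X}) = \mathrm{SE}(213{,}545) = (218{,}741 - 213{,}545)/1.65 = 3{,}149 \]
\[ \mathrm{SE}(\hat{Y}) = \mathrm{SE}(533{,}280) = (533{,}498 - 533{,}280)/1.65 = 132 \]
\[ \mathrm{SE}(\hat{p}) = \mathrm{SE}(0.400) = \frac{1}{533{,}280} \sqrt{(3{,}149)^2 - \frac{213{,}545^2}{533{,}280^2}\,(132)^2} = 0.006 \]
Year 2002
\[ \mathrm{SE}(\hat{X}) = \mathrm{SE}(220{,}675) = (227{,}204 - 220{,}675)/1.65 = 3{,}957 \]
\[ \mathrm{SE}(\hat{Y}) = \mathrm{SE}(538{,}338) = (539{,}118 - 538{,}338)/1.65 = 473 \]
\[ \mathrm{SE}(\hat{p}) = \mathrm{SE}(0.410) = \frac{1}{538{,}338} \sqrt{(3{,}957)^2 - \frac{220{,}675^2}{538{,}338^2}\,(473)^2} = 0.007 \]
Comparison
\[ \mathrm{DIFF} = 100 \times (0.410 - 0.400) = 1.0 \text{ percent} \]
\[ \mathrm{SE}(\mathrm{DIFF}) = \sqrt{[0.006]^2 + [0.007]^2} = 0.009 = 0.9 \text{ percent} \]
\[ \mathrm{ME}(\mathrm{DIFF}) = 1.65 \times 0.9 = 1.5 \text{ percent} \]
\[ \text{Lower bound} = 1.0 \text{ percent} - 1.5 \text{ percent} = -0.5 \text{ percent} \]
\[ \text{Upper bound} = 1.0 \text{ percent} + 1.5 \text{ percent} = 2.5 \text{ percent} \]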
Discussion Since the lower bound and upper bound have different signs, the year-to-year
difference is not significant at the 90 percent confidence level.
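The steps of Example Calculation 1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the inputs are the Bronx County figures quoted in the problem statement, and the proportion formula assumes that the term under the square root is non-negative. Because intermediate values are not rounded, the bounds differ slightly from the hand calculation, but the conclusion is the same.

```python
import math

Z_90 = 1.65  # critical value behind the Census Bureau's 90 percent bounds


def standard_error(estimate, lower, upper, z=Z_90):
    """Standard error from published bounds, using the larger half-width."""
    margin = max(upper - estimate, estimate - lower)
    return margin / z


def proportion_se(x, se_x, y, se_y):
    """Approximate SE of p = x / y when x is a subset of y.

    Assumes the term under the square root is non-negative.
    """
    p = x / y
    return math.sqrt(se_x**2 - p**2 * se_y**2) / y


def percent_change_test(initial, final, z=Z_90):
    """Each argument is ((x, x_lower, x_upper), (y, y_lower, y_upper)).

    Returns the difference in percentage points and its lower/upper bounds.
    """
    results = []
    for (x, x_lo, x_hi), (y, y_lo, y_hi) in (initial, final):
        se_p = proportion_se(x, standard_error(x, x_lo, x_hi),
                             y, standard_error(y, y_lo, y_hi))
        results.append((x / y, se_p))
    (p0, se0), (p1, se1) = results
    diff = 100 * (p1 - p0)
    margin = z * 100 * math.sqrt(se0**2 + se1**2)
    return diff, diff - margin, diff + margin


# Never-married women and all women aged 15 and over, Bronx County.
bronx_2001 = ((213_545, 208_349, 218_741), (533_280, 533_062, 533_498))
bronx_2002 = ((220_675, 214_146, 227_204), (538_338, 537_558, 539_118))

diff, lower, upper = percent_change_test(bronx_2001, bronx_2002)
significant = lower > 0 or upper < 0  # bounds with the same sign
print(f"DIFF = {diff:.2f}, bounds = ({lower:.2f}, {upper:.2f}), "
      f"significant = {significant}")
```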
Example Calculation 2 Compare differences for other estimates.
Problem The mean travel time to work for Bronx County in 2001 was 41.0 minutes, with an
upper bound of 41.7 minutes. In 2002, the ACS mean travel time to work was 41.8 minutes with
an upper bound of 42.8 minutes. Did the mean travel time to work change significantly between
the years?
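Relevant Equations For means and other non-percentage ACS estimates, the equations are as follows:
Difference:
\[ \mathrm{DIFF} = \hat{X}_{\text{Final}} - \hat{X}_{\text{Initial}} \]
Standard error of the difference:
\[ \mathrm{SE}(\mathrm{DIFF}) = \sqrt{[\mathrm{SE}(\hat{X}_{\text{Final}})]^2 + [\mathrm{SE}(\hat{X}_{\text{Initial}})]^2} \]
Margin of error of the difference:
\[ \mathrm{ME}(\mathrm{DIFF}) = 1.65 \times \mathrm{SE}(\mathrm{DIFF}) \]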
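Calculations
Year 2001
\[ \mathrm{SE}(\hat{X}) = \mathrm{SE}(41.0) = (41.7 - 41.0)/1.65 = 0.4 \]
Year 2002
\[ \mathrm{SE}(\hat{X}) = \mathrm{SE}(41.8) = (42.8 - 41.8)/1.65 = 0.6 \]
Comparison
\[ \mathrm{DIFF} = 41.8 - 41.0 = 0.8 \text{ minutes} \]
\[ \mathrm{SE}(\mathrm{DIFF}) = \sqrt{[0.4]^2 + [0.6]^2} = 0.7 \text{ minutes} \]
\[ \mathrm{ME}(\mathrm{DIFF}) = 1.65 \times 0.7 = 1.2 \text{ minutes} \]
\[ \text{Lower bound} = 0.8 \text{ minutes} - 1.2 \text{ minutes} = -0.4 \text{ minutes} \]
\[ \text{Upper bound} = 0.8 \text{ minutes} + 1.2 \text{ minutes} = 2.0 \text{ minutes} \]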
Discussion Since the lower bound and upper bound have different signs, the year-to-year
difference is not significant at the 90 percent confidence level. With standard errors of 0.4
minutes for 2001 and 0.6 minutes for 2002, the difference in the mean travel time would have
had to be more than 1.2 minutes to be statistically significant.
Alternatively, if the 2002 standard error were 0.27 minutes, the difference of 0.8 minutes
would have been statistically significant at the 90 percent confidence level:
\[ \mathrm{DIFF} = 41.8 - 41.0 = 0.8 \text{ minutes} \]
\[ \mathrm{SE}(\mathrm{DIFF}) = \sqrt{[0.4]^2 + [0.27]^2} = 0.48 \text{ minutes} \]
\[ \mathrm{ME}(\mathrm{DIFF}) = 1.65 \times 0.48 = 0.79 \text{ minutes} \]
\[ \text{Lower bound} = 0.8 \text{ minutes} - 0.79 \text{ minutes} = 0.01 \text{ minutes} \]
\[ \text{Upper bound} = 0.8 \text{ minutes} + 0.79 \text{ minutes} = 1.59 \text{ minutes} \]
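The two travel-time comparisons can be reproduced with a short helper. This is a sketch under stated assumptions: the name `difference_test` is hypothetical, and the inputs are the rounded standard errors from the example (0.4, 0.6, and 0.27 minutes). Checking whether the absolute difference exceeds the margin of error is equivalent to checking whether the lower and upper bounds share the same sign.

```python
import math


def difference_test(initial, se_initial, final, se_final, z=1.65):
    """Return (difference, margin of error, significant?) for two estimates."""
    diff = final - initial
    margin = z * math.sqrt(se_initial**2 + se_final**2)
    return diff, margin, abs(diff) > margin


# Rounded standard errors from the example: 0.4 (2001) and 0.6 (2002).
base_case = difference_test(41.0, 0.4, 41.8, 0.6)
# Alternative case with a 2002 standard error of 0.27 minutes.
alternative_case = difference_test(41.0, 0.4, 41.8, 0.27)

for label, (diff, margin, significant) in (("base", base_case),
                                           ("alternative", alternative_case)):
    print(f"{label}: DIFF = {diff:.2f}, ME = {margin:.2f}, "
          f"significant = {significant}")
```

The margin of error returned for the base case is also the smallest difference that would be judged significant for that pair of standard errors. Computed without intermediate rounding it is about 1.19 minutes, which the hand calculation reports as 1.2.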
An analyst is not restricted to using the 90 percent confidence level even though the Census
Bureau reports the data at this level. If one wanted to compare the mean travel times for the dif-
ferent years using a confidence level of 80 percent, the calculations could be accomplished as
shown below:
Year 2001
\[ \mathrm{SE}(\hat{X}) = \mathrm{SE}(41.0) = (41.7 - 41.0)/1.65 = 0.4 \]
Year 2002
\[ \mathrm{SE}(\hat{X}) = \mathrm{SE}(41.8) = (42.8 - 41.8)/1.65 = 0.6 \]
The Census Bureau's upper and lower bounds are calculated at the 90 percent confidence
level. Thus, 1.65 is used as the denominator.
\[ \mathrm{DIFF} = 41.8 - 41.0 = 0.8 \text{ minutes} \]
\[ \mathrm{SE}(\mathrm{DIFF}) = \sqrt{[0.4]^2 + [0.6]^2} = 0.7 \text{ minutes} \]
\[ \mathrm{ME}(\mathrm{DIFF}) = 1.28 \times 0.7 = 0.9 \text{ minutes} \]
For the comparison, the critical value associated with the 80 percent confidence level, 1.28, is
used to calculate the margin of error of the difference. Table 4.10 showed factors associated with
different confidence levels, and a statistics textbook would include others.
\[ \text{Lower bound} = 0.8 \text{ minutes} - 0.9 \text{ minutes} = -0.1 \text{ minutes} \]
\[ \text{Upper bound} = 0.8 \text{ minutes} + 0.9 \text{ minutes} = 1.7 \text{ minutes} \]
Therefore, even at the 80 percent confidence level, the lower and upper bounds of the
difference have opposite signs, indicating that the difference is not statistically significant.
In practice, it is not likely that an analyst would be interested in making comparisons with
confidence levels that are lower than the Census Bureau's 90 percent. It is more likely that if one
were using a different confidence level, it would be the 95 percent confidence level (for which
one would use a critical value factor of 1.96).
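The critical values quoted in this section (1.28, 1.65, and 1.96) are two-sided quantiles of the standard normal distribution, so they can be computed rather than looked up. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.80, 0.90, 0.95):
    print(f"{level:.0%}: {critical_value(level):.3f}")
```

The 90 percent value computes to about 1.645, which corresponds to the rounded factor of 1.65 used in the calculations above.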
4.6.3 Multiyear Averaging/Analysis of Overlapping Averages
The main advantage of moving averages, as compared to annual estimates, is that moving
averages smooth the data and are thus more reliable (lower standard errors, less year-to-year
variation). Since moving averages smooth out the random fluctuations in the data, they can
provide a clearer visual picture of the overall trend in a certain variable of interest. The main
disadvantage of moving averages is the lag time associated with them. If conditions are rela-
tively stable across the years over which data are averaged, multiyear average estimates will
be close to the annual estimates. However, if conditions change dramatically in a given year,
the annual estimate reflects the change in a more timely manner than does the multiyear
average.
There are two issues to consider with regard to the use of ACS multiyear averages. The first
issue is related to the comparison of two moving averages that include overlapping years. It is
important to note that statistically valid annual estimates of change cannot be computed from
the difference of two moving averages if the two moving averages are based on data from over-
lapping years, such as from a moving average of years 1996-1998 and a moving average of
years 1997-1999. This is because when standard statistical procedures are used to test for sig-
nificant differences between estimates over time, it is assumed that the two estimates are
drawn from independent samples. This assumption is violated in the case of the overlapping
moving averages.
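To see how strongly two overlapping moving averages are tied together, consider the simplified case in which each annual estimate is independent of the others and has the same variance. Under that assumption, which is an idealization and not a description of the actual ACS sample design, the correlation between two averages of equal length equals the share of years they have in common. The function names below are hypothetical.

```python
from fractions import Fraction


def moving_average_weights(start, window, years):
    """Weight each annual estimate receives in a moving average."""
    return [Fraction(1, window) if start <= y < start + window else Fraction(0)
            for y in years]


def overlap_correlation(start_a, start_b, window):
    """Correlation between two equal-length moving averages.

    Assumes annual estimates are independent with equal variance, so
    covariances are proportional to dot products of the weight vectors.
    """
    years = range(min(start_a, start_b), max(start_a, start_b) + window)
    w_a = moving_average_weights(start_a, window, years)
    w_b = moving_average_weights(start_b, window, years)
    cov = sum(a * b for a, b in zip(w_a, w_b))
    var_a = sum(a * a for a in w_a)
    # Both averages have the same variance, so corr = cov / var_a.
    return cov / var_a


print(overlap_correlation(1996, 1997, 3))  # 1996-1998 versus 1997-1999
print(overlap_correlation(2003, 2006, 3))  # 2003-2005 versus 2006-2008
```

Under these assumptions the 1996-1998 and 1997-1999 averages share two of three years and are far from independent, while averages with no years in common are uncorrelated.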
One tempting way to look at the comparison of two consecutive overlapping moving averages,
say 2003-2005 and 2004-2006, is that it is in essence a comparison of the difference between 2006
(which is only in the second multiyear period) and 2003 (which is only in the first multiyear period).
Unfortunately, the fact that the Census Bureau has released these data as multiyear averages is a
recognition that a direct comparison between 2003 by itself and 2006 by itself for this geography is
not valid because the individual year sample sizes will not support the comparison. When an ana-
lyst uses the multiyear overlapping estimates to make conclusions about single years, he or she is, in
effect, cheating by using an artificially high number of data records that include the overlapping
years (2004 and 2005, in this case). The analyst is claiming the reduced sampling error that comes
with more data records, but in reality, only a portion (a third, in this case) of each sample's records
actually contribute to the comparison the analyst is making.
This is not to say that one should not get a qualitative idea of the pattern of change from exam-
ining these overlapping moving averages, especially as they accumulate over time. Such time
series will be very informative to data users as they try to capture what is happening in a region
over time. However, there is a need to be cautious about making definitive conclusions about
the differences of the overlapping estimates.
As the combination of multiyear and single-year averages accumulates for geographic areas of
different sizes, it is likely that transportation planners and other ACS data users will commonly
develop factoring methods and iterative proportional fitting methods that combine
multiyear average estimates for smaller geographic areas with single-year estimates for
corresponding larger geographic areas to synthesize single-year estimates for geographic areas
that do not support this level of ACS reporting. For more homogeneous areas and ACS
characteristics, these methods will provide reasonable small area estimates. However, analysts
will need to remember that the ACS sample sizes do not really support such analyses, and
therefore any conclusions drawn from these synthesized data are speculative.
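For readers unfamiliar with iterative proportional fitting, the following is a minimal two-dimensional sketch. It is a generic illustration, not a method prescribed by this guidebook or the Census Bureau: the function name, the two-tract layout, and every number in the example are hypothetical. The seed table stands in for multiyear small-area estimates, and the row and column targets stand in for totals taken from other sources, such as a single-year estimate for the larger area.

```python
def iterative_proportional_fit(seed, row_targets, col_targets,
                               tolerance=1e-9, max_iterations=1000):
    """Scale a seed table until its row and column sums match the targets."""
    table = [row[:] for row in seed]
    for _ in range(max_iterations):
        # Scale each row to match its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [cell * target / row_sum for cell in table[i]]
        # Scale each column to match its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        # Columns match after the step above; stop once rows agree too.
        if all(abs(sum(table[i]) - t) <= tolerance
               for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical data: two tracts (rows) by commute mode (drive, transit).
seed = [[800.0, 200.0], [600.0, 400.0]]  # multiyear averages (made up)
row_targets = [1000.0, 1000.0]           # tract commuter totals (made up)
col_targets = [1500.0, 500.0]            # single-year area totals (made up)

fitted = iterative_proportional_fit(seed, row_targets, col_targets)
for row in fitted:
    print([round(cell, 1) for cell in row])
```

The fitted table reproduces both sets of totals while preserving the relative pattern of the seed, but, as noted above, it does not add any information that the underlying samples lack.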
The second issue related to multiyear estimates is that moving averages also present problems
when used as dependent variables in several statistical models (such as time series models) and
regression models, since the statistical properties of the data (such as autocorrelations) would be
affected by the moving averages. Users should understand the implicit statistical assumptions in
their analyses and be sure that the ACS data comply with these assumptions. For instance, if an
analyst wanted to test the effect of gasoline prices on commuting modes for a small area (requir-
ing multiyear averaging), he or she will not be able to use monthly, or even annual, gas price data
effectively. The analyst will need to develop estimates of the independent variable in the same
timeframe during which the ACS data are available.
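One simple way to put an explanatory variable on the same footing as a multiyear ACS estimate is to average it over the same period. The sketch below assumes an unweighted average of monthly observations is adequate, which may not hold if the ACS sample is unevenly distributed over the period. The function name and the price series are hypothetical.

```python
def period_average(monthly_values, start_year, end_year):
    """Average a monthly series over the years covered by a multiyear estimate.

    monthly_values maps (year, month) tuples to observed values.
    """
    selected = [value for (year, _month), value in monthly_values.items()
                if start_year <= year <= end_year]
    if not selected:
        raise ValueError("no observations in the requested period")
    return sum(selected) / len(selected)


# Made-up monthly gasoline prices, constant within each year for simplicity.
yearly_price = {2005: 2.00, 2006: 2.50, 2007: 3.00, 2008: 3.50}
prices = {(year, month): price
          for year, price in yearly_price.items()
          for month in range(1, 13)}

print(period_average(prices, 2005, 2007))  # matches a 2005-2007 average
print(period_average(prices, 2006, 2008))  # matches a 2006-2008 average
```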
4.6.4 Seasonality Analysis Using ACS
ACS data are collected throughout the year, as opposed to at a single point in time like the cen-
sus Long Form data, so it will be important for data users to remember that analyses of other
data in conjunction with ACS data will need to reflect the full year.
Because seasonality is very interesting from a transportation planning perspective, as travel
patterns can vary significantly throughout the year, U.S. DOT has sponsored an analysis of
seasonality using Hampden County, Massachusetts, data. For this guidebook, seasonality in two
other ACS test counties, Broward County, Florida, and Pima County, Arizona, was analyzed
using the evaluation datasets provided by FHWA and the Census Bureau. These datasets also
were used for the comparisons described in Appendix I. The seasonality analyses that were
performed with these data relied on information about the quarter of the year in which the data
were collected. Since quarter data generally will not be available to ACS users, the results of these
analyses are included in Appendix J. The key lesson from the analysis is that for some locations,
seasonality will have an important effect on ACS results but, unfortunately, without information
on the time of year that the responses were obtained, we will not have much opportunity to
address the issue.
4.6.5 ACS Continuity
An important concern about replacing the census Long Form with the continuous ACS is that
by separating the sample data collection from the constitutionally mandated census count, ACS
is more likely than the Long Form to be cut back or eliminated during the government's budg-
eting process.
Effect of Missing Data. Given the large standard errors of the ACS estimates, any further
reduction in sample size will adversely impact the quality of its estimates, which will be reflected
in larger standard errors. The relationship between the sampling rate and the resulting standard
error of the estimates is shown by the following equation:
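\[ \mathrm{SE}(\hat{Y}) = \sqrt{S\,\hat{Y}\left(1 - \frac{\hat{Y}}{N}\right)} \]
where
S is the inverse of the sampling rate minus 1,
$\hat{Y}$ is the estimate,
N is the total count of people or housing units, and
$\mathrm{SE}(\hat{Y})$ is the standard error of $\hat{Y}$.
For example, if the sampling rate is cut by half, the resulting standard error is equal to
\[ \sqrt{S_2 / S_1} \approx \sqrt{2} = 1.41 \]
times the original standard error, where $S_1$ and $S_2$ are the values of S before and after
the cut.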
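The effect of a lower sampling rate can be checked numerically. The sketch below applies the standard error formula given above; the function names and the sampling rates in the example are hypothetical and are not actual ACS rates.

```python
import math


def sampling_standard_error(estimate, total, sampling_rate):
    """Standard error of an estimated count under the formula above."""
    s = 1 / sampling_rate - 1
    return math.sqrt(s * estimate * (1 - estimate / total))


def standard_error_ratio(original_rate, reduced_rate):
    """Factor by which the standard error grows when the rate is reduced."""
    s1 = 1 / original_rate - 1
    s2 = 1 / reduced_rate - 1
    return math.sqrt(s2 / s1)


# Hypothetical rates: 1-in-40 households, then 1-in-80 after a cut.
ratio = standard_error_ratio(original_rate=0.025, reduced_rate=0.0125)
print(f"Standard error grows by a factor of {ratio:.3f}")
```

For small sampling rates the factor is slightly above the square root of 2 and approaches it as the rate shrinks, which is why halving the rate is commonly summarized as a 41 percent increase in the standard error.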
S1
Sample size reduction due to potential budget cuts could have an effect on different phases of
the ACS data collection program. The effect of eliminating the data collection, even for a single
year, will be severe, and because of the multiyear averaging, will be long lasting.
Effect of Making ACS Voluntary. The Census Bureau evaluated the effects of making
participation in ACS voluntary, rather than mandatory. In two data quality reports that analyzed
this issue, the Census Bureau responded to questionnaire content and increasing public privacy
concerns by evaluating the potential effects of having ACS implemented as a voluntary survey.
These reports are
· Report 3: Testing the Use of Voluntary Methods and
· Report 11: Testing Voluntary Methods--Additional Results.
To analyze the potential effects of this change, the Census Bureau performed a test using the
March and April 2003 ACS sample. Four experimental mail treatments were used as follows:
· A mandatory treatment identical to the mandatory treatment that had been used previously,
· An alternative mandatory treatment that attempted to improve the user-friendliness of the
mail survey,
· A standard voluntary approach similar to that used for other voluntary Census Bureau sur-
veys, and
· A voluntary treatment that explicitly told respondents that the survey was voluntary.
Voluntary methods also were applied to the telephone and in-person surveys. The responses
to the different treatments were then compared with each other and with the year 2002 manda-
tory treatment.
Based on their analyses of the data, the analysts drew the following conclusions:
· A dramatic decrease (more than 20 percentage points) occurred in mail response when the
standard survey was voluntary.
· The reliability of estimates was adversely impacted by the reduction in the total number of
completed interviews--producing reliable results with voluntary methods would require an
increased initial sample size.
· The decrease in cooperation across all three modes of data collection resulted in a notewor-
thy, but not critical, drop in the weighted survey response rate.
· The estimated annual cost of implementing the ACS would increase by at least $59.2 million
if the survey was voluntary and reliability was maintained.
· Levels of item non-response for the data collected under voluntary and mandatory methods
were very similar. Although the differences in item non-response at the topic level were sta-
tistically significant, the item non-response rates were very similar.
· The use of voluntary methods had a negative impact on traditionally low-response areas that
will compromise our ability to produce reliable data for these areas and for small population
groups such as blacks, Hispanics, Asians, American Indians, and Alaska Natives.
· The change to voluntary methods had the greatest impact on areas that have traditionally high
levels of cooperation and on white and non-Hispanic households.
· Compared to a standard voluntary survey, the use of a more direct presentation of the volun-
tary message 1) resulted in an additional decrease of four percentage points in mail response
and 2) had only a minor additional impact on data quality, with an additional 1.6 percent
decrease in the interview rate and an additional 0.4 percent decrease in the survey response rate.
· Compared to the current mandatory treatment, the revised mandatory treatment, which was
intended to be more user-friendly, resulted in only a slight increase in mail cooperation
(increase of 1.9 percentage points).
· Although the mail check-in rates were much higher for the mandatory treatments than for the
voluntary treatments, the overall patterns of mail responses over time were remarkably simi-
lar across all four treatments.