

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 65
A Guidebook for Using American Community Survey Data for Transportation Planning
Using ACS Data

Table 4.20. Travel time to work.

Travel Time                                                Pooled Sample
Percent of commuters with short commutes (< 20 minutes)    1.0207
Percent of commuters with long commutes (> 20 minutes)     0.986

Table 4.21. Median income.

Median Income     MSA/CMSA 5 Million
Decennial 2000    0.9396    0.955     0.9648
Decennial 1999    0.971     0.9869    0.997

4.6 Implications of ACS Data Release Frequency

4.6.1 Frequency of Data Releases

Annual estimates will be released for areas with population greater than 65,000, starting in year 2006. Three-year moving average estimates will be released for areas with population greater than 20,000, starting in year 2008. Five-year moving averages will be released for all areas, starting in year 2010. Table 4.22 illustrates the data release schedule.

The main advantage of ACS in this respect is the timeliness of the data. This is especially important in mid-decade or during the years prior to the decennial census, when the data from the previous decennial census would have become relatively outdated. Moreover, the availability of the ACS data on an annual basis, especially for large areas where the estimates are more reliable, enhances the ability to do trend analysis and use other time series analysis methods.

The availability of continuously updated data, however, might create burdens for analysts and data keepers. Transportation analysts should determine the frequency of updating their travel surveys (development and expansion), market analyses (e.g., environmental justice analysis), and travel demand models. This will depend on the particular analysis performed, area size, cost of the update, and utility obtained from updating the analysis. For example, travel demand models might not need to be updated annually; a five-year modeling cycle might be sufficient.

Moreover, users should determine which type of ACS estimate to use when there is more than one type available for a given area.
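The release thresholds described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the names RELEASE_RULES and available_estimates are hypothetical, and treating an area with exactly 65,000 or 20,000 people as meeting the threshold is an assumption, since the text says "greater than" while Table 4.22 says "65,000+".

```python
# (estimate type, minimum population, first release year), as described in
# Section 4.6.1. Treating the thresholds as inclusive is an assumption.
RELEASE_RULES = [
    ("annual estimate", 65_000, 2006),
    ("three-year average", 20_000, 2008),
    ("five-year average", 0, 2010),
]


def available_estimates(population, release_year):
    """List the ACS estimate types released for an area in a given year."""
    return [
        name
        for name, min_population, first_year in RELEASE_RULES
        if population >= min_population and release_year >= first_year
    ]


# A large county has all three products once five-year averages begin.
print(available_estimates(100_000, 2010))
# A mid-sized area has only the three-year average in 2009.
print(available_estimates(30_000, 2009))
```

Listing the products that exist for each area is the first step; choosing among them depends on the purpose of the analysis.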
For example, for areas with population greater than 65,000, annual estimates and three- and five-year moving averages are released. For areas with population between 20,000 and 65,000, three- and five-year moving averages are released.

Table 4.22. ACS data release schedule.

                                                   Data for the Previous Year Released In...
Type of Data             Population/Size of Area   2006  2007  2008  2009  2010  2011  2012
1. Annual Estimates      65,000+                   X     X     X     X     X     X     X
2. Three-Year Averages   20,000+                               X     X     X     X     X
3. Five-Year Averages    Tract/Block Group                                 X     X     X

The type of estimate to use will depend on the purpose of the analysis, as follows:

Consistency--If the characteristics of two populations in areas of similar geographic scales (e.g., populations of two counties or two states) are compared, it is important to use the same type of estimate to ensure consistency. For example, if County A has a 65,000+ population and County B has a population less than 65,000, then it is recommended that the multiyear or cumulative average estimate for County A (rather than the single-year estimate, which is available) be compared with the moving average estimate for County B (where annual estimates are unavailable).

Reduction in Lag Time--If the timeliness of the data is important for the analysis, and if the single-year estimates are deemed reliable (e.g., with reasonable standard errors and without too many fluctuations), the analyst could use the single-year estimates rather than the moving average estimates to reduce the lag time between the analysis year and data collection year.

Greater Reliability--If the analysis focuses on a certain subpopulation for which three- and five-year moving averages are available, and if greater reliability is desired, the five-year moving averages would be more stable to use.

Reducing Correlations--Moving averages that include overlapping years are correlated (see the discussion below). Therefore, when testing for the significance of an annual rate of change, it is recommended that annual estimates be used rather than moving average estimates that include overlapping years.

4.6.2 Measuring ACS Changes Across Years

The improved frequency of data allows users to better analyze changes within prescribed geographic areas. A new data product provided by the Census Bureau, the multiyear profile, summarizes the year-to-year changes in ACS estimates and identifies statistically significant differences.
The computational techniques used by the Census Bureau, as well as those that can be used by ACS analysts, for comparing estimates across years are summarized in a Census Bureau data accuracy memorandum entitled 2002 and beyond Change Profile Accuracy.42 This document provides two useful example calculations that are summarized below. These examples show how to

Determine the statistical significance of differences in percent distributions; and
Determine the statistical significance of other differences.

Example Calculation 1  Determine if a year-to-year difference in an ACS percentage is statistically significant.

Problem

The 2001 ACS for Bronx County, New York, estimates the number of women aged 15 and over to be 533,280, with lower and upper bounds of 533,062 and 533,498, respectively. The estimated number of these women who have never married is 213,545, with a lower bound of 208,349 and an upper bound of 218,741. In 2002, ACS estimated the number of women aged 15 and over to be 538,338, with lower and upper bounds of 537,558 and 539,118, and the number of these women who have never married to be 220,675, with lower and upper bounds of 214,146 and 227,204. Did the percentage of women who have never been married increase significantly between the years?

Relevant Equations

$$\text{Standard error} = \frac{\text{90 percent confidence margin of error}}{1.65}$$

$$\text{Margin of error} = \max(\text{upper bound} - \text{estimate},\ \text{estimate} - \text{lower bound})$$

42 See www.census.gov/acs/www/Downloads/ACS/accuracy2002change.pdf.

Note: many, but not all, ACS intervals are symmetrical around the reported estimate, so choosing the maximum interval is the conservative approach to establishing the margin of error.

Standard error of a proportion:

$$SE(\hat{p}) = \frac{1}{\hat{Y}} \sqrt{\left(SE(\hat{X})\right)^2 - \frac{\hat{X}^2}{\hat{Y}^2} \left(SE(\hat{Y})\right)^2}$$

Note: this approximation is valid for proportions of two estimates where the numerator (X) is a subset of the denominator (Y).

Difference:

$$DIFF = 100 \times (\hat{P}_{Final} - \hat{P}_{Initial})$$

Standard error of the difference:

$$SE(DIFF) = \sqrt{[SE(\hat{P}_{Final})]^2 + [SE(\hat{P}_{Initial})]^2}$$

Margin of error of the difference:

$$ME(DIFF) = 1.65 \times SE(DIFF)$$

Calculations

Year 2001

$$SE(\hat{X}) = SE(213,545) = (218,741 - 213,545)/1.65 = 3,149$$

$$SE(\hat{Y}) = SE(533,280) = (533,498 - 533,280)/1.65 = 132$$

$$SE(\hat{p}) = SE(0.400) = \frac{1}{533,280} \sqrt{(3,149)^2 - \frac{213,545^2}{533,280^2} (132)^2} = 0.006$$

Year 2002

$$SE(\hat{X}) = SE(220,675) = (227,204 - 220,675)/1.65 = 3,957$$

$$SE(\hat{Y}) = SE(538,338) = (539,118 - 538,338)/1.65 = 473$$

$$SE(\hat{p}) = SE(0.410) = \frac{1}{538,338} \sqrt{(3,957)^2 - \frac{220,675^2}{538,338^2} (473)^2} = 0.007$$

Comparison

$$DIFF = 100 \times (0.410 - 0.400) = 1.0 \text{ percent}$$

$$SE(DIFF) = \sqrt{[0.006]^2 + [0.007]^2} = 0.9 \text{ percent}$$

$$ME(DIFF) = 1.65 \times 0.9 = 1.5 \text{ percent}$$

Lower bound = 1.0 percent - 1.5 percent = -0.5 percent
Upper bound = 1.0 percent + 1.5 percent = 2.5 percent

Discussion

Since the lower bound and upper bound have different signs, the year-to-year difference is not significant at the 90 percent confidence level.

Example Calculation 2  Compare differences for other estimates.

Problem

The mean travel time to work for Bronx County in 2001 was 41.0 minutes, with an upper bound of 41.7 minutes. In 2002, the ACS mean travel time to work was 41.8 minutes, with an upper bound of 42.8 minutes. Did the mean travel time to work change significantly between the years?

Relevant Equations

For means and other non-percentage ACS estimates, the equations are as follows:

Difference:

$$DIFF = \hat{X}_{Final} - \hat{X}_{Initial}$$

Standard error of the difference:

$$SE(DIFF) = \sqrt{[SE(\hat{X}_{Final})]^2 + [SE(\hat{X}_{Initial})]^2}$$

Margin of error of the difference:

$$ME(DIFF) = 1.65 \times SE(DIFF)$$

Calculations

Year 2001

$$SE(\hat{X}) = SE(41.0) = (41.7 - 41.0)/1.65 = 0.4$$

Year 2002

$$SE(\hat{X}) = SE(41.8) = (42.8 - 41.8)/1.65 = 0.6$$

Comparison

$$DIFF = (41.8 - 41.0) = 0.8 \text{ minutes}$$

$$SE(DIFF) = \sqrt{[0.4]^2 + [0.6]^2} = 0.7 \text{ minutes}$$

$$ME(DIFF) = 1.65 \times 0.7 = 1.2 \text{ minutes}$$

Lower bound = 0.8 minutes - 1.2 minutes = -0.4 minutes
Upper bound = 0.8 minutes + 1.2 minutes = 2.0 minutes

Discussion

Since the lower bound and upper bound have different signs, the year-to-year difference is not significant at the 90 percent confidence level. With standard errors of 0.4 minutes for 2001 and 0.6 minutes for 2002, the difference in the mean travel time would have had to be more than 1.2 minutes to be statistically significant.

Alternatively, if the 2002 standard error were 0.27 minutes, the difference of 0.8 minutes would have been statistically significant at the 90 percent confidence level:

$$DIFF = (41.8 - 41.0) = 0.8 \text{ minutes}$$

$$SE(DIFF) = \sqrt{[0.4]^2 + [0.27]^2} = 0.48 \text{ minutes}$$

$$ME(DIFF) = 1.65 \times 0.48 = 0.79 \text{ minutes}$$

Lower bound = 0.8 minutes - 0.79 minutes = 0.01 minutes
Upper bound = 0.8 minutes + 0.79 minutes = 1.59 minutes

An analyst is not restricted to using the 90 percent confidence level even though the Census Bureau reports the data at this level.
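Example Calculations 1 and 2 can be reproduced with a short script. The following is a minimal sketch, not a Census Bureau tool: the function names (standard_error, se_proportion, compare_estimates) are hypothetical, and raising an error when the quantity under the square root is negative is an assumption about how to handle a case the text does not cover. The code carries unrounded intermediate values, so its results differ slightly from the hand calculations above, which round at each step.

```python
import math

Z_90 = 1.65  # critical value behind the Census Bureau's 90 percent bounds


def standard_error(estimate, upper, lower=None, z=Z_90):
    """Standard error from published bounds, using the larger half-interval."""
    margin = upper - estimate
    if lower is not None:
        margin = max(margin, estimate - lower)
    return margin / z


def se_proportion(x, se_x, y, se_y):
    """Standard error of p = x / y, where x is a subset of y."""
    p = x / y
    radicand = se_x ** 2 - (p ** 2) * se_y ** 2
    if radicand < 0:
        # Assumption: treat this case as outside the approximation's scope.
        raise ValueError("negative value under the square root")
    return math.sqrt(radicand) / y


def compare_estimates(initial, se_initial, final, se_final, z=Z_90):
    """Return (diff, margin, lower, upper, significant) for final - initial."""
    diff = final - initial
    margin = z * math.sqrt(se_initial ** 2 + se_final ** 2)
    lower, upper = diff - margin, diff + margin
    significant = lower > 0 or upper < 0  # the interval excludes zero
    return diff, margin, lower, upper, significant


# Example Calculation 1: never-married women, Bronx County, 2001 vs. 2002.
p_2001 = 213_545 / 533_280
se_p_2001 = se_proportion(
    213_545, standard_error(213_545, 218_741, 208_349),
    533_280, standard_error(533_280, 533_498, 533_062))
p_2002 = 220_675 / 538_338
se_p_2002 = se_proportion(
    220_675, standard_error(220_675, 227_204, 214_146),
    538_338, standard_error(538_338, 539_118, 537_558))
percent_result = compare_estimates(100 * p_2001, 100 * se_p_2001,
                                   100 * p_2002, 100 * se_p_2002)

# Example Calculation 2: mean travel time (only upper bounds are given).
time_result = compare_estimates(41.0, standard_error(41.0, 41.7),
                                41.8, standard_error(41.8, 42.8))

for label, result in [("percent never married", percent_result),
                      ("mean travel time", time_result)]:
    diff, margin, lower, upper, significant = result
    print(f"{label}: diff={diff:.2f} +/- {margin:.2f} "
          f"[{lower:.2f}, {upper:.2f}] significant={significant}")
```

Both comparisons come out not significant at the 90 percent level, in agreement with the discussions above. Passing a different critical value through the z argument of compare_estimates evaluates the same difference at another confidence level.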
If one wanted to compare the mean travel times for the different years using a confidence level of 80 percent, the calculations could be accomplished as shown below:

Year 2001

$$SE(\hat{X}) = SE(41.0) = (41.7 - 41.0)/1.65 = 0.4$$

Year 2002

$$SE(\hat{X}) = SE(41.8) = (42.8 - 41.8)/1.65 = 0.6$$

The Census Bureau upper and lower bounds for values use a 90 percent confidence level. Thus, 1.65 is used as the denominator.

$$DIFF = (41.8 - 41.0) = 0.8 \text{ minutes}$$

$$SE(DIFF) = \sqrt{[0.4]^2 + [0.6]^2} = 0.7 \text{ minutes}$$

$$ME(DIFF) = 1.28 \times 0.7 = 0.9 \text{ minutes}$$

For the comparison, the critical value associated with the 80 percent confidence level, 1.28, is used to calculate the margin of error of the difference. Table 4.10 showed factors associated with different confidence levels, and a statistics textbook would include others.

Lower bound = 0.8 minutes - 0.9 minutes = -0.1 minutes
Upper bound = 0.8 minutes + 0.9 minutes = 1.7 minutes

Therefore, even at the 80 percent confidence level, the lower and upper bounds of the difference have opposite signs, indicating that the difference is not statistically significant.

In practice, it is not likely that an analyst would be interested in making comparisons with confidence levels that are lower than the Census Bureau's 90 percent. It is more likely that if one were using a different confidence level, it would be the 95 percent confidence level (for which one would use a critical value factor of 1.96).

4.6.3 Multiyear Averaging/Analysis of Overlapping Averages

The main advantage of moving averages, as compared to annual estimates, is that moving averages smooth the data and are thus more reliable (lower standard errors, less year-to-year variation). Since moving averages smooth out the random fluctuations in the data, they can provide a clearer visual picture of the overall trend in a certain variable of interest. The main disadvantage of moving averages is the lag time associated with them. If conditions are relatively stable across the years over which data are averaged, multiyear average estimates will be close to the annual estimates.
However, if conditions change dramatically in a given year, the annual estimate reflects the change in a more timely manner than does the multiyear average.

There are two issues to consider with regard to the use of ACS multiyear averages. The first issue is related to the comparison of two moving averages that include overlapping years. It is important to note that statistically valid annual estimates of change cannot be computed from the difference of two moving averages if the two moving averages are based on data from overlapping years, such as a moving average of years 1996-1998 and a moving average of years 1997-1999. This is because when standard statistical procedures are used to test for significant differences between estimates over time, it is assumed that the two estimates are drawn from independent samples. This assumption is violated in the case of the overlapping moving averages.

One tempting way to look at the comparison of two consecutive overlapping moving averages, say 2003-2005 and 2004-2006, is that it is in essence a comparison of the difference between 2006 (which is only in the second multiyear period) and 2003 (which is only in the first multiyear period). Unfortunately, the fact that the Census Bureau has released these data as multiyear averages is a recognition that a direct comparison between 2003 by itself and 2006 by itself for this geography is not valid because the individual year sample sizes will not support the comparison. When an analyst uses the multiyear overlapping estimates to make conclusions about single years, he or she is, in effect, cheating by using an artificially high number of data records that include the overlapping years (2004 and 2005, in this case). The analyst is claiming the reduced sampling error that comes with more data records, but in reality, only a portion (a third, in this case) of each sample's records actually contributes to the comparison the analyst is making.

This is not to say that one should not get a qualitative idea of the pattern of change from examining these overlapping moving averages, especially as they accumulate over time. Such time series will be very informative to data users as they try to capture what is happening in a region over time. However, there is a need to be cautious about making definitive conclusions about the differences of the overlapping estimates.

As the combination of multi- and single-year averages accumulates for geographic areas of different sizes, transportation planners and other ACS data users are likely to develop factoring methods and iterative proportional fitting methods that combine multiyear average estimates for smaller geographic areas and single-year estimates for corresponding larger geographic areas to synthesize single-year estimates for geographic areas that do not support this level of ACS reporting.
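The point about overlapping averages made earlier in this section can be illustrated with simple arithmetic. The sketch below uses made-up annual values (hypothetical, not ACS data) and a hypothetical helper, moving_average, to show that the difference between the 2003-2005 and 2004-2006 averages depends only on the two non-shared years: the shared years cancel, leaving one-third of the 2006 minus 2003 difference.

```python
import math

# Hypothetical annual values of some characteristic (not ACS data).
annual = {2003: 30.0, 2004: 31.0, 2005: 29.5, 2006: 33.0}


def moving_average(values_by_year, first_year, window=3):
    """Simple unweighted average over `window` consecutive years."""
    years = range(first_year, first_year + window)
    return sum(values_by_year[year] for year in years) / window


avg_2003_2005 = moving_average(annual, 2003)
avg_2004_2006 = moving_average(annual, 2004)
change = avg_2004_2006 - avg_2003_2005

# The shared years (2004 and 2005) cancel out of the difference.
single_year_change = (annual[2006] - annual[2003]) / 3

print(f"change in moving average: {change:.3f}")
print(f"(2006 - 2003) / 3:        {single_year_change:.3f}")
```

Because two of the three years in each average are the same records, the two averages are not independent samples, which is why the standard significance test for a difference does not apply to them.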
For more homogeneous areas and ACS characteristics, these methods will provide reasonable small area estimates. However, analysts will need to remember that the ACS sample sizes do not really support such analyses, and therefore any conclusions drawn from these synthesized data are speculative.

The second issue related to multiyear estimates is that moving averages also present problems when used as dependent variables in several statistical models (such as time series models) and regression models, since the statistical properties of the data (such as autocorrelations) would be affected by the moving averages. Users should understand the implicit statistical assumptions in their analyses and be sure that the ACS data comply with these assumptions. For instance, if an analyst wanted to test the effect of gasoline prices on commuting modes for a small area (requiring multiyear averaging), he or she would not be able to use monthly, or even annual, gas price data effectively. The analyst would need to develop estimates of the independent variable in the same timeframe during which the ACS data are available.

4.6.4 Seasonality Analysis Using ACS

ACS data are collected throughout the year, as opposed to at a single point in time like the census Long Form data, so it will be important for data users to remember that analyses of other data in conjunction with ACS data will need to reflect the full year.

Because seasonality is very interesting from a transportation planning perspective, as travel patterns can vary significantly throughout the year, U.S. DOT has sponsored an analysis of seasonality using Hampden County, Massachusetts, data. For this guidebook, seasonality in two other ACS test counties, Broward County, Florida, and Pima County, Arizona, was analyzed using the evaluation datasets provided by FHWA and the Census Bureau. These datasets also were used for the comparisons described in Appendix I.
The seasonality analyses that were performed with these data relied on information about the quarter of the year in which the data were collected. Since quarter data generally will not be available to ACS users, the results of these analyses are included in Appendix J. The key lesson from the analysis is that for some locations, seasonality will have an important effect on ACS results but, unfortunately, without information on the time of year that the responses were obtained, we will not have much opportunity to address the issue.

4.6.5 ACS Continuity

An important concern about replacing the census Long Form with the continuous ACS is that by separating the sample data collection from the constitutionally mandated census count, ACS is more likely than the Long Form to be cut back or eliminated during the government's budgeting process.

Effect of Missing Data. Given the large standard errors of the ACS estimates, any further reduction in sample size will adversely impact the quality of its estimates, which will be reflected in larger standard errors. The relationship between the sampling rate and the resulting standard error of the estimates is shown by the following equation:

$$SE(\hat{Y}) = \sqrt{S \hat{Y} \left(1 - \frac{\hat{Y}}{N}\right)}$$

where S is the inverse of the sampling rate minus 1, $\hat{Y}$ is the estimate, N is the total count of people or housing units, and $SE(\hat{Y})$ is the standard error of $\hat{Y}$. For example, if the sampling rate is cut by half, the resulting standard error is equal to $\sqrt{S_2 / S_1} \approx \sqrt{2} = 1.41$ times the original standard error.

Sample size reduction due to potential budget cuts could have an effect on different phases of the ACS data collection program. The effect of eliminating the data collection, even for a single year, will be severe and, because of the multiyear averaging, will be long lasting.

Effect of Making ACS Voluntary. The Census Bureau evaluated the effects of making participation in ACS voluntary, rather than mandatory. In two data quality reports that analyzed this issue, the Census Bureau responded to questionnaire content and increasing public privacy concerns by evaluating the potential effects of having ACS implemented as a voluntary survey. These reports are Report 3: Testing the Use of Voluntary Methods and Report 11: Testing Voluntary Methods--Additional Results.
To analyze the potential effects of this change, the Census Bureau performed a test using the March and April 2003 ACS sample. Four experimental mail treatments were used as follows:

A mandatory treatment identical to the mandatory treatment that had been used previously,
An alternative mandatory treatment that attempted to improve the user-friendliness of the mail survey,
A standard voluntary approach similar to that used for other voluntary Census Bureau surveys, and
A voluntary treatment that explicitly told respondents that the survey was voluntary.

Voluntary methods also were applied to the telephone and in-person surveys. The responses to the different treatments were then compared with each other and with the year 2002 mandatory treatment.

Based on their analyses of the data, the analysts drew the following conclusions:

A dramatic decrease (more than 20 percentage points) occurred in mail response when the standard survey was voluntary.

The reliability of estimates was adversely impacted by the reduction in the total number of completed interviews--producing reliable results with voluntary methods would require an increased initial sample size.

The decrease in cooperation across all three modes of data collection resulted in a noteworthy, but not critical, drop in the weighted survey response rate.

The estimated annual cost of implementing the ACS would increase by at least $59.2 million if the survey were voluntary and reliability were maintained.

Levels of item non-response for the data collected under voluntary and mandatory methods were very similar. Although the differences in item non-response at the topic level were statistically significant, the item non-response rates were very similar.

The use of voluntary methods had a negative impact on traditionally low-response areas that will compromise our ability to produce reliable data for these areas and for small population groups such as blacks, Hispanics, Asians, American Indians, and Alaska Natives.

The change to voluntary methods had the greatest impact on areas that have traditionally high levels of cooperation and on white and non-Hispanic households.

Compared to a standard voluntary survey, the use of a more direct presentation of the voluntary message 1) resulted in an additional decrease of four percentage points in mail response and 2) had only a minor additional impact on data quality, with an additional 1.6 percent decrease in the interview rate and an additional 0.4 percent decrease in the survey response rate.
Compared to the current mandatory treatment, the revised mandatory treatment, which was intended to be more user-friendly, resulted in only a slight increase in mail cooperation (increase of 1.9 percentage points).

Although the mail check-in rates were much higher for the mandatory treatments than for the voluntary treatments, the overall patterns of mail responses over time were remarkably similar across all four treatments.