CHAPTER 4: Using ACS Data

Once a data user has obtained the ACS data needed for a specific analysis, he or she will need to consider the special issues that affect ACS data. The issues described in this section affect how analyses are done and how analysis results are interpreted and reported. As discussed above, the Census Bureau's migration to ACS adds some complexity to common data uses, but it also introduces the ability to perform new and better data analyses.

This section begins with a discussion of ACS data quality, focusing on non-sampling errors, bias, and other issues that could affect how well an ACS estimate reflects the actual population. The issues identified in this section can help data users understand their results better, and can help explain why unexpected results may be occurring. The issues discussed in Section 4.1 relate to ACS data quality and accuracy. Sections 4.2 and 4.3 describe two Census Bureau data processing issues that affect how users will need to work with and interpret ACS data. Section 4.2 considers the effects of the Census Bureau's accumulation of ACS data over time and geography, and the use of one-, three-, and five-year averages. Section 4.3 discusses potential data use and analysis challenges introduced by data disclosure limitations.

These first three sections outline many of the key issues that data users will need to be aware of to design and implement ACS analyses. The following three sections describe issues related to how analysts actually perform their analyses. Section 4.4 describes the need to consider the effects of sampling error on ACS estimates. Section 4.5 describes the issues analysts will need to consider in comparing ACS results with Census 2000 results. Finally, Section 4.6 outlines the implications and opportunities of ACS's frequent data releases.

4.1 Accuracy of ACS Data

A key objective for the Census Bureau in migrating from decennial census Long Form data collection to the continuous data collection approach of ACS was to improve the quality of the data collected by improving the ways the data are collected and processed. To evaluate whether this objective is being achieved, the Census Bureau and other researchers have evaluated quality measures for the initial ACS effort and have compared the early ACS results to the decennial census Long Form.

4.1.1 Census Bureau Evaluation of ACS

The Census Bureau has published 11 reports discussing ACS data quality issues based on the test site data and the C2SS experiment. The 11 Census Bureau reports are published under the title "Meeting 21st Century Demographic Data Needs: Implementing the American Community
Survey," and are made available at www.census.gov/acs/www/AdvMeth/Reports.htm. The individual reports are

• Report 1: Demonstrating Operational Feasibility (issued July 2001);
• Report 2: Demonstrating Survey Quality (issued May 2002);
• Report 3: Testing the Use of Voluntary Methods (issued December 2003);
• Report 4: Comparing General Demographic and Housing Characteristics with Census 2000 (issued May 2004);
• Report 5: Comparing Economic Characteristics with Census 2000 (issued May 2004);
• Report 6: The 2001-2002 Operational Feasibility Report of the American Community Survey (issued May 2004);
• Report 7: Comparing Quality Measures: The American Community Survey's Three-Year Averages and Census 2000's Long Form Sample Estimates (issued June 2004);
• Report 8: Comparison of the American Community Survey Three-Year Averages and the Census Sample for a Sample of Counties and Tracts (issued June 2004);
• Report 9: Comparing Social Characteristics with Census 2000 (issued June 2004);
• Report 10: Comparing Selected Physical and Financial Characteristics of Housing with the Census 2000 (issued July 2004); and
• Report 11: Testing Voluntary Methods: Additional Results (issued December 2004).

These reports are summarized throughout the remainder of Section 4.

Data Quality Measures

Measuring how accurately a survey like ACS captures the attributes of the survey sample and the population from which it is drawn is very difficult, because doing so would require knowing the true characteristics of the population (in which case, the survey would not be a very useful effort). Therefore, survey researchers try to detect clues to potential problems in different survey components. In surveys, non-sampling error can result from a variety of problems, including

• Coverage errors,
• Reporting errors,
• Non-response errors, and
• Processing and coding errors.
As discussed below, some of these errors lend themselves to quantitative analyses, so that indicators can be used to assess the presence and degree of these non-sampling errors.

Coverage Rates

Survey coverage refers to how closely the sampling frame covers the target population. Coverage error occurs

• If housing units that belong to the target population are excluded (called under-coverage),
• If housing units that belong to the target population are counted more than once (over-coverage), or
• If out-of-scope housing units (i.e., those not in the target population) are included in the sampling frame (over-coverage).

The sample completeness rate indicates how well a target population is covered by a survey's sample population. This rate is calculated by dividing the survey's weighted population estimates, without non-response or coverage error adjustments, by independently derived population estimates or counts.

Unit Response Rates

Unit response rates measure the degree of participation of sampled housing units in the survey. Non-response due to the inability or unwillingness of housing units to participate can cause bias if the characteristics of non-respondents differ from those of respondents.

A Guidebook for Using American Community Survey Data for Transportation Planning
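The sample completeness rate described under Coverage Rates is a simple ratio. The sketch below illustrates it; the figures are hypothetical, and the real calculation uses a survey's unadjusted weighted population estimate together with an independently derived population estimate.

```python
# Sample completeness rate: the survey's weighted population estimate,
# before non-response or coverage adjustments, divided by an independently
# derived population estimate. Values under 100 suggest under-coverage;
# values over 100 suggest over-coverage. Figures below are hypothetical.

def sample_completeness_rate(weighted_survey_estimate, independent_estimate):
    return 100.0 * weighted_survey_estimate / independent_estimate

rate = sample_completeness_rate(92_900, 100_000)
print(f"{rate:.1f}%")  # 92.9%
```

In practice, the independent estimate would come from a source such as the Census Bureau's Population Estimates Program rather than the survey itself.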
Item Non-Response

Item non-response occurs when a given respondent does not provide answers for one or more items on the questionnaire. Robust methods for reducing item non-response were employed through different ACS phases. For mail responses, the automated clerical review and the follow-up operations contribute to reducing item non-response. During the CATI and CAPI procedures, the fact that a response must be received for every question by the automated instrument before the next question is asked reduces item non-response significantly, even when "don't know" responses are allowed. After all data collection phases, items that were still missing were obtained by borrowing the data from respondents with similar characteristics, a process known as imputation or item allocation.

ACS Data Quality

The Census Bureau's first assessment of the potential data quality of ACS was the assessment of the accuracy and timeliness of the C2SS data reported in the second report of the Census Bureau evaluation series, available at http://www.census.gov/acs/www/Downloads/Report02.pdf. In this report, Census Bureau experts and managers concluded:

When implemented, the ACS will improve survey quality compared to the decennial census Long Form. That is, some increase in sampling error will occur due to smaller sample sizes in any given year. However, timeliness will greatly improve, and non-sampling error should be reduced by the use of permanent, highly trained field staff. [13]

The report evaluated C2SS on the basis of unit non-response, item non-response, sample completeness, control of processing/measurement errors, and sampling errors. Unit non-response rates for C2SS were found to be quite low (and lower than other Census Bureau surveys), but statistically significant differences in the response rates were found between census tracts with different dominant racial/ethnic groups.
Tracts with 75 percent or more of the population reporting a race or ethnicity of African American/black or American Indian/Alaskan Indian had statistically lower response rates than tracts that were similarly dominated by a population reporting to be white.

In terms of item non-response, the C2SS imputation rates for basic demographic items were significantly lower than for the decennial census. Significant differences in the imputation requirements were found for several key population variables, as shown in Table 4.1.

The C2SS sample completeness was evaluated in relation to Census 2000 and was compared to the sample completeness ratio for the 1990 Census Long Form in relation to 1990 decennial counts (sample completeness measures for the year 2000 Long Form were not yet available at the writing of the report). The percent of the population represented in the C2SS sample was slightly higher than for the 1990 Long Form sample.

Table 4.1. Comparison of C2SS and Census 2000 population item imputation rates.

                     Percent of Eligible Items
Variable             Census 2000 Imputation    C2SS Imputation
Relationship         2.2%                      1.5%
Gender               1.0%                      0.5%
Age                  3.6%                      2.4%
Hispanic Origin      4.2%                      3.6%
Race                 3.9%                      2.4%

Source: United States Census Bureau, 2002.

13. U.S. Census Bureau, Meeting 21st Century Demographic Data Needs: Implementing the American Community Survey, Report 2: Demonstrating Survey Quality (May 2002), p. 7.
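The imputation (item allocation) process described above borrows a missing value from a respondent with similar characteristics. A minimal hot-deck sketch of the idea follows; the records, field names, and matching keys are hypothetical, and the Census Bureau's production allocation rules are far more elaborate.

```python
# Minimal hot-deck imputation sketch: a missing item is filled by copying
# the value from the most recently seen complete record ("donor") that
# matches on a set of auxiliary characteristics. All data are hypothetical.

def hot_deck_impute(records, item, match_keys):
    """Fill in records where records[i][item] is None, in place."""
    donors = {}  # match-key tuple -> most recently observed value
    for rec in records:
        key = tuple(rec[k] for k in match_keys)
        if rec[item] is None:
            if key in donors:
                rec[item] = donors[key]  # borrow from a similar respondent
        else:
            donors[key] = rec[item]  # this record can serve as a donor
    return records

people = [
    {"age_group": "25-34", "sex": "F", "means_of_transport": "drove alone"},
    {"age_group": "25-34", "sex": "F", "means_of_transport": None},
    {"age_group": "45-54", "sex": "M", "means_of_transport": "carpool"},
]
hot_deck_impute(people, "means_of_transport", ["age_group", "sex"])
print(people[1]["means_of_transport"])  # drove alone
```

A single sequential pass like this only uses donors seen earlier in the file; production systems handle ordering, donor reuse limits, and fallback matching rules.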
The researchers believe that, through the ongoing monitoring of ACS quality measures, improvements in unit and item response rates can be realized in the future. In addition, improvements in the MAF that are underway and will continue as part of the ACS program will lead to improvements in sample completeness.

It was not possible for the researchers to fully evaluate the potential processing and measurement errors. However, they do note several procedures that help to reduce these errors and have been implemented under the ACS quality assurance program.

Sampling error was the one quality measure that the analysts determined would be adversely affected by ACS. Data users have concluded that the higher sampling error of ACS will have a significant impact on the usefulness of the data. With a sample of 3 million housing units per year, data accumulated over five years will correspond to a sample size of less than three-fourths of that achieved with the roughly 16.7 percent sampling rate of the Long Form survey.
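The sample-size tradeoff just described can be roughed out with the usual 1/sqrt(n) scaling of standard errors. This is only a back-of-envelope sketch: the 2.5 percent annual ACS rate is illustrative, and real comparisons also involve design effects, differential nonresponse, and weighting, so published factors differ somewhat from these simple ratios.

```python
from math import sqrt

# Back-of-envelope comparison of sampling error, assuming standard error
# scales roughly with 1/sqrt(sample size). The 16.7 percent Long Form rate
# is from the text; the 2.5 percent annual ACS rate is illustrative only.

LONG_FORM_RATE = 0.167   # roughly 1-in-6 Long Form sampling rate
ACS_ANNUAL_RATE = 0.025  # illustrative annual ACS sampling rate

def se_ratio_vs_long_form(years_pooled):
    """Approximate ratio of ACS standard error to Long Form standard error."""
    return sqrt(LONG_FORM_RATE / (ACS_ANNUAL_RATE * years_pooled))

for years in (1, 3, 5):
    print(f"{years}-year ACS estimate: roughly "
          f"{se_ratio_vs_long_form(years):.2f}x the Long Form SE")
```

The key behavior the sketch shows is that pooling k years shrinks the standard-error penalty by a factor of sqrt(k), which is why the multiyear averages compare much more favorably with the Long Form than the annual estimates do.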
Considering the effect of sample size alone on the standard error of the estimates, and assuming a constant sampling rate of 2.5 percent, the ACS estimates will have a standard error equal to 2.8 times, 1.6 times, and 1.25 times that of the Long Form for annual estimates, three-year moving averages, and five-year moving averages, respectively. [14]

14. Ronald Eash, Impacts of Sample Sizes in the ACS, presented at TRB Census Data for Transportation Planning: Planning for the Future Conference, May 12, 2005.

By examining the ACS test site data, the Census Bureau researchers drew the following conclusions:

While the targeted levels of sampling error for single year estimates were met overall, differentials in levels of mail response for some population groups indicate that sampling error is disproportionately higher, suggesting the need for design changes. [15]

Even with improved survey follow-up procedures to address the problem of differential response to the initial mail surveys, the authors concluded that

The ACS five-year averages are expected to have somewhat higher [relative standard error levels] than corresponding Census 2000 Long Form estimates... The premise of the ACS design is that this moderate increase in [standard errors] for a five-year average is worthwhile in order to obtain regular updates of the estimates throughout the decade, and to obtain what is expected to be a generally lower level of non-sampling error. [16]

15. U.S. Census Bureau, Meeting 21st Century Demographic Data Needs: Implementing the American Community Survey, Report 2: Demonstrating Survey Quality (May 2002), p. 27.
16. U.S. Census Bureau, Meeting 21st Century Demographic Data Needs: Implementing the American Community Survey, Report 2: Demonstrating Survey Quality (May 2002), p. 29.

The best assessments of actual ACS (as opposed to C2SS) non-sampling error are the Census Bureau's Accuracy of the Data reports, which are updated annually and available at www.census.gov/acs/www/UseData/Accuracy/Accuracy1.htm, and Report 7 of the Census Bureau's ACS evaluation series, which is available at www.census.gov/acs/www/AdvMeth/acs_census/creports/Report07.pdf.

The evaluation report compares Census 2000 data quality measures to the same ACS (1999-2001) data quality measures at the county and census tract level for the ACS test sites. To analyze the differences at smaller geographic breakdowns, census tracts within the ACS test sites were divided into five groups:

1. County population less than 100,000;
2. County population between 100,000 and 1 million, with tract population less than 4,000;
3. County population between 100,000 and 1 million, with tract population greater than 4,000;
4. County population greater than 1 million and tract population less than 4,000; and
5. County population greater than 1 million and tract population greater than 4,000. [17]

17. Tracts with population less than 500 were discarded for this study; there are about 590 such tracts in the country. The average tract population in the United States (65,000 tracts) is about 4,300.

Table 4.2 summarizes some of the key quality measures compared in the ACS and Census 2000 at the county level. The figures shown reflect the Census Bureau's weighted definitions of response and completion rates. Based on their evaluation of all of these items, the authors concluded

The quality measures suggest that the ACS multiyear averages are at least as good as the estimates from the Long Form. When we also consider the enhanced timeliness of information from the ACS, the superiority of reengineering the 2010 Census over retaining traditional methods is clear. [18]

Table 4.2. Comparison of quality measures at the county level.

Characteristic                                              ACS      Census 2000
Self-Response Rate                                          55.3%    68.1%
Total Housing Unit Non-Response                             4.4%     9.7%
Occupied Housing Unit Non-Response Rate                     5.2%     8.7%
Allocation Rates
  Population Item Total Allocation Rate                     6.5%     11.2%
  Occupied Housing Unit Total Allocation Rate               7.7%     15.8%
  Vacant Housing Unit Total Allocation Rate                 23.2%    19.8%
  Population and Occupied Housing Unit Total Allocation
    Rate                                                    6.9%     12.8%
Sample Completeness Rates
  Housing Sample Completeness                               92.9%    90.3%
  Household Population Sample Completeness                  90.4%    91.1%

Source: United States Census Bureau, 2004.

ACS Unit Response

The self-response rate for Census 2000 was 68.1 percent, while the ACS rate was lower at 55.3 percent. This means that Census 2000 respondents were more likely to mail back their questionnaires than were ACS respondents. The authors note

. . . the higher census Long Form self-response rates mean that the success of the census depended less on follow-up operations than did the success of the ACS. This was an expected result: past experience has consistently indicated that the census will produce mail return rates of between 10 to 20 percentage points higher than other similar operations, even decennial tests. [19]

The decennial census benefits from a large advertising and public relations campaign and, therefore, has much higher visibility. The authors also point out, "Census 2000 used questionnaires in languages other than English, especially in Spanish, which would have increased self-response rates in linguistically isolated areas; the ACS used English questionnaires only."

18. U.S. Census Bureau, Meeting 21st Century Demographic Data Needs: Implementing the American Community Survey, Report 7: Comparing Quality Measures: The American Community Survey's Three-Year Averages and Census 2000's Long Form Sample Estimates (June 2004), p. vii.
19. U.S. Census Bureau, Meeting 21st Century Demographic Data Needs: Implementing the American Community Survey, Report 7: Comparing Quality Measures: The American Community Survey's Three-Year Averages and Census 2000's Long Form Sample Estimates (June 2004), p. 15.
Similar statistically significant differences in self-response rates were found for each tract group that was analyzed. Despite the lower initial return rate, the non-response rates for total housing units and occupied housing units were lower in the ACS than in Census 2000. At the tract level, ACS also consistently showed lower rates.

ACS Sample Completeness Rates

The sample completeness rate indicates how well a target population is covered by a survey's sample population. Rates greater than 100 percent would indicate over-coverage of the population, and rates less than 100 percent would indicate under-coverage. Both efforts failed to include the whole universe in their samples. The housing unit sample completeness rate for ACS was reported to be 92.9 percent compared to 90.3 percent for Census 2000, while the household population sample completeness rates were 90.4 percent and 91.1 percent, respectively.

ACS Item Response Rates and Item Allocation (Imputation)

The reported total allocation rates in Table 4.2 are the weighted averages of the item allocation rates for the individual corresponding variables. Both Census 2000 and ACS allocate (impute) responses when items are left blank or responses are out of range. For all population items (54 responses) and occupied housing unit items (29 responses), Census 2000 had higher allocation rates than ACS. For both population and occupied housing unit responses, the ACS allocation/imputation rate was about five percentage points lower than the Census 2000 rate, with a similar trend across the five tract groups. The differences in the vacant housing unit items (12 responses) are most likely the result of issues related to the comparability of the two estimates. The lower ACS imputation rates are a strong indication that the quality of ACS data compares favorably with census Long Form data. The lower (improved) levels of item non-response for ACS can be seen in Table 4.2 in the previous section.
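The total allocation rates just described are weighted averages of per-item allocation rates, weighted by each item's number of eligible responses. A small sketch follows; the item names and counts are hypothetical.

```python
# Total allocation rate as the eligible-response-weighted average of
# per-item allocation rates, i.e., total allocated items divided by
# total eligible items. Item names and counts are hypothetical.

def total_allocation_rate(items):
    """items: iterable of (eligible_responses, allocated_responses) pairs."""
    eligible = sum(e for e, _ in items)
    allocated = sum(a for _, a in items)
    return 100.0 * allocated / eligible

population_items = [
    (1000, 15),  # e.g., relationship: 1.5% allocated
    (1000, 5),   # e.g., sex: 0.5% allocated
    (1000, 24),  # e.g., age: 2.4% allocated
]
print(f"total allocation rate: {total_allocation_rate(population_items):.2f}%")
```

Weighting by eligible responses means items asked of more people (or with more eligible answers) pull the total rate toward their own allocation rate.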
While the reduced need for item allocation is very good news for data users, as noted in the previous section, the item allocation procedures used by the Census Bureau are still limited by the individual sequencing of these allocations. Although individual transportation-related items show reasonable allocation rates, many of the household items, when combined with person items, show unusual results. This is likely to result from the Census Bureau's practice of processing the allocations of household items and person items separately, without any cross-referencing.

ACS Operational Quality Measures

Reports 1 and 6 of the Census Bureau's evaluation series (available at www.census.gov/acs/www/Downloads/Report01.pdf and www.census.gov/acs/www/Downloads/Report06.pdf) reviewed the operational feasibility of ACS. In Report 1, Census Bureau staff reviewed the outcome of the C2SS and the 1999 and 2000 ACS test site deployment to evaluate ACS from an operational standpoint. The key findings of this effort were

1. Implementing the ACS should improve the year 2010 decennial census; and
2. The successful implementation of the C2SS during 2000 demonstrated that full implementation of the ACS is operationally feasible.

According to the report:

By having only a Short Form in 2010, the Census Bureau can more sharply focus on its constitutional mandate: to fully enumerate the population to apportion the House of Representatives. The ACS development program, supported by a complete and accurate address system, will simplify the decennial design, resulting in improved coverage in 2010. [20]

20. U.S. Census Bureau, Meeting 21st Century Demographic Data Needs: Implementing the American Community Survey, Report 1: Demonstrating Operational Feasibility (July 2001), p. 7.

The researchers also report that C2SS achieved the quality standards, budgets, and schedules that the Census Bureau had established. The C2SS effort came in slightly under budget, and most of the workload issues identified with the effort were attributed to the fact that the C2SS was
competing with the decennial census for Census Bureau resources. The weighted survey response rates for C2SS and the test counties (1999 and 2000 data) were quite high: above 95 percent.

Report 6 updates Report 1 by examining the ACS operations for 2001 and 2002. The authors state that their analyses "provide evidence of improved operational quality from the more than adequate levels achieved during the year 2000." [21] During the 2001-2002 period, schedules were maintained and workload levels were close to predicted. The workload issues noted in the first report were resolved because there was no conflict with decennial census activities. Response rates were maintained or improved, and the quality control measures implemented by ACS managers appeared to be effective.

21. U.S. Census Bureau, Meeting 21st Century Demographic Data Needs: Implementing the American Community Survey, Report 6: The 2001-2002 Operational Feasibility Report of the American Community Survey (May 2004), p. iv.

4.2 Data Accumulation over Time and Geography

The Census Bureau aggregates ACS data for small geographic units over multiple years before releasing the data to the public. This is done to improve the reliability of the data reported for small geographic levels, where the smaller annual sample sizes are associated with large standard errors.

4.2.1 Census Bureau Multiple-Year Estimation

Once the ACS program is fully implemented, five-year moving averages will be released for Census-defined areas with population under 20,000. For census areas with population between 20,000 and 65,000, both three- and five-year moving averages will be released. For areas with population greater than 65,000, annual estimates, three-year moving averages, and five-year moving averages will be released.

Table 4.3 shows the percentage of counties and census places in each of the population categories, based on Year 2004 Census Bureau population estimates.

Table 4.3. Percentage of counties and census places in ACS population ranges, 2004.

(Total U.S. population, 2004: 297,550,029)

Measure                                   Counties               Census Places
All areas of this type
  Number of areas                         3,219                  19,465
  Population                              297,550,029 (100%)     182,048,887 (61%)
Population more than 65,000
  Number of areas                         780 (24%)              457 (2%)
  Population                              244,171,662 (82%)      95,491,838 (32%)
Population 20,000 to 65,000
  Number of areas                         1,096 (34%)            1,168 (6%)
  Population                              40,066,827 (13%)       41,336,894 (14%)
Population less than 20,000
  Number of areas                         1,343 (42%)            17,840 (92%)
  Population                              13,311,770 (4%)        45,220,155 (15%)
Population outside areas of this type     0                      115,501,372 (39%)

Source: U.S. Census Bureau Population Estimates Program, 2004.

If five years of fully implemented ACS data were available for 2004 (2000-2004), ACS annual data would be provided for 24 percent of the counties (with those counties comprising 82 percent of the U.S. population). Three-year average data (2002-2004) would be available for 34 percent of the counties (comprising 13 percent of the U.S. population). The remaining 42 percent of counties (4 percent of the population) would have five-year average data reported. For the smaller census geographic areas shown in the table, census places, a much larger percentage (92 percent of census places, with 15 percent of the U.S. population) will have only five-year average data available. An additional 39 percent of the U.S. population does not live in a census-defined place.

Figures 4.1 and 4.2 illustrate the availability of single-year ACS estimates for 2004. They show the Minnesota counties and census places for which 2004 ACS data are available. Over time, ACS coverage will improve, but these figures demonstrate that the initial ACS data will have limited scope.

Table 4.4 shows the Census Bureau's current estimates of the number of areas for which single-year and multiyear ACS data will be available for the Census Bureau's main geographic summary levels. The Census Bureau estimates that it will provide single-year ACS estimates for 761 counties. It will produce three-year estimates for those 761 counties, plus another 1,050 counties with populations between 20,000 and 65,000. The remaining 1,330 counties with populations less than 20,000 will have five-year estimates only. When the geographic areas of interest are census tracts, census block groups, or census TAZs, all ACS data will be reported as five-year averages.

The combination of data over successive years represents a tradeoff by the Census Bureau, in which the sampling error of the estimates is reduced through the inclusion of greater amounts of data (for multiple years), and data for more current years and with more frequency are made available.
This, however, comes at the expense of increasing the potential for problems with the interpretation of estimates that span across the years. For stable, slowly changing small geographic areas and variables that do not vary significantly from year to year, combining multiple successive years of data is not likely to be much of a problem for most analyses. However, for variables that do change significantly and for areas that experience large changes over the years, the interpretation of average results will be very difficult.

Figure 4.1. Minnesota counties with published 2004 ACS data.

Figure 4.2. Minnesota census places with published 2004 ACS data.

Table 4.4. ACS reporting for census geographic areas, 2005.

                                          Number of Geographic Areas
                                   Single-Year          Three-Year          Five-Year
Geography                          (Pop. 65,000+)       (Pop. 20,000+)      Estimates
Nation                             1                    1                   1
Census Regions                     4                    4                   4
Census Divisions                   9                    9                   9
States                             51                   51                  51
Counties                           761                  1,811               3,141
Minor Civil Divisions              97                   592                 16,536
Places                             476                  1,983               25,161
American Indian and Alaska
  Native Areas                     15                   41                  768
Metropolitan, Micropolitan, and
  Consolidated Statistical Areas   561                  905                 923
Congressional Districts            436                  436                 436
School Districts                   879                  3,290               14,505
Census Tracts                      -                    -                   65,443
Block Groups                       -                    -                   208,790

Source: United States Census Bureau, Design and Methodology: American Community Survey, Technical Paper 67 (May 2006), U.S. Government Printing Office, Washington, D.C.

4.2.2 Multiyear Estimation Procedures [22]

When multiyear estimates are developed, the most recent year's geography is used. From time to time, census place and county subdivision definitions change to reflect political boundaries and new development. The multiyear estimates treat all records as though they were in the most recent year's geography, whether or not they actually were in previous years. This means that where census geographic changes occur, inconsistencies within ACS estimates from year to year and across adjacent geographic areas will be present.

All ACS dollar value estimates are inflation-adjusted to the most recent year of the three- or five-year period (using yearly midpoint CPI estimates). Similarly, if census variable categories change, the multiyear data will be presented only for the definitions being used in the most current year.

The Census Bureau develops single-year estimates based on the combination of all 12 months of data collected for that year, without regard to the specific month in which the data are collected. Each year's estimates are controlled to that year's county-level annual population estimates (reflecting population as of July 1 of the year). The one-year ACS estimates and percentages are developed by summing the weighted responses and dividing that sum by the weighted sum of the relevant population.
For example, a single-year estimate for the percent of workers who carpool to work would be calculated as follows:

    Percent Who Carpool in Year 1 = p1
      = (Number Who Carpool in Year 1) / (Total Number of Workers in Year 1)
      = N1 / T1

Census Bureau estimates of medians for a single year are developed by analyzing the weighted data for the full year and identifying the median point of the estimate.

Initially, the Census Bureau generated multiyear estimates by computing an average based on each year's individual estimates, so a three-year average estimate for the percent of workers who carpool to work would be computed as the sum of the individual yearly estimates divided by the sum of the individual year totals. However, for full implementation of ACS, the annual samples corresponding to the estimation period will be combined, and the estimates will be developed as they are for the single-year estimates, with the control totals being equal to the average of the component year controls. Multiyear median estimates are produced by combining data records from all years, rather than by simply averaging each year's median.

An implication of the multiyear calculations is that three- and five-year estimates may not appear completely consistent at first glance with the single-year estimates for the same geography over those three- or five-year periods. Analysts will need to be careful in comparing estimates for areas of different sizes and should carefully consider their analytical needs when deciding which available estimates to use.

22. U.S. Census Bureau, Design and Methodology: American Community Survey, Technical Paper 67 (May 2006), U.S. Government Printing Office, Washington, D.C.
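The single-year calculation above, and the early multiyear approach of pooling yearly numerators and denominators, can be sketched as follows. The weighted counts are hypothetical, and the controlling of totals to population estimates used in production is omitted here.

```python
# p_i = N_i / T_i: weighted carpoolers divided by weighted workers.
def single_year_percent(carpoolers_weighted, workers_weighted):
    return 100.0 * carpoolers_weighted / workers_weighted

# Early ACS multiyear averaging: sum the yearly weighted numerators and
# divide by the sum of the yearly weighted totals. Note this is not the
# same as averaging the yearly percentages, because years with more
# workers carry more weight in the pooled estimate.
def multiyear_percent(yearly):
    """yearly: list of (N_i, T_i) weighted totals, one pair per year."""
    n = sum(N for N, _ in yearly)
    t = sum(T for _, T in yearly)
    return 100.0 * n / t

years = [(4_200, 40_000), (4_600, 41_000), (4_300, 42_000)]  # hypothetical
print(f"year 1: {single_year_percent(*years[0]):.2f}%")   # 10.50%
print(f"3-year: {multiyear_percent(years):.2f}%")         # 10.65%
```

The pooled result differing slightly from a simple mean of the yearly percentages is one reason, as the text notes, that multiyear estimates may not look perfectly consistent with the published single-year estimates.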
Suppose, for example, an analyst was interested in understanding and reporting on a particular variable in ACS, such as the percentage of workers reporting that they travel more than one hour to their workplaces, for a hypothetical geographic area: a county consisting of a moderate-sized city and two small towns. Collectively, the geography has a population of more than 65,000, so annual ACS estimates and three- and five-year estimates will be available for the full geographic area. For the first year of the analysis, 2010, the city population is about 60,000, so three- and five-year ACS estimates will be available, and the two towns both have populations below 20,000, so only five-year estimates are available.

Table 4.5 shows hypothetical ACS data and reported estimates for the county and its three county subdivision components for several years. The top portion of the table shows the full set of estimates from a hypothetical ACS. However, not all of these results are made available to data users. The second portion of the table shows the annual ACS estimates that would be made available for the county. County-level population is available annually for all counties from the Census Bureau Population Estimates Program. In addition, because the county population is more than 65,000, the annual ACS estimates, including those for workers and for workers commuting more than 60 minutes, are reported for the county.

Over time, the population of the city grows to more than 65,000 as well, so for the last few years shown in the table, annual estimates become available for the city. Unlike the county population, the city population is derived from the ACS data collection: the county population is used as a control total, and the ACS data provide an estimate of the county population living in the city. The annual estimates for the city's workers and workers commuting more than 60 minutes are determined in the same way as the county annual estimates.
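The availability rules this example relies on can be summarized as a small helper mapping an area's population to the ACS products released for it under full implementation. The thresholds follow Section 4.2.1; the function name and the treatment of exact boundary values are our own.

```python
# Map an area's population to the ACS data products released for it,
# per the full-implementation rules described in Section 4.2.1.
# Boundary handling (>= vs >) is an assumption of this sketch.
def available_acs_estimates(population):
    if population >= 65_000:
        return ["1-year", "3-year", "5-year"]
    if population >= 20_000:
        return ["3-year", "5-year"]
    return ["5-year"]

print(available_acs_estimates(60_000))  # city:  ['3-year', '5-year']
print(available_acs_estimates(12_000))  # towns: ['5-year']
```

Applied to the example, the county (population above 65,000) gets all three products, the city initially gets only the three- and five-year averages, and the two towns get only the five-year averages.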
The third part of the table shows the hypothetical three-year average data release. Three-year average estimates are available for the county and the city beginning in 2007, and for one of the towns beginning in 2014. The county population estimates are the same as the annual estimates because they are not derived from the ACS and are used as controls, but all of the other three-year average estimates are calculated as described above. Because these estimates are three-year averages, they vary both from the published annual estimates and from the unpublished actual data.

The final part of the table shows the five-year average estimates. Beginning in 2010, the five-year average estimates would be available for the county, the city, and both towns. Like the three-year averages, these estimates are derived by combining data from the five previous years (three previous years for the three-year averages) and controlling the totals to the average of the county's population estimates for the five years.

In 2010, the analyst has estimates of workers commuting more than 60 minutes of:

• 8,789 for the county, based on the countywide annual estimate for 2009;
• 8,826 for the county, based on the countywide three-year average estimate ending in 2009;
• 8,749 for the county, based on the countywide five-year average estimate ending in 2009;
• 5,677 for the city, based on the city three-year average estimate ending in 2009;
• 5,759 for the city, based on the city five-year average estimate ending in 2009;
• 1,657 for one of the towns, based on the five-year average estimate ending in 2009; and
• 1,333 for the other town, based on the five-year average estimate ending in 2009.

The choice of how to proceed with these various estimates is the analyst's. If the analyst only needs to look at one geography (say, he or she would like to know the number of long-distance commuters at the county level), then using the annual estimate would seem an attractive choice.
At the county level, the annual estimate provides the most timely estimate and relies the least on averaging.

Using ACS Data 41

Similarly, at the city level, the three-year average would likely be more attractive for
Table 4.5. Hypothetical data releases for a county and its city and towns.

Within each column group, the four values are for Acity / Bee Town / Cee Ville / Alpha County, in that order. "n" marks an estimate that is not published for that geography; "--" marks an estimate that is not yet available.

Estimates from ACS collected data:

Year | Population age 16+ | Workers | Pct. of workers commuting >60 min. | Workers commuting >60 min.
2005 | 54,104 / 12,717 / 13,025 / 79,846 | 22,183 / 5,214 / 5,340 / 32,737 | 0.24 / 0.23 / 0.22 / 0.24 | 5,324 / 1,199 / 1,175 / 7,698
2006 | 55,186 / 13,607 / 13,416 / 82,209 | 22,074 / 5,715 / 5,366 / 33,155 | 0.24 / 0.24 / 0.22 / 0.24 | 5,298 / 1,372 / 1,181 / 7,851
2007 | 56,290 / 14,560 / 13,818 / 84,668 | 23,079 / 6,261 / 5,665 / 35,005 | 0.24 / 0.25 / 0.22 / 0.24 | 5,539 / 1,565 / 1,246 / 8,350
2008 | 57,416 / 15,579 / 14,233 / 87,227 | 22,966 / 6,699 / 5,693 / 35,358 | 0.24 / 0.26 / 0.23 / 0.24 | 5,512 / 1,742 / 1,309 / 8,563
2009 | 58,564 / 16,669 / 14,660 / 89,893 | 22,840 / 7,168 / 5,717 / 35,725 | 0.24 / 0.27 / 0.24 / 0.25 | 5,482 / 1,935 / 1,372 / 8,789
2010 | 59,735 / 17,836 / 15,100 / 92,671 | 23,894 / 7,848 / 6,040 / 37,782 | 0.24 / 0.28 / 0.25 / 0.25 | 5,735 / 2,197 / 1,510 / 9,442
2011 | 62,722 / 19,085 / 15,553 / 97,359 | 25,089 / 8,206 / 6,221 / 39,516 | 0.24 / 0.29 / 0.26 / 0.25 | 6,021 / 2,380 / 1,617 / 10,018
2012 | 65,858 / 19,466 / 16,019 / 101,344 | 27,002 / 8,565 / 6,568 / 42,135 | 0.25 / 0.30 / 0.27 / 0.26 | 6,751 / 2,570 / 1,773 / 11,094
2013 | 69,151 / 19,856 / 16,500 / 105,506 | 27,660 / 8,538 / 6,600 / 42,798 | 0.26 / 0.31 / 0.28 / 0.27 | 7,192 / 2,647 / 1,848 / 11,687
2014 | 72,608 / 20,253 / 16,995 / 109,856 | 29,769 / 8,709 / 6,968 / 45,446 | 0.27 / 0.32 / 0.29 / 0.28 | 8,038 / 2,787 / 2,021 / 12,846

ACS annual data release:

Year | Population age 16+ | Workers | Pct. of workers commuting >60 min. | Workers commuting >60 min.
2005 | n / n / n / 79,846 | n / n / n / 32,737 | n / n / n / 0.24 | n / n / n / 7,698
2006 | n / n / n / 82,209 | n / n / n / 33,155 | n / n / n / 0.24 | n / n / n / 7,851
2007 | n / n / n / 84,668 | n / n / n / 35,005 | n / n / n / 0.24 | n / n / n / 8,350
2008 | n / n / n / 87,227 | n / n / n / 35,358 | n / n / n / 0.24 | n / n / n / 8,563
2009 | n / n / n / 89,893 | n / n / n / 35,725 | n / n / n / 0.25 | n / n / n / 8,789
2010 | n / n / n / 92,671 | n / n / n / 37,782 | n / n / n / 0.25 | n / n / n / 9,442
2011 | n / n / n / 97,359 | n / n / n / 39,516 | n / n / n / 0.25 | n / n / n / 10,018
2012 | 65,858 / n / n / 101,344 | 27,002 / n / n / 42,135 | 0.25 / n / n / 0.26 | 6,751 / n / n / 11,094
2013 | 69,151 / n / n / 105,506 | 27,660 / n / n / 42,798 | 0.26 / n / n / 0.27 | 7,192 / n / n / 11,687
2014 | 72,608 / n / n / 109,856 | 29,769 / n / n / 45,446 | 0.27 / n / n / 0.28 | 8,038 / n / n / 12,846

ACS three-year average data release:

Year | Population age 16+ | Workers | Pct. of workers commuting >60 min. | Workers commuting >60 min.
2005 | -- / -- / -- / 79,846 | -- / -- / -- / -- | -- / -- / -- / -- | -- / -- / -- / --
2006 | -- / -- / -- / 82,209 | -- / -- / -- / -- | -- / -- / -- / -- | -- / -- / -- / --
2007 | 56,822 / n / n / 84,668 | 23,108 / n / n / 34,625 | 0.24 / n / n / 0.24 | 5,546 / n / n / 8,201
2008 | 57,976 / n / n / 87,227 | 23,383 / n / n / 35,535 | 0.24 / n / n / 0.24 | 5,612 / n / n / 8,501
2009 | 59,154 / n / n / 89,893 | 23,654 / n / n / 36,429 | 0.24 / n / n / 0.24 | 5,677 / n / n / 8,826
2010 | 60,356 / n / n / 92,671 | 23,941 / n / n / 37,394 | 0.24 / n / n / 0.25 | 5,746 / n / n / 9,203
2011 | 62,960 / n / n / 97,359 | 24,980 / n / n / 39,310 | 0.24 / n / n / 0.25 | 5,995 / n / n / 9,825
2012 | 65,498 / n / n / 101,344 | 26,428 / n / n / 41,540 | 0.24 / n / n / 0.26 | 6,437 / n / n / 10,627
2013 | 68,577 / n / n / 105,506 | 27,659 / n / n / 43,162 | 0.25 / n / n / 0.26 | 6,924 / n / n / 11,376
2014 | 72,016 / 20,665 / n / 109,856 | 29,286 / 8,953 / n / 45,225 | 0.26 / 0.31 / n / 0.27 | 7,624 / 2,776 / n / 12,358

ACS five-year average data release:

Year | Population age 16+ | Workers | Pct. of workers commuting >60 min. | Workers commuting >60 min.
2005 | -- / -- / -- / 79,846 | -- / -- / -- / -- | -- / -- / -- / -- | -- / -- / -- / --
2006 | -- / -- / -- / 82,209 | -- / -- / -- / -- | -- / -- / -- / -- | -- / -- / -- / --
2007 | -- / -- / -- / 84,668 | -- / -- / -- / -- | -- / -- / -- / -- | -- / -- / -- / --
2008 | -- / -- / -- / 87,227 | -- / -- / -- / -- | -- / -- / -- / -- | -- / -- / -- / --
2009 | 59,716 / 15,511 / 14,666 / 89,893 | 23,996 / 6,587 / 5,892 / 36,475 | 0.24 / 0.25 / 0.23 / 0.24 | 5,759 / 1,657 / 1,333 / 8,749
2010 | 60,948 / 16,607 / 15,116 / 92,671 | 24,374 / 7,150 / 6,044 / 37,569 | 0.24 / 0.26 / 0.23 / 0.24 | 5,850 / 1,870 / 1,404 / 9,125
2011 | 63,509 / 18,042 / 15,808 / 97,359 | 25,399 / 7,797 / 6,321 / 39,517 | 0.24 / 0.27 / 0.24 / 0.25 | 6,096 / 2,116 / 1,520 / 9,732
2012 | 65,824 / 19,173 / 16,346 / 101,344 | 26,345 / 8,325 / 6,541 / 41,212 | 0.24 / 0.28 / 0.25 / 0.25 | 6,381 / 2,341 / 1,640 / 10,363
2013 | 68,498 / 20,139 / 16,870 / 105,506 | 27,415 / 8,741 / 6,751 / 42,906 | 0.25 / 0.29 / 0.26 / 0.26 | 6,758 / 2,542 / 1,760 / 11,061
2014 | 71,557 / 20,920 / 17,379 / 109,856 | 28,923 / 9,076 / 7,023 / 45,023 | 0.25 / 0.30 / 0.27 / 0.27 | 7,314 / 2,727 / 1,901 / 11,942
many analyses, because the data are more relevant to the current period than the five-year estimate. For the towns, the five-year average estimates are the only available choice.

There may be some instances where analysts would be willing to sacrifice currency of the estimates for the greater precision offered by the multiyear average's larger sample sizes and for the lesser volatility in the averaged estimates. The three- and five-year estimates tend to dampen the effect of year-to-year changes, so using the averages can help analysts avoid worrying about what may simply amount to random year-to-year noise. On the other hand, the averaged estimates do not pick up real trends as strongly or as quickly as do single-year estimates. As can be seen by comparing the hypothetical data releases for the commuting variable at the town level to the top part of the table, the multiyear averages lag behind in identifying the increasing trend.

Decisions about which estimates to use become more complicated when the analyst needs to examine the variable across different geographic levels. Making comparisons between one-year estimates and multiyear estimates will be problematic if the variable of interest is trending one way or another during the multiyear period, or if the variable in the most current year differs from the previous years.

Suppose the analyst wanted to know the percentage of the county's long-distance commuters who lived in one of the smaller towns. Dividing the five-year average estimates for the towns by the single-year estimate for the county is not really a valid approach, since the two measure the variable over different periods. The more appealing approach would be to use the five-year averages for both the towns and the county for this analysis. This "least common denominator" approach ensures that any regional changes during the averaging period are captured in all of the estimates.
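The "least common denominator" point can be made concrete with the hypothetical 2009-vintage estimates from Table 4.5 (these are invented numbers, used here only to show the mechanics):

```python
# Hypothetical estimates from Table 4.5 (releases ending in 2009):
town_5yr = 1_657      # Bee Town workers commuting >60 min, five-year average
county_1yr = 8_789    # county workers commuting >60 min, single-year estimate
county_5yr = 8_749    # county workers commuting >60 min, five-year average

# Mixing periods: the numerator and denominator measure different time spans
mixed_share = town_5yr / county_1yr        # about 0.1885

# Consistent periods: both estimates come from the same five-year window
consistent_share = town_5yr / county_5yr   # about 0.1894
```

The difference is small in this stable example, but in a rapidly changing area the mismatch between a five-year numerator and a single-year denominator could be large enough to matter.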
Typically, as a practical point and as in this example, for most variables and geographic areas, the differences between the estimates from the different averages will not be so large that they would materially affect policy decisions. One could easily think of examples in rapidly changing areas, however, where differences among the single-year, three-year, and five-year averages could affect the results of analyses in meaningful ways.

To summarize, in choosing the particular ACS estimate to use in analyses, data users should consider the following:

• Is the anticipated analysis related to understanding the most recent conditions and identifying potential recent shifts in the population?
• To what level does the analysis need to be protected from potential random year-to-year noise in the estimates?
• Have there been any significant regional changes in the past few years that might make estimates that include both pre- and post-change ACS data less useful?
• Will the analyses involve multiple geographic levels for which the same types of ACS estimates might not be available?

4.3 Data Disclosure Limitations

As noted in Section 2, before releasing any ACS data, the Census Bureau first edits the database to ensure it is in compliance with disclosure rules. The Census Bureau's DRB governs the release of census data as described below:

Title 13 of the United States Code authorizes the Census Bureau to conduct censuses and surveys. Section 9 of the same Title requires that any information collected from the public under the authority of Title 13 be maintained as confidential. . . . The Census Bureau's internal Disclosure Review Board (DRB) sets the confidentiality rules for all data releases.23

23 See www.census.gov/eos/www/sestats.html.
4.3.1 Data Disclosure Avoidance

Three types of data disclosure avoidance procedures are expected to be applied to the ACS data, with varying effects on data utility: imputation, rounding, and data suppression.

Imputation The confidentiality edit is implemented by selecting a small subset of individual households from the internal sample data files and blanking a subset of the data items on these household records. Responses to those data items are then imputed using the same imputation procedures used for non-response. A larger subset of households is selected for the confidentiality edit for small areas to provide greater protection for these areas. The editing process is implemented in such a way that the quality and usefulness of the data are preserved.24

Rounding For the most common decennial census sample data products, a small amount of uncertainty was introduced into the estimates of census characteristics. The sample itself provided adequate protection for most areas for which sample data are published, since the resulting data are estimates of the actual counts; however, small areas required more protection. For CTPP 2000 and other similar projects for which detailed cross-tabulation data for small geographic areas are reported, the Census Bureau enhances confidentiality further by rounding the reported estimates and by establishing minimum response thresholds. The DRB issued a memorandum on December 11, 2001 stating the following rules:

For Part 1 data (place of residence) and Part 2 data (place of work), all published values will be rounded as follows:

• Zero rounds to zero,
• One through seven rounds to four, and
• All other numbers round to the nearest multiple of five. Numbers ending in zero and five are not rounded.

For Part 3 (journey-to-work flows), the DRB allows two tables to be published with no record threshold.
These include the following:

• Table 3.1 (Total Workers); and
• Table 3.2 (Vehicles Available by Means of Transportation to Work).

Added to this set were tables of aggregates, means, and medians that were rounded according to specifications used for all sample data products. For all other Part 3 tables (Tables 3.3 through 3.7 in CTPP 2000), the DRB set a three-unweighted-record threshold.

A key issue to note in analyzing the effects of the disclosure rules is the application of independent rounding. Total columns in each table may not match the sum of the categories because totals are rounded independently of the cells, as shown in the rounding example in Table 4.6. Moreover, because some variables (e.g., travel mode to work) are classified in more than one way depending on which table one uses, a single quantity can have several published values. Up to 15 estimates for the number of transit commuters may be possible25 when different tables and different geographies are analyzed, as shown in the example in Table 4.7.

While there are very few one-dimensional tables in CTPP 2000, having one-dimensional unrounded tables in ACS would be of great use in establishing control totals as checks for analysts and as inputs to iterative proportional fitting processes.

24 See www.census.gov/td/stf3/append_c.html.
25 Chuck Purvis, Metropolitan Transportation Commission, Oakland, California, e-mail posted to the CTPP news listserve on February 19, 2004.
A comparison of CTPP 2000 data and Summary File 3 (SF 3)26 data shows that the CTPP estimates were more likely to be lower than the SF 3 values.27 The effect of rounding a value from one through seven to a value of four generally provided a lower estimate than the actual value.

Rounding as conducted for CTPP 2000 does not affect the statistical significance of the data. However, it does cause a number of distortions when aggregating geographies. It is important to minimize the number of geographies that are combined. For example, if the CBD can be defined using tract geography, tracts should be used rather than the more finely defined TAZs.

For both the Census 2000 and ACS datasets supplied by the Census Bureau for the research discussed in Appendix I and Appendix J, the data were subject to further disclosure scrutiny.28 All estimates were rounded to intervals of 10, rather than intervals of 5. The new rounding rules applied to the ACS research data compound the problems seen for CTPP 2000 and resulted in a significant loss of journey-to-work trip flow data.

Data Suppression In addition, for a CTPP-like product from the ACS, the Census Bureau would establish minimum response thresholds for some of the flow tables. Although rounding is a significant data issue, applying thresholds to journey-to-work flow data will almost certainly eliminate journey-to-work flows for small geography. Table 4.8 shows pairs of geographies

26 Summary File 3 consists of 813 detailed tables of Census 2000 social, economic, and housing characteristics compiled from a sample of approximately 19 million housing units (about 1 in 6 households) that received the Census 2000 Long Form questionnaire.
27 Nanda Srinivasan, Cambridge Systematics Inc., "Data Rounding in CTPP 2000," CTPP Status Report, April 2004.
28 Correspondence with Phillip Salopek, U.S. Census Bureau, on July 23, 2004.
Table 4.6. CTPP's independent rounding of table cells and totals.

Sample Table Data             Rounded Value Using Rounding Rules
0-vehicle households = 6      4
1-vehicle households = 14     15
2-vehicle households = 8      10
3-vehicle households = 8      10
4-vehicle households = 3      4

Incorrect total rounded value: 4 + 15 + 10 + 10 + 4 = 43, which is rounded to 45.
Correct total rounded value: 6 + 14 + 8 + 8 + 3 = 39, which is rounded to 40.

Source: U.S. Census Bureau CTPP 2000 data.

Table 4.7. CTPP rounding: estimates of transit commuters for different geographic summary levels and CTPP tables.

Summary Level                          TAZ       Block Group   Tract     County    MPO
Number of Levels                       4,031     4,384         1,403     9         1
Table 2-2 (Transit = 5 categories)     319,345   319,433       319,717   320,116   320,125
Table 2-12 (Transit = 3 categories)    319,553   319,521       319,780   320,129   320,120
Table 2-27 (Transit = 2 categories)    319,600   319,541       319,836   320,125   320,120

Source: U.S. Census Bureau CTPP 2000 data.
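The DRB rounding rule is simple to express in code. The sketch below (a hypothetical helper, not an official Census Bureau routine) reproduces the independent-rounding mismatch shown in Table 4.6:

```python
def ctpp_round(value):
    """Round a published CTPP value per the DRB's December 2001 rules."""
    if value == 0:
        return 0
    if 1 <= value <= 7:
        return 4
    return 5 * round(value / 5)  # nearest multiple of five

cells = [6, 14, 8, 8, 3]  # households with 0, 1, 2, 3, and 4 vehicles

# Cells and the total are rounded independently of one another...
rounded_cells = [ctpp_round(v) for v in cells]   # [4, 15, 10, 10, 4]
rounded_total = ctpp_round(sum(cells))           # 39 -> 40

# ...so the published cells need not add up to the published total:
# sum(rounded_cells) is 43, while the published total is 40.
```

Since the rounded values are integers, an integer divided by 5 never lands exactly on .5, so Python's round-half-to-even behavior never comes into play here.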
tabulated with and without the disclosure rules for Hampden County, Massachusetts, for both the ACS and Census 2000 at the census tract level.

It can be seen that without thresholds, and allowing for a 15 percent sampling rate, the ACS data produce about three-fourths (6,368/8,228) of the number of origin-destination pairs produced by the Long Form. The new rounding rules would affect the ACS data more significantly, with only about 90 percent (181,563/202,024) of total workers being reported due to rounding. For tables subject to thresholds, applying the same rules to both the ACS and the Census 2000 data, the number of pairs in ACS is about 63 percent (1,673/2,644) of the origin-destination pairs shown in the census Long Form. However, the number of workers in ACS drops further, down to about 60 percent (118,234/202,024) of the total workers in Hampden County.

There is a growing concern within the U.S. transportation community that the Census Bureau will continue to use the same rounding and threshold rules for all future origin-destination tables produced from ACS. The effect of these rules would be significant at the census tract level; in the example, ACS produces 1,673 origin-destination pairs when the census Long Form (without disclosure rules) would have produced 8,228 origin-destination pairs, accounting for only 60 percent of the workers living in the county. It also is expected that a similar or even more severe loss of flow data will occur at the TAZ level.

One implication of this is that transportation analysts might have to resort to aggregating their TAZs into larger geographies for which sufficient flow data are available. Some researchers have pointed out that if blocks are aggregated so that they are larger than walking distance to a bus stop, for example, then the geographic aggregation makes the survey data less valuable for bus route planning.
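A minimal sketch of how a record threshold suppresses small journey-to-work flows follows; the flow data here are invented for illustration, and the three-record cutoff mirrors the DRB rule described above:

```python
# Hypothetical origin-destination flow table:
# (unweighted respondent records, weighted worker estimate) per tract pair
flows = {
    ("tract_A", "tract_X"): (12, 240),
    ("tract_A", "tract_Y"): (2, 45),
    ("tract_B", "tract_X"): (5, 110),
    ("tract_B", "tract_Z"): (1, 20),
}
THRESHOLD = 3  # minimum unweighted records for a flow to be published

# Flows below the threshold are suppressed entirely
published = {od: w for od, (n, w) in flows.items() if n >= THRESHOLD}
pairs_kept = len(published)             # 2 of the 4 pairs survive
workers_kept = sum(published.values())  # 350 of 415 workers remain visible
```

Note that the suppressed workers are not redistributed; they simply disappear from the published flow table, which is why thresholds hit small geographies hardest.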
Similarly, if the aggregate geography is larger than the distance between highway exits, then the survey data cannot be used for highway corridor analysis.

Because the ACS is sampled over time, and at any point in time has a smaller sample size than Census 2000, it may be desirable to have less stringent disclosure rules for ACS. The United Kingdom (UK), for example, has a higher sampling rate, and the data are subject to fewer disclosure rules. The census in the UK is conducted solely via the Long Form, with about 24 million households surveyed, almost the equivalent of the Census 2000 Long Form in the United States. The suite of standard data products from the UK is quite extensive.29

Table 4.8. Disclosure effect on Census 2000 versus ACS, Hampden County.

                                                        Census 2000   ACS
Part 3: Without Thresholds
  Total geographic pairs with reported work flows       8,228         6,368
  Total workers with reported work flows                207,120       181,563
Part 3: With Thresholds
  Total geographic pairs with reported work flows       2,644         1,673
  Total workers with reported work flows                147,080       118,234
Part 1: Total workers                                   199,220       202,024

Source: FHWA CTPP Status Report, April 2004.

29 See www.statistics.gov.uk/census2001/.

Table Variable Collapsing As for the 2000 CTPP, the Census Bureau expects to apply quality control measures to ACS data products for geographic areas for which categorized tables could be misinterpreted. Any table for which the median coefficient of variation (CV) of the individual cell values is greater than 61 percent will be modified or suppressed for that geography. For example, for a given geography in County X, for a table with 18 means of transportation, individual CVs are calculated for the estimates of workers who drove alone, carpooled, etc. If the median of these CVs is
greater than 61 percent, then these modes would be collapsed to fewer categories according to predefined collapsed table definitions. If the median coefficient of variation for the collapsed table still exceeds 61 percent, the table will be suppressed for County X.

4.4 Understanding, Working with, and Reporting Sample Data

4.4.1 ACS Sample Size

The ACS questionnaire is sent to 250,000 housing units every month, or equivalently to 3 million housing units annually, drawn from all counties in the U.S. To allow data users to better analyze smaller areas, the Census Bureau applies differential sampling rates based on the area type. The 2005 sampling rates are shown in Table 4.9. In contrast, the decennial census Long Form was sent to about one of every six addresses. Since both the Long Form and ACS data represent samples of the overall population, they include some imprecision, or margin of error, in their estimates.

4.4.2 Sampling Error

Sampling error is the term given to the error associated with deriving an estimate from a sample rather than an entire population. ACS data are estimates of actual numbers or percentages in the population, but because the data are not collected from the whole population, random sampling error will be present. The larger the sample size, the smaller the sampling error; but, of course, the specific amount of error in an estimate could only be known if information from the true population were available. Sampling error is most commonly estimated through the calculation of the standard error associated with the estimate. Standard error is a measure of the deviation of a sample estimate from the average of all possible similar samples.
Table 4.9. ACS sampling rates, 2005.

Area Type Sampling Rate Category                                                        2005 Final Sampling Rate
Blocks in smallest sampling entities
  (estimated occupied housing units in block < 200)                                     10.0%
Blocks in smaller sampling entities
  (estimated occupied housing units in block ≥ 200 and < 800)                           6.9%
Blocks in small sampling entities
  (estimated occupied housing units in block ≥ 800 and ≤ 1,200)                         3.6%
Blocks in large tracts (estimated occupied housing units in block > 1,200
and estimated occupied housing units in tract > 2,000):
  Mailable addresses ≥ 75% and predicted completed interviews before subsampling > 60%  1.6%
  Mailable addresses < 75% and/or predicted completed interviews ≤ 60%                  1.7%
All other blocks (estimated occupied housing units in block > 1,200
and estimated occupied housing units in tract ≤ 2,000):
  Mailable addresses ≥ 75% and predicted completed interviews before subsampling > 60%  2.1%
  Mailable addresses < 75% and/or predicted completed interviews ≤ 60%                  2.3%

Source: U.S. Census Bureau, Design and Methodology: American Community Survey, Technical Paper 67 (May 2006), U.S. Government Printing Office, Washington, D.C.

It is an indication of the precision with which a
sample estimate approximates the population value. Formulas for calculating standard errors associated with sample estimates are straightforward, but since the Census Bureau will calculate and report the standard errors, the reader is referred to any standard statistics textbook for more details on these calculations.

The sampling error of an estimate is usually summarized as a combination of a confidence level and a confidence interval. The confidence level is the percentage of times that drawing a sample of a particular size from a certain population will result in the actual (but unknown) parameter of interest lying within a certain confidence interval. For instance, a surveyor might report that, based on survey results, sample size, and variance levels, the percent of households with zero vehicles for a certain population of households is 10 percent plus or minus 3 percent at the 95 percent confidence level. This means that 95 out of 100 times that we performed a survey with the same sample size, the interval we determine from the survey (the estimate plus or minus 3 percent) will include the true percentage of zero-vehicle households. For this example, the confidence level is 95 percent, the confidence interval is 6 percentage points wide (7 percent to 13 percent), and the margin of error is ±3 percent.

It is common for analysts to establish a confidence level for reporting and then to calculate the margin of error for the survey-derived estimates associated with that confidence level. The confidence levels selected are generally related to how much uncertainty researchers are able to accept in particular estimates. Medical and scientific researchers sometimes will specify 99 percent confidence levels or higher. Political polls usually report margins of error assuming confidence levels of 95 percent or 90 percent. For a particular sample population and sample size, as confidence levels are increased, the corresponding margins of error around the sample estimates widen.
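The relationship between a confidence level, a standard error, and the resulting interval can be sketched as follows (the function name and dictionary are illustrative; the multipliers are the standard normal values used in Table 4.10):

```python
# Common z-multipliers by confidence level (the ACS convention uses 1.65
# for the 90 percent level rather than the more precise 1.645).
Z_MULTIPLIER = {80: 1.28, 90: 1.65, 95: 1.96, 99: 2.58}

def confidence_interval(estimate, standard_error, level=90):
    """Return (low, high) bounds for a sample estimate at a confidence level."""
    z = Z_MULTIPLIER[level]
    return estimate - z * standard_error, estimate + z * standard_error

# The zero-vehicle household example: 10 percent plus or minus 3 percent at
# the 95 percent level implies a standard error of 3 / 1.96 percentage points.
low, high = confidence_interval(10.0, 3.0 / 1.96, level=95)
# low, high is approximately (7.0, 13.0)
```

As the text notes, raising the confidence level (say, from 90 to 99 percent) only swaps in a larger multiplier, so the interval around the same estimate widens.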
Suppose a sample parameter is measured from a large sample to have a mean value of X and, based on the variation in the sample, the standard error is computed to be Y. The confidence intervals for different confidence levels are shown in Table 4.10.

Both the decennial census Long Form and ACS are sample datasets, so sampling error will be present in estimates from either source. Despite this fact, one almost never sees precision levels reported for census Long Form estimates. Analysts generally report census Long Form estimates as single numbers. The Census Bureau does make the precision levels available to users, but most data users choose not to work with them. Not incorporating the uncertainty levels into analyses simplifies analyses, some of which are already fairly complicated. However, in practical application, this also has the effect that many users of the analyses do not understand the nature of these data. A common misconception of many consumers and users of these data is that they are census data and therefore are based on a 100 percent sample of the population (like the decennial census Short Form data).

Table 4.10. Confidence intervals for a large sample parameter with a mean value X and a standard error Y.

Confidence Level   Low             High
80 percent         X - 1.28 * Y    X + 1.28 * Y
90 percent         X - 1.65 * Y    X + 1.65 * Y
95 percent         X - 1.96 * Y    X + 1.96 * Y
99 percent         X - 2.58 * Y    X + 2.58 * Y

Because the ACS sample sizes are smaller than those of the Long Form, the sampling errors will be more significant for ACS, and the misconception that the estimates are completely precise is
more likely to lead to erroneous conclusions. For this reason, the Census Bureau is making a concerted effort to stress that ACS estimates are just that, statistical estimates, and not counts.

The Census Bureau calculates the standard errors for all estimates reported in ACS data products using procedures that account for the sample design and estimation methods. These procedures are described in the Census Bureau's Accuracy of the Data reports, which are updated annually (available at www.census.gov/acs/www/UseData/Accuracy/Accuracy1.htm). All ACS estimates are reported with margins of error or confidence intervals corresponding to the 90 percent confidence level. Using the reported estimates and upper and lower bounds, data users are able to incorporate ACS's sampling error into their analyses and data presentations.

Example Calculations for Incorporating Sampling Error into ACS Analyses To help analysts use and interpret the margin of error provided with the ACS estimates, the Census Bureau provides formulas and some example calculations in the Accuracy of the Data reports. Four example calculations from this source are presented and annotated below:

1. Calculation of the standard error of an ACS estimate,
2. Calculation of the standard error of the sum (or difference) of ACS estimates,
3. Calculation of the standard error of the ratio of two ACS estimates, and
4. Calculation of the standard error of the proportion that an ACS subtotal estimate represents of an ACS total estimate.

Although these examples are for a generic analysis for a wider audience, the same procedures will be used by transportation planners in their most common analyses, as is demonstrated by the case study sections that follow in this guidebook.

Example Calculation 1 Determine the standard error of a reported ACS estimate.

Problem The ACS estimates the number of males in the United States who have never married to be 33,290,195.
The reported lower bound of the estimate is 33,166,192, and the reported upper bound is 33,414,198. What is the standard error of the estimate of the number of males who have never married?

Relevant Equations.

Standard error = (90 percent confidence margin of error) / 1.65
Margin of error = max(upper bound - estimate, estimate - lower bound)

Note: Many, but not all, ACS intervals are symmetrical around the reported estimate, so choosing the maximum interval is the conservative approach to establishing the margin of error.

Calculations

Margin of error = max((33,414,198 - 33,290,195), (33,290,195 - 33,166,192)) = 124,003
Standard error = 124,003 / 1.65 = 75,153

Discussion The standard error calculation, in and of itself, may not be particularly edifying, but it is a first step that allows users to perform other calculations, like those shown below. Also, by knowing the standard error, analysts can establish upper and lower bound estimates for other confidence levels. For instance, the 95 percent margin of error is 1.96 * 75,153 = 147,300.

Example Calculation 2 Determine the standard error of a sum of reported ACS estimates.

Problem As noted in the previous example calculation, the number of males who have never married is estimated to be 33,290,195, with upper and lower bounds of 33,414,198 and 33,166,192. The estimate of the number of females who have never married is 29,204,857, with a
reported lower bound of 29,090,048, and a reported upper bound of 29,319,666. What is the estimated number of all people who have never married? Relevant Equations. Standard error (SE) of a sum Notes: The Census Bureau states that this method will underestimate the standard error if the items in a sum are highly positively correlated, and will overestimate the standard error if the items in the sum are highly negatively correlated. This equation also is valid for the standard error of the difference of ACS reported estimates: SE(XË â YË ) = SE(XË + YË ). Calculations The point estimate of the number of people who have never married is 33,290,195 + 29,204,857 = 62,495,052. From the previous example, the standard error of the estimates for males is 75,153. Applica- tion of the same equation for females yields a standard error of 69,581. Therefore, the standard error of the sum is Once the standard error of the sum has been calculated, analysts can calculate and report asso- ciated conï¬dence intervals. The 90 percent conï¬dence interval for the total number of people who have never married (based on equation in the ï¬rst example) is (62,495,052-1.65(102,418)) to (62,495,052+1.65(102,418)), or 62,326,062 to 62,664,042 people. Discussion The summation of estimates propagates the sampling error inherent in the indi- vidual addend estimates, so the importance of evaluating and reporting the uncertainty in esti- mates derived in this manner is increased. Many census data users, including transportation planners, will frequently need to combine individual census estimates in this way to address their speciï¬c analysis needs. The detailed delineations in several of the transportation-related ACS tabulations will frequently require ana- lysts to sum individual estimates. For instance, ACS tabulations of commuting time of day break the day into very detailed day parts. 
To analyze longer periods, such as peak periods as opposed to peak hours, analysts will need to sum the time period components.

Example Calculation 3 Determine the standard error of a ratio of reported ACS estimates.

Problem Suppose the statistic of interest is the ratio of the number of women who have never married to the number of men who have never married. What is that ratio, and what is its standard error?

Relevant Equations. Standard error of a ratio:

SE(X/Y) = (1/Y) * sqrt([SE(X)]^2 + (X^2/Y^2) * [SE(Y)]^2)

Note: This approximation is valid for ratios of two estimates where the numerator is not a subset of the denominator.
Calculations The equation inputs were calculated in the previous examples: the numerator and denominator are 29,204,857 and 33,290,195, with standard errors of 69,581 and 75,153, respectively. Substituting these values into the ratio equation yields a standard error of 0.29 percent. The ratio of the two estimates is (29,204,857/33,290,195) = 87.73 percent, and the upper and lower bounds for the 90 percent confidence level are

87.73% ± 1.65 * 0.29% = 87.25% to 88.21%

Discussion This example demonstrates a technique for evaluating how the sampling errors affect the calculation of ratios between two parallel estimates. A transportation-based example of this type of comparison would be if an analyst wanted to make a statement such as, "there are X times more two-vehicle households than zero-vehicle households in geographic area Y." These comparisons are not usually that useful for single-variable tables, but are very common and useful when analyzing cross-tabulations, where an analyst might want to say something like, "workers in zero-vehicle households are X times more likely to commute by transit than workers in two-vehicle households." The more common comparison between a subtotal estimate and its corresponding total estimate (e.g., "X percent of the households have zero vehicles") is covered in the next example calculation.

Example Calculation 4 Determine the standard error of a percentage.

Problem Now suppose the statistic of interest is the percentage of females who have never married in relation to the total number of people who have never been married. What is the percentage of people who have never married that are women, and what is the standard error of the percentage?

Relevant Equations. Standard error of a proportion:

SE(p) = (1/Y) * sqrt([SE(X)]^2 - (X^2/Y^2) * [SE(Y)]^2)

Note: This approximation is valid for proportions, that is, where the numerator (X) is a subset of the denominator (Y).

Calculations The point estimate for the proportion of the total that are female is (29,204,857/62,495,052) * 100% = 46.73%. From the previous calculations, we know the standard error of the number of females who have never married is 69,581. The standard error for all people who have never married is 102,418.
The standard error of the proportion is

SE(29,204,857/62,495,052) = (1/62,495,052) * sqrt[(69,581)^2 - (29,204,857^2/62,495,052^2) * (102,418)^2] = 0.08 percent

The proportion is 46.73 percent, and the upper and lower bounds for the 90 percent confidence level are as follows:

46.73% ± 1.65 * 0.08% = 46.60% to 46.86%

Discussion Determining the percentage that an ACS estimate makes up of an ACS estimated total will be a very common procedure for transportation planners and other census data users. For example, to calculate mode shares for different commuting modes, analysts will apply this procedure.
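The four example calculations above apply a small set of formulas that are easy to wrap in reusable helper functions. The sketch below (the function names are our own) reproduces the guidebook's figures:

```python
import math

Z90 = 1.65  # factor the Census Bureau publishes with 90 percent intervals

def se_from_bounds(estimate, lower, upper):
    """Example 1: standard error from a reported 90 percent interval.
    Taking the larger half-width is conservative when the interval
    is not symmetrical around the estimate."""
    moe = max(upper - estimate, estimate - lower)
    return moe / Z90

def se_of_sum(se_x, se_y):
    """Example 2: SE of a sum (or difference) of two estimates."""
    return math.sqrt(se_x ** 2 + se_y ** 2)

def se_of_ratio(x, y, se_x, se_y):
    """Example 3: SE of X/Y when X is not a subset of Y."""
    return math.sqrt(se_x ** 2 + (x / y) ** 2 * se_y ** 2) / y

def se_of_proportion(x, y, se_x, se_y):
    """Example 4: SE of X/Y when X is a subset of Y (note the minus sign)."""
    return math.sqrt(se_x ** 2 - (x / y) ** 2 * se_y ** 2) / y

males, females = 33_290_195, 29_204_857
se_m = round(se_from_bounds(males, 33_166_192, 33_414_198))    # 75,153
se_f = round(se_from_bounds(females, 29_090_048, 29_319_666))  # 69,581
se_total = round(se_of_sum(se_m, se_f))                        # 102,418
se_r = se_of_ratio(females, males, se_f, se_m)                 # ratio SE, ~0.29 percent
se_p = se_of_proportion(females, males + females, se_f, se_total)  # ~0.08 percent
print(se_m, se_f, se_total, round(100 * se_r, 2), round(100 * se_p, 2))
```

Rounding the intermediate standard errors to whole numbers, as the guidebook does, keeps the computed values aligned with the published examples.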
4.4.3 Confidence Intervals

As discussed earlier, the effect of the smaller ACS sample size, compared to the Long Form sample size, is to increase the sampling error and consequently to increase the standard errors of the estimates. The fact that the ACS estimates are less reliable than the corresponding Long Form estimates is well recognized by the Census Bureau, and has led to the release of 90 percent confidence intervals along with the ACS estimates. Previously, confidence intervals were not released with Long Form estimates. Instead, the dataset documentation included a description of the methodology and tables of parameters that users could employ in calculating these intervals.

Data users should learn how to use and interpret these confidence intervals. The various case studies presented in this guidebook illustrate how the confidence intervals affect the conclusions drawn from the analysis. For example, by examining the standard errors (computed from the estimate and the confidence interval) of estimates for two time periods or two populations, one can determine whether there is any real (or statistically significant) change in the value of the corresponding characteristic or whether the change is attributable to random error.

4.5 Comparison of ACS Estimates to Census 2000

It is likely that many, or most, new ACS data users will begin their analyses of ACS by comparing ACS estimates to Census 2000 Long Form estimates. There are many methodological differences between the census Long Form and ACS, including differences in

• Sample sizes,
• Data collection procedures,
• Field staff training and capabilities,
• Wording of some questions,
• Reference periods,
• Editing procedures,
• Weighting, and
• Rounding.

Therefore, it would not be surprising to see differences in the estimates that are not actual differences that can be supported by conventional wisdom or other data sources.
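With standard errors in hand, the significance test described in Section 4.4.3 reduces to a Z statistic on the difference between two independent estimates. The helper below and the county mean travel-time figures it is applied to are illustrative assumptions, not values from this guidebook:

```python
import math

def is_significant(est_a, se_a, est_b, se_b, z_crit=1.65):
    """Return (flag, z): whether two independent estimates differ at the
    90 percent confidence level (use z_crit=1.96 for 95 percent)."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > z_crit, z

# Hypothetical mean travel times (minutes) and standard errors for one county:
# an ACS estimate of 25.1 (SE 0.6) versus a Census 2000 estimate of 26.4 (SE 0.4).
significant, z = is_significant(25.1, 0.6, 26.4, 0.4)
print(significant, round(z, 2))  # True -1.8
```

A difference that fails this test may simply reflect sampling error rather than a real change in the underlying characteristic.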
Certainly, some of these differences are due to actual improvements in methods, as discussed in Section 4.1. In addition, over time, differences between ACS estimates and Census 2000 estimates will become historical footnotes, as only ACS will be carried into the future. Nevertheless, these facts may not help an analyst very much as she or he tries to understand how important population variables are actually changing within a region.

When unexpected differences between ACS and previous Census 2000 estimates are found, analysts may benefit by going through the following checklist:

• Examine the ACS margins of error and standard errors, as discussed in the previous section. An ACS point estimate may look odd compared to the Census 2000 estimate, but the differences may be statistically indistinguishable due to the limited sample sizes and the variability in the data.
• Remember that the Census 2000 Long Form data represent sample data as well. The Census 2000 documentation provides the ability to estimate standard errors for those estimates.
• Investigate ACS and Census 2000 data quality measures, such as item imputation rates, related to the specific conflicting results in question. Imputation rate tables are available with the other base tables on the American FactFinder website.
• Compare the Census 2000 and ACS questionnaires for the item(s) in question, and judge whether differences in the surveys would naturally lead to differences in the estimates. The ACS residency definition and reference period definition are likely to be the cause of many measured differences between the datasets.
• Determine whether the curious finding is consistent with what other comparisons between the datasets have noted. Many comparison studies are discussed below.
• Decide whether applying benchmarking analysis could address the identified issues. A technique for performing this analysis is summarized below.
• To the extent possible, identify and utilize validation datasets and administrative records to determine whether the new ACS estimates are reasonable.
• Develop caveat language for reports and presentations to explain the dataset differences and the effects of these differences on your analyses.

The following subsections provide guidance on implementing these strategies. We begin by exploring the research that has been conducted on differences between Census 2000 and ACS to help analysts better understand where structural differences between the datasets can be expected.

4.5.1 Census Bureau Comparison Reports

Four of the reports in the Census Bureau ACS evaluation series compare the results of the C2SS and the decennial census for

• General demographic and housing characteristics (Report 4);
• Economic characteristics (Report 5);
• Social characteristics (Report 9); and
• Physical and financial housing characteristics (Report 10).30

Each of the reports concludes that at the national level, the C2SS estimates were similar to those produced from the Census 2000 sample. In addition, the researchers compared county-level estimates for counties corresponding to 18 of the ACS test sites. Few county-level estimate differences were found to be substantive.
Even when differences were deemed to be statistically significant (which was common due to the large sample sizes), the report authors note, "data users would in general come to similar conclusions, implement similar programs, and allocate funds in a similar way regardless of which dataset they used." Where differences were found, the researchers considered potential methodological reasons and recommended actions for the ACS design. Among the reasons identified for differences were the following:

• Sample coverage differences between the C2SS and decennial census Long Form;
• Differences in the reference periods (Census 2000 focused on a single point of time in April 2000; C2SS referred to "last week" and covered all of year 2000);
• Questionnaire presentation differences, including question wording and response categories;
• Different proxy rules and survey follow-up mechanisms;
• Different weighting and estimation procedures;
• Better internal checks and verification procedures in C2SS than in Census 2000; and
• Interviewers who were more experienced and better trained for C2SS than the enumerators for Census 2000.

A fifth report in the Census series, Report 8: Comparison of the American Community Survey Three-Year Averages and the Census Sample for a Sample of Counties and Tracts, compares estimates from the Census 2000 Long Form to the same ACS estimates (1999-2001) at the county and census tract level for the 36 ACS test sites.31

30 See www.census.gov/acs/www/AdvMeth/Reports.htm.
31 U.S. Census Bureau, Meeting 21st Century Demographic Data Needs – Implementing the American Community Survey: Report 8: Comparison of the American Community Survey Three-Year Averages and the Census Sample for a Sample of Counties and Tracts (June 2004).
For this analysis, the Census Bureau selected a manageable number of variables. Four types of estimates were evaluated.

• Demographic estimates included
  – Age,
  – Race,
  – Gender,
  – Hispanic origin,
  – Relationship,
  – Tenure, and
  – Housing occupancy.
• Social estimates included
  – School enrollment,
  – Educational attainment,
  – Marital status,
  – Disability status,
  – Grandparents as caregivers,
  – Veteran status,
  – Nativity and place of birth,
  – Region of birth of foreign born,
  – Language spoken at home, and
  – Ancestry.
• Economic estimates included
  – Employment status,
  – Commuting to work,
  – Occupation,
  – Industry,
  – Class of worker,
  – Income, and
  – Poverty status.
• Housing estimates included
  – Units in structure,
  – Year structure built,
  – Rooms,
  – Year householder moved into unit,
  – Vehicles available,
  – House heating fuel,
  – Occupants per room,
  – Value,
  – Mortgage status and selected monthly owner costs,
  – Selected monthly owner costs as a percentage of household income,
  – Gross rent, and
  – Gross rent as a percentage of household income.

For the county-level comparison, the majority of ACS estimates were in agreement with the Census 2000 estimates. Some of the statistically significant county-level differences were found to be small enough that they would not impact the use of the data. In addition, many of the differences could be attributed to differences in the questionnaires, procedures, or both. Unfortunately, because of the small sample size in the ACS, meaningful comparisons at the census tract level were difficult to perform. Although the general patterns for the tract data tended to mirror the county patterns, the high levels of variance for the tracts tended to reduce the number of detectable differences.
The report uses the Z-score to determine whether differences are due to sampling variability or are probably due to issues other than sampling variability.

The report finds that most of the variables show small differences between the ACS and Census 2000. At the county level, a large number of counties showed statistically significant differences in disability status, Hispanic origin, and employment status. Other authors (Stern, 2003;32 Salvo et al., 200433) have noted that the Census 2000 disability rates may have been inflated, partly because of misinterpretation of the Census 2000 survey question. Stern notes that "differences in disability are traced to computer interviewing in the ACS (a clear improvement over the Census 2000 and ACS mail questionnaire). Differences in race responses are partly traced to the use of permanent field staff where the response 'some other race' is not a response category in most other surveys and a much smaller number of these responses are observed in ACS than in Census 2000."34

Differences also were seen in labor force participation, mean travel time (Census 2000 estimates are consistently higher), vehicles available in households (Census 2000 estimates were significantly higher in six counties for households with no vehicles, and ACS estimates were significantly higher in five counties for households with three or more vehicles), and means of transportation to work. The carpool-to-work category (in mode to work) recorded the highest difference, with ACS numbers consistently lower in 9 of 36 counties.

Tables 4.11 through 4.14 summarize the county-level differences reported for the sample census variables.35

32 S.M. Stern, Counting People with Disabilities: How Survey Methodology Influences Estimates in Census 2000 and the Census 2000 Supplementary Survey. Report submitted to the U.S. Census Bureau, Washington, D.C., 2003.
33 Joseph Salvo, Peter Lobo, and Timothy Calabrese, Small Area Data Quality: A Comparison of Estimates, 2000 Census and the 1999-2001 ACS, Bronx, New York Test Site, 2004.
34 U.S. Census Bureau, Meeting 21st Century Demographic Data Needs – Implementing the American Community Survey: Report 8: Comparison of the American Community Survey Three-Year Averages and the Census Sample for a Sample of Counties and Tracts (June 2004), p. xvii.
35 Of the 36 counties analyzed, small is defined as fewer than four counties with significant differences; moderate is defined as between four and eight counties; large is defined as nine or more counties.

Table 4.11. Number of counties with statistically significant differences between ACS and Census 2000 demographic estimates.

Estimate Category       ACS – Census 2000 Difference35
Sex                     Small
Age                     Moderate
Race                    Large
Hispanic Origin         Large
Relationship            Large
Tenure                  Moderate
Household by Type       Large
Housing Occupancy       Large

Source: United States Census Bureau, 2004.
Table 4.12. Number of counties with statistically significant differences between ACS and Census 2000 social estimates.

Estimate Category                               ACS – Census 2000 Difference
School Enrollment                               Moderate
Educational Attainment                          Moderate
Marital Status                                  Moderate
Grandparents as Caregivers and Veteran Status   Small
Disability                                      Large
Nativity and Place of Birth                     Moderate
Region of Birth of Foreign Born                 Small
Language Spoken at Home                         Large
Ancestry                                        Large

Source: United States Census Bureau, 2004.

Table 4.13. Number of counties with statistically significant differences between ACS and Census 2000 economic estimates.

Estimate Category       ACS – Census 2000 Difference
Employment Status       Large
Commuting to Work       Moderate
Occupation              Small
Industry                Small
Class of Worker         Moderate
Household Income        Moderate
Income by Type          Large
Family Income           Small
Poverty Status          Small

Source: United States Census Bureau, 2004.

Table 4.14. Number of counties with statistically significant differences between ACS and Census 2000 housing estimates.

Estimate Category                                               ACS – Census 2000 Difference
Units in Structure                                              Large
Year Structure Built                                            Large
Number of Rooms                                                 Large
Year Householder Moved into Unit                                Small
Number of Vehicles                                              Moderate
House Heating Fuel                                              Moderate
Selected Housing Characteristics                                Large
Occupants per Room                                              Large
Housing Value                                                   Moderate
Mortgage Status and Selected Owner Costs                        Small
Selected Monthly Costs as a Percentage of Household Income      Moderate
Gross Rent                                                      Moderate
Gross Rent as a Percentage of Household Income                  Large

Source: United States Census Bureau, 2004.
4.5.2 Local Area Experts Comparison Reports

In addition to the reports prepared by staff, the Census Bureau contracted with four local experts to provide site-specific analysis of these data. With their local knowledge of the counties, they provided a comprehensive interpretation of the data from a user perspective.

Bronx County, New York

Bronx County data were assessed in the report summarized at www.census.gov/acs/www/AdvMeth/acs_census/lreports/SalvoLoboCalabrese.pdf and written by Joseph Salvo, Peter Lobo, and Timothy Calabrese in March 2004. Because the three-year aggregate ACS sample size for the Bronx (10.2 percent of total housing units) was very small at the census tract level, this report examined data at a neighborhood level.36 The 355 tracts in the Bronx were aggregated to 88 neighborhoods.

The report finds that the mail return rates between the census and ACS are only modestly correlated (0.42). The average Census 2000 return rate was 53 percent. During the period 1999-2001, the ACS had an average return rate of 36 percent, decreasing from 38 percent in 1999 to 34 percent in 2001. The ACS response rate also varies by geographic area. However, the allocation levels were lower in the ACS than in Census 2000, both for housing and population items.

The ACS produced higher percentages for people in the labor force than Census 2000. Carpool rates in the ACS were about 2 percent lower than in Census 2000. Table 4.15 shows some of the variables for which statistically significant and meaningful37 differences were found between the ACS and Census 2000.

The authors expressed concern regarding the adequacy of five-year accumulated data at the census tract level, as follows:

Another concern, again related to the heavy dependence in the ACS on non-response follow-up, is that five years of data may not be enough to generate reliable estimates at the census tract level if mail return rates do not improve.
This study provides a good illustration of the limits that a 9 percent versus 15 percent sample places on our ability to derive reliable estimates, namely the use of 88 neighborhood tract aggregates in lieu of estimates for the actual 355 census tracts.

36 ACS sample rates were 15 percent for Pima, Hampden, Douglas, and Multnomah Counties. For Broward, Bronx, San Francisco, Lake, and Franklin Counties, the ACS three-year aggregate sampling rate was closer to 10 percent.
37 "Meaningful" differences are defined by the authors as statistically significant differences of 2 percent or more between ACS results and Census 2000 results.

Table 4.15. Variables with statistically significant differences in Bronx County: ACS versus Census 2000.

Variable                                  ACS            Census 2000
Population Aged 21-64 with Disability     19.0%          31.8%
Commute via Carpool                       7.0%           9.3%
Commute via Public Transportation         57.0%          53.9%
Mean Travel Time to Work                  40.4 minutes   43.1 minutes
Civilian Employment                       50.3%          45.7%
Median Household Income                   $26,185        $27,611
Mean Earnings                             $41,552        $44,116
Poverty Status of Individuals             56.8%          58.8%
Vehicles Available in Household = 1       30.1%          28.8%

Source: Salvo, Lobo, and Calabrese, 2004.
Multnomah County, Oregon

Multnomah County ACS test site data were assessed in the report found at www.census.gov/acs/www/AdvMeth/acs_census/lreports/hough_swanson.pdf and written by George Hough and David Swanson in March 2004.

Examining the self-response rates for Multnomah County, the authors state that if the only data for the survey were to come from self-response, ACS would have significant problems in areas where there is a concentration of minority populations. The most important issue underlying all of their concerns is funding the ACS effort continuously. "Sufficient funding for implementing the 2010 ACS plan must be ensured for a longer time horizon than the annual federal budget process now allocates."

However, the ACS allocation rates were lower than those of Census 2000 for population and housing items. ACS provided better data than Census 2000 for sample unit non-response rates, occupied sample unit non-response rates, and housing unit sample completeness ratios, with no significant difference observed for the household population sample completeness ratios. Census 2000 results were better than ACS when examining vacant housing unit non-response rates.

The Census 2000 sample uses population, housing unit, and household controls, while the ACS weights housing units and population only. Using the specific housing unit and population weights to estimate households results in a difference between the number of householders and corresponding households of about 5,000.

Another contribution of this report is an alternate analysis of differences using a method called the "Loss Function." The Loss Function summarizes the information in the absolute numeric and absolute percent differences by combining them in a weighted fashion. Using the Loss Function, the authors identified some concerns with the measurement of race variables in ACS.
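The report does not reproduce the exact Loss Function specification, but the general idea of combining absolute numeric and absolute percent differences in a weighted fashion can be sketched as follows. The equal weights and linear form here are assumptions for illustration only:

```python
def loss(acs_value, census_value, w_numeric=0.5, w_percent=0.5):
    """Weighted combination of the absolute numeric difference and the
    absolute percent difference between two estimates. The equal weights
    and linear form are illustrative assumptions, not the report's
    exact specification."""
    numeric = abs(acs_value - census_value)
    percent = 100.0 * numeric / census_value if census_value else 0.0
    return w_numeric * numeric + w_percent * percent

# A larger loss flags a variable whose ACS and Census 2000 values diverge more.
print(loss(90, 100), loss(98, 100))  # 10.0 2.0
```

Ranking variables by such a loss score lets an analyst focus review effort on the items with the largest combined divergence.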
The authors suggest that the Census Bureau release estimates for aggregated racial groupings as opposed to the detailed race groups that currently are provided. Significant differences also were observed for the Hispanic population.

San Francisco and Tulare Counties, California

Data from San Francisco and Tulare Counties were assessed in the report provided at www.census.gov/acs/www/AdvMeth/acs_census/lreports/gage.pdf and written by Linda Gage. This report compared ACS and Census 2000 data for San Francisco and Tulare Counties.

The report notes striking differences in data collection on race, disability status, vacancy status, number of rooms in structure, and grandparents as caregivers. However, 80 percent of the total variables were comparable. There were significant differences in the percentage of foreign-born, educational attainment, and language spoken at home; the author states that the rates of allocation in Census 2000 are the reason for the differences. Response rates were significantly improved under the ACS for most difficult items, such as income. The report's findings on non-response are consistent with the Census Bureau's quality measures report. Census 2000 data show higher percentages of workers commuting by carpool, longer commute times, and a higher percentage of households without vehicles than ACS. ACS shows a higher percentage of households with one vehicle.

The author provides strategies for analyzing and using census data. ACS prospects and predicaments also are delineated in the report as follows:

• The amount of data available to make your own assessment of the comparability, quality, usefulness, and potential benefits of ACS is initially overwhelming. The data, quality measures, and geography make analysis a challenge.
Statistical measures like the differences, standard errors, Z-scores, and P-values can help quickly identify significant differences, but some statistically significant differences may not be meaningful differences in the world of the data user. In general, the ACS appears to be measuring the same things in much the same ways as the census and getting similar results. There is still much to learn about data comparability, reasons for differences, and whether "different" is better, worse, or just different. There are differences between the census and ACS, some of them statistically significant. These may ultimately be welcome differences if ACS data are consistent, more current, and of higher quality than data from the Census 2000 Long Form sample. A few suggestions as you proceed to use the ACS data:
  – Do not try to analyze all the data all at once, even if you use all the items or must supply them to others.
  – Concentrate on the data items that you already use in your work frequently. Compare those items with the census data.
  – Do not assume the census picture is more accurate. Check the quality measures.
  – Compare ACS and census data to administrative records that you may have available.
  – Consider whether the data make sense.
  – Learn to use and provide the standard errors supplied with ACS data.
  – Communicate your findings with the Census Bureau and others evaluating the ACS data. This will improve the survey as it matures.
• ACS has been designed to collect and provide more complete and current demographic, social, economic, and housing information between censuses and to replace the Census 2010 Long Form. The success of this endeavor depends upon continuous and adequate funding, sufficient sample sizes, and a current and accurate MAF. Shortfalls in any of these areas could reduce data quality. The decennial census is subject to the same perils.
• As the ACS continues to evolve and improve, a few of the identified challenges include:
  – Resident populations in facilities such as prisons and dormitories (group quarters),
  – Improving the Census Bureau's population estimates used as the population controls for the ACS, and
  – Assisting data users to use a series of averaged data and data for small jurisdictions and seasonal areas.
Vilas and Oneida Counties, Wisconsin, and Flathead and Lake Counties, Montana

Data from these sample Wisconsin and Montana counties were assessed in the report provided at www.census.gov/acs/www/AdvMeth/acs_census/lreports/vossetal.pdf and written by Paul Van Auken, Roger Hammer, Paul Voss, and Daniel Veroff in March 2004.

This report assesses ACS attributes and quality measures at the county and tract levels for counties with seasonal populations. Based on seasonality in these counties, the authors anticipate ACS values to be higher for older population, median age, occupied housing units, median income, and housing values, and lower for unemployment and average household size. Because rural census tracts are so large in geographic extent and encompass governmental units, the authors would like to have data at the minor civil division level, in addition to census tracts.

Because the Census Bureau expects ACS to achieve a (five-year) sample that is 75 percent of the census Long Form, and because the housing unit response is roughly 75 percent of those originally in the sample, the ACS "interviewed" sample size would be 56 percent (0.75 × 0.75 = 0.56) of the "100 percent response" census Long Form. The authors expect this to be exacerbated in rural areas. All four counties studied exhibited a sizeable difference in economic and housing attributes for over 20 percent of items. ACS was successful in capturing some of the seasonal variations.

Plotting the annual estimates of ACS at the county level, the authors found that ACS would be unable to provide reliable annual estimates for smaller areas like Vilas and Oneida Counties, particularly if they are not oversampled. The authors also plotted the ratio of ACS and Census 2000 standard errors at the census tract geography and found substantial cases where the ratio is more than 1.3, the level predicted by the Census Bureau.
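The arithmetic behind the 56 percent figure also offers a back-of-the-envelope check on the roughly 1.3 standard-error benchmark mentioned at the end of this paragraph, since standard errors scale inversely with the square root of the sample size. The connection between the two figures is our own inference:

```python
import math

# Share of the "100 percent response" Long Form sample that the ACS is
# expected to interview: 75 percent sampling rate times roughly 75 percent
# housing unit response.
sample_share = 0.75 * 0.75  # 0.5625, i.e., about 56 percent

# Standard errors scale with 1/sqrt(n), so the implied ratio of ACS to
# Long Form standard errors under this assumption is:
se_ratio = math.sqrt(1 / sample_share)
print(round(100 * sample_share), round(se_ratio, 2))  # 56 1.33
```

Under these assumptions, tract-level ACS standard errors exceeding the Long Form values by about a third is exactly what one would expect.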
The authors did not draw any conclusions on the comparison of the ACS and Census 2000 data, citing the following four reasons:

1. Lack of data at the minor civil division level for comparison: The authors believe that data at the minor civil division level are critical to providing meaningful data for governmental units in rural areas.
2. Access to uncontrolled estimates from ACS: The authors want to review the ACS numbers, properly weighted, but without the final control to the population and housing estimates, to examine what the ACS implies in terms of numbers of people and housing units in addition to their characteristics.
3. Sampling error: ACS samples for some of the counties are substantially smaller than Census 2000 samples, thus yielding estimates with higher standard errors and more uncertainty.
4. One of the goals of the ACS is for standard errors in ACS not to exceed Census 2000 standard errors by more than 33 percent at all levels of census geography. At the tract level, attribute standard errors for the ACS appear to exceed those obtained in the Long Form by more than 33 percent.

4.5.3 ACS Transportation-Related Research

One objective of this research was to compare the ACS data to the decennial census data so as to be able, to the extent possible, to draw conclusions about the differences between the data sources, the relative accuracy of the data sources, and the adequacy of the ACS data. The differences between the ACS and CTPP estimates can be attributed to several factors, including differences in sampling rates, survey methodology, wording of the questions, timeframe of data collection, control totals, and rounding. It is important to understand both the magnitude and the significance of these differences, and how they would impact transportation planning applications.
We evaluated the general quality and validity of three-year accumulations (1999-2001) of ACS transportation-related data based on residence, workplace, and flow for nine test counties by comparing them to Census 2000 data that correspond to CTPP Part 1, Part 2, and Part 3 data. The ACS and census data tables were provided to the project team by FHWA, which had received them for evaluation from the Census Bureau. Appendix I summarizes the analyses that were conducted. Some conclusions of these analyses were as follows:

• In general, the CTPP and ACS datasets appear to show the same patterns for the transportation-related tables. Only a small number of tracts and TAZs in the test counties for which data were available had significant variances between the two datasets.
• When we correlated the differences that were found with other tract and TAZ variables, we detected some systematic biases in the residence-based estimates, most notably for the following variables:
  – Disability status;
  – Disability status by mode to work;
  – Tenure (specifically, the owned-with-mortgage category);
  – Number of workers in the household by vehicles available by household income;
  – Poverty status (specifically, the category for incomes between 100 percent and less than 150 percent of poverty); and
  – Telephone availability.
• Although the analyses of workplace-based estimates were more limited by the available comparison data, we did not identify any systematic biases.
• Effective comparisons of worker flow data were not possible.
4.5.4 Questionnaire Considerations

There are many questionnaire and data collection differences between ACS and the census Long Form data that affect the comparability of individual estimates (see Section 2 and Appendix A), but two differences are likely to affect many of the estimates, including transportation-related characteristics. The ACS residency definition and reference period definition will have an important effect on many comparisons of ACS and Census 2000.

Residency Definition The ACS uses different residence rules than have been employed in past decennial censuses. Although the decennial census uses the usual residence concept, the ACS uses the current residence concept along with the Two-Month Rule.

The current residence concept suits the ACS because the ACS continuously collects information from monthly samples throughout the year. The current residence concept recognizes that people can live in more than one place over the course of a year, and that population estimates for some areas may be noticeably affected by these people. Seasonal areas can experience important increases in their population over the year, increases that are not measured when only usual residents are recognized.

While the use of the current residence concept gives a more accurate picture of an area's population, it does present some challenges (for example, in integrating ACS data with intercensal population estimates, which employ the decennial census usual residence definition).38

Reference Period Since ACS data are collected continuously, the annual ACS estimates represent cumulative data over the 12-month interview cycle, and thus average annual conditions. In contrast, decennial census data represent point-in-time conditions. The implication of the different reference dates is that ACS data will more accurately capture average conditions in seasonal areas.
Decennial census data will only reveal characteristics of those areas on a single day, which might be quite different from conditions at other times of the year.

Using average annual data in models or analyses that were developed based on point-in-time data might be inconsistent, and this presents challenges to the analyst. For example, using average annual data or multiyear moving average data to calibrate/validate a travel demand model (e.g., trip distribution models, mode choice models) that predicts at a single point in time is theoretically inconsistent. However, this might not be a major issue if changes in household characteristics or mode choices are not significant over the period when the data are collected.

It is important to note that even though the rolling reference period procedures are used in ACS, the ultimate population control for any given year is the July 1 estimate. The implications of this control for seasonal analysis are discussed in Section 4.6.

4.5.5 Bridging between Year 2000 Census Data and ACS

Much of the discussion in previous sections has focused on identifying why seemingly surprising differences might occur between Census 2000 Long Form estimates and ACS estimates from roughly the same time. This section describes how the analyst might apply corrective factors to allow for better comparisons.

Suppose the data shown in Table 4.16 on the mean travel time in a given area are available. The analyst wants to determine the change in the mean travel time from 2000 to 2005. Decennial census data are available in year 2000 but not in any of the following years; ACS data are not available for this area in year 2000 but are available afterwards.

62 A Guidebook for Using American Community Survey Data for Transportation Planning

38 Amy Symens Smith, "The American Community Survey and Intercensal Population Estimates: Where Are the Crossroads?" 1998. See www.census.gov/population/www/documentation/twps0031/twps0031.html.
The 2001 to 2005 ACS travel time series shows an increasing trend in travel time over the years. However, the Census 2000 mean travel time estimate is larger than the ACS estimates in each of the years from 2001 to 2005. If analysts compare the raw Census 2000 estimate to the 2005 ACS estimate, they might erroneously conclude that congestion has decreased and lowered journey-to-work travel times by 0.4 minutes.

However, this conclusion is probably inaccurate because the two estimates are drawn from two different surveys. When one accounts for the inherent differences between the surveys, corrected through an analytical comparison of Census 2000 and C2SS data, a more reasonable conclusion can be drawn. As discussed below, the Census 2000 estimate can be converted to a 2000 ACS-like estimate by multiplying it by a factor of 0.9552, resulting in a 2000 estimate of 28.7 minutes. Given this estimate, one could conclude that the travel time increased between years 2000 and 2005 by 0.9 minutes.

The process of reconciling the estimates from one survey to the estimates from another survey is called benchmarking. It is typically done when two surveys with different precision levels and collection frequencies are available for providing estimates of a given population's characteristics. The survey that is normally used as the benchmark is the one whose estimates are more reliable. Different methods exist for benchmarking, such as constrained estimation,39 prediction models,40 and imputation of adjusted responses.41

Since future data releases will only be from the ACS, Census 2000 data can be reconciled to produce year 2000 ACS-like data that could then be more consistently compared to future releases of ACS data. One method that has been used to bridge this gap is regression analysis. Year 2000 ACS data are available from C2SS for 216 counties with population above 250,000.
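The factoring step just described can be written out directly. This sketch applies the 0.9552 travel-time factor (Table 4.19) to the Census 2000 mean and then computes the 2000-2005 change using the ACS series from Table 4.16.

```python
# Convert the Census 2000 mean travel time to a 2000 ACS-like estimate
# using the benchmarking factor, then measure the 2000-2005 change.

CENSUS_2000_MEAN = 30.0          # minutes, Census 2000 (Table 4.16)
ACS_2005_MEAN = 29.6             # minutes, 2005 ACS (Table 4.16)
TRAVEL_TIME_FACTOR = 0.9552      # Census-to-ACS factor (Table 4.19)

acs_like_2000 = round(CENSUS_2000_MEAN * TRAVEL_TIME_FACTOR, 1)  # 28.7
change = round(ACS_2005_MEAN - acs_like_2000, 1)                 # +0.9

print(f"2000 ACS-like estimate: {acs_like_2000} minutes")
print(f"Change 2000-2005: {change} minutes")
```

Without the factor, the raw comparison (29.6 − 30.0 = −0.4 minutes) would suggest a decrease; with it, the trend matches the increasing ACS series.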
39 This method can consist, for example, of adjusting the weights used to obtain the ACS estimates so that the ACS weighted annual average for selected characteristics would be equal to that of the census.

40 In this method, a model (e.g., a regression) is developed using census estimates as the dependent variables and ACS estimates as predictors (or independent variables). The fitted equation can then be used to calibrate the ACS estimates to the census estimates by doing empirical Bayes smoothing. This only applies, however, to the ACS variables included in the model.

41 This method consists of estimating "what proportion of ACS respondents must have given the wrong answers to produce the observed differences, and then imputing the necessary proportion of different answers to bring agreement [with the census]". It requires the estimation of a measurement error model based on the differences between ACS and the census.

Table 4.16. Example of ACS and census data comparison.

Year   ACS Mean Travel Time   Census Mean Travel Time
2000   NA                     30
2001   28.8                   NA
2002   29.0                   NA
2003   29.2                   NA
2004   29.4                   NA
2005   29.6                   NA

The C2SS data for these counties can be used together with decennial census data for the same counties to analyze the differences between the two data sources. For this guidebook, the following variables were analyzed:

• Mode to work,
• Travel time to work,
• Vehicle availability, and
• Income.

For each variable of interest, we regressed the 2000 ACS estimate as the dependent variable against the 2000 decennial census estimate as the independent variable. The slope of this regression (fit with no intercept) provides a factor that can be multiplied by the census estimate to obtain an ACS-like estimate. We did this analysis using all 216 county observations, as well as separately by metropolitan statistical area/consolidated metropolitan statistical area (MSA/CMSA) size, to account for any biases that might be a function of area size.

Table 4.17 shows the factors obtained for means of transportation to work. The following categories are used: percent that drove alone, percent that carpooled, percent that used public transportation, and percent that walked. Based on this regression analysis, we could conclude that the Census 2000 drove-alone mode share would be more consistent with ACS estimates if the census estimate were multiplied by 1.0099. Note that some of the factors for the walk and transit modes are fairly large, indicating that there were significant differences between the raw Census 2000 and ACS-like C2SS estimates.

Table 4.18 shows the factors obtained for vehicle availability, using the categories of percent of zero-vehicle households and average auto ownership (average vehicles per household). Tables 4.19 and 4.20 show the factors obtained for travel time to work. Categories used are mean travel time (minutes), percent with short commutes (less than 20 minutes), and percent with long commutes (greater than 20 minutes). Table 4.21 shows the factors obtained for median household income.

Of course, these benchmarking factors are crude measures of the differences between Census 2000 and ACS, but analyses like these could help analysts understand and report trend data that rely on the different datasets.
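The factor-estimation step can be sketched as a regression through the origin: for a no-intercept model y = b·x, the least-squares slope is Σxy / Σx². The county values below are made up for illustration; the guidebook's actual factors were fitted to the 216-county C2SS/census dataset.

```python
# Least-squares slope of a no-intercept regression of ACS (C2SS) estimates
# on decennial census estimates; the slope is the benchmarking factor.

def no_intercept_slope(census, acs):
    """Slope b of acs ~ b * census with the intercept forced to zero."""
    return sum(c * a for c, a in zip(census, acs)) / sum(c * c for c in census)

# Hypothetical county drove-alone shares (percent): census vs. C2SS
census_shares = [72.1, 65.4, 80.2, 58.9]
c2ss_shares = [73.0, 66.1, 80.9, 59.5]

factor = no_intercept_slope(census_shares, c2ss_shares)
print(f"benchmarking factor: {factor:.4f}")
```

Repeating the fit within MSA/CMSA size classes, as the guidebook did, simply applies the same calculation to subsets of the counties.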
Table 4.17. Means of transportation to work.

Mode: Drove-Alone, Carpool, Public Transportation, Walked
Pooled Sample: 1.0099, 0.9249
MSA/CMSA < 1 Million: 0.9701, 0.7579
MSA/CMSA: 1-5 Million: 1.0043, 0.8496
MSA/CMSA > 5 Million: 1.0861, 0.9265

Table 4.18. Vehicle availability.

Vehicles Available: Pooled Sample, MSA/CMSA < 1 Million, MSA/CMSA: 1-5 Million, MSA/CMSA > 5 Million
Zero: 0.9164, 0.8887, 0.9617
Average Number: 1.0182

Table 4.19. Mean travel time to work.

                   Pooled Sample
Mean travel time   0.9552
4.6 Implications of ACS Data Release Frequency

4.6.1 Frequency of Data Releases

Annual estimates will be released for areas with population greater than 65,000, starting in year 2006. Three-year moving average estimates will be released for areas with population greater than 20,000, starting in year 2008. Five-year moving averages will be released for all areas starting in year 2010. Table 4.22 illustrates the data release schedule.

The main advantage of ACS in this respect is the timeliness of the data. This is especially important in mid-decade or during the years prior to the decennial census, when the data from the previous decennial census would have become relatively outdated. Moreover, the availability of the ACS data on an annual basis, especially for large areas where the estimates are more reliable, enhances the ability to do trend analysis and use other time series analysis methods.

The availability of continuously updated data, however, might create burdens for analysts and data keepers. Transportation analysts should determine the frequency of updating their travel surveys (development and expansion), market analyses (e.g., environmental justice analysis), and travel demand models. This frequency will depend on the particular analysis performed, area size, cost of the update, and utility obtained from updating the analysis. For example, travel demand models might not need to be updated annually; a five-year modeling cycle might be sufficient.

Moreover, users should determine which type of ACS estimate to use when there is more than one type available for a given area. For example, for areas with population greater than 65,000, annual estimates and three- and five-year moving averages are released. For areas with population between 20,000 and 65,000, three- and five-year moving averages are released.
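To illustrate how the release types relate, the sketch below forms simple trailing moving averages from a run of annual estimates (the annual values reuse the travel-time series of Table 4.16). This is only an approximation: the Census Bureau's published multiyear estimates pool the underlying sample records rather than averaging the published annual figures.

```python
# Form simple 3- and 5-year moving averages from annual estimates
# (an approximation of the ACS multiyear products).

def moving_average(series, window):
    """Trailing moving averages; entry i covers years i-window+1 .. i."""
    return [round(sum(series[i - window + 1:i + 1]) / window, 2)
            for i in range(window - 1, len(series))]

annual = [28.8, 29.0, 29.2, 29.4, 29.6]   # e.g., 2001-2005 mean travel times

print(moving_average(annual, 3))   # [29.0, 29.2, 29.4]
print(moving_average(annual, 5))   # [29.2]
```

Note how each additional year of averaging smooths the series and lengthens the lag between the data collection years and the reported estimate.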
Table 4.20. Travel time to work.

                                                          Pooled Sample
Percent of commuters with short commutes (< 20 minutes)   1.0207
Percent of commuters with long commutes (> 20 minutes)    0.986

Table 4.21. Median income.

Median Income           Decennial 2000   Decennial 1999
MSA/CMSA < 1 Million    0.9396           0.971
MSA/CMSA: 1-5 Million   0.955            0.9869
MSA/CMSA > 5 Million    0.9648           0.997

Table 4.22. ACS data release schedule.

Type of Data             Population/Size of Area   Data for the Previous Year First Released In
1. Annual Estimates      65,000+                   2006
2. Three-Year Averages   20,000+                   2008
3. Five-Year Averages    Tract/Block Group         2010

The type of estimate to use will depend on the purpose of the analysis, as follows:

• Consistency. If the characteristics of two populations in areas of similar geographic scales (e.g., populations of two counties or two states) are compared, it is important to use the same
type of estimate to ensure consistency. For example, if County A has a 65,000+ population and County B has a population less than 65,000, then it is recommended that the multiyear or cumulative average estimate from County A (rather than the single-year estimate, which is also available) be used for comparison with the moving average estimate from County B (where annual estimates are unavailable).

• Reduction in Lag Time. If the timeliness of the data is important for the analysis, and if the single-year estimates are deemed reliable (e.g., with reasonable standard errors and without too many fluctuations), the analyst could use the single-year estimates rather than the moving average estimates to reduce the lag time between the analysis year and the data collection year.

• Greater Reliability. If the analysis focuses on a certain subpopulation for which three- and five-year moving averages are available, and if greater reliability is desired, the five-year moving averages would be the more stable choice.

• Reducing Correlations. Moving averages that include overlapping years are correlated (see the discussion below). Therefore, when testing for the significance of an annual rate of change, it is recommended that annual estimates be used rather than moving average estimates that include overlapping years.

4.6.2 Measuring ACS Changes Across Years

The improved frequency of data allows users to better analyze changes within prescribed geographic areas. A new data product provided by the Census Bureau, the multiyear profile, summarizes the year-to-year changes in ACS estimates and identifies statistically significant differences. The computational techniques used by the Census Bureau, as well as those that can be used by ACS analysts, for comparing estimates across years are summarized in a Census Bureau data accuracy memorandum entitled 2002 and beyond Change Profile Accuracy.42 This document provides two useful example calculations that are summarized below.
These examples show how to

• Determine the statistical significance of differences in percent distributions; and
• Determine the statistical significance of other differences.

Example Calculation 1. Determine if a year-to-year difference in an ACS percentage is statistically significant.

Problem. The 2001 ACS for Bronx County, New York, estimates the number of women aged 15 and over to be 533,280, with lower and upper bounds of 533,062 and 533,498, respectively. The estimated number of these women who have never married is 213,545, with a lower bound of 208,349 and an upper bound of 218,741. In 2002, ACS estimated the number of women aged 15 and over to be 538,338, with lower and upper bounds of 537,558 and 539,118, and the number of these women who have never married to be 220,675, with lower and upper bounds of 214,146 and 227,204. Did the percentage of women who have never been married increase significantly between the years?

Relevant Equations.

Standard error = 90 percent confidence margin of error / 1.65

Margin of error = max(upper bound − estimate, estimate − lower bound)

42 See www.census.gov/acs/www/Downloads/ACS/accuracy2002change.pdf.
Note: many, but not all, ACS intervals are symmetrical around the reported estimate, so choosing the maximum interval half-width is the conservative approach to establishing the margin of error.

Standard error of a proportion p̂ = X̂/Ŷ:

SE(p̂) = (1/Ŷ) √[ SE(X̂)² − (X̂/Ŷ)² SE(Ŷ)² ]

Note: this approximation is valid for proportions of two estimates where the numerator (X̂) is a subset of the denominator (Ŷ).

Difference:

DIFF = 100 × (P̂_Final − P̂_Initial)

Standard error of the difference:

SE(DIFF) = √[ SE(P̂_Final)² + SE(P̂_Initial)² ]

Margin of error of the difference:

ME(DIFF) = 1.65 × SE(DIFF)

Calculations.

Year 2001:
SE(X̂) = SE(213,545) = (218,741 − 213,545)/1.65 = 3,149
SE(Ŷ) = SE(533,280) = (533,498 − 533,280)/1.65 = 132
SE(p̂) = SE(0.400) = (1/533,280) √[3,149² − (213,545/533,280)² × 132²] = 0.006

Year 2002:
SE(X̂) = SE(220,675) = (227,204 − 220,675)/1.65 = 3,957
SE(Ŷ) = SE(538,338) = (539,118 − 538,338)/1.65 = 473
SE(p̂) = SE(0.410) = (1/538,338) √[3,957² − (220,675/538,338)² × 473²] = 0.007

Comparison:
DIFF = 100 × (0.410 − 0.400) = 1.0 percent
SE(DIFF) = √[0.006² + 0.007²] = 0.009 = 0.9 percent
ME(DIFF) = 1.65 × 0.9 percent = 1.5 percent
Lower bound = 1.0 percent − 1.5 percent = −0.5 percent
Upper bound = 1.0 percent + 1.5 percent = 2.5 percent

Discussion. Since the lower bound and upper bound have different signs, the year-to-year difference is not significant at the 90 percent confidence level.

Example Calculation 2. Compare differences for other estimates.
Problem. The mean travel time to work for Bronx County in 2001 was 41.0 minutes, with an upper bound of 41.7 minutes. In 2002, the ACS mean travel time to work was 41.8 minutes, with an upper bound of 42.8 minutes. Did the mean travel time to work change significantly between the years?

Relevant Equations. For means and other non-percentage ACS estimates:

Difference:

DIFF = X̂_Final − X̂_Initial

Standard error of the difference:

SE(DIFF) = √[ SE(X̂_Final)² + SE(X̂_Initial)² ]

Margin of error of the difference:

ME(DIFF) = 1.65 × SE(DIFF)

Calculations.

Year 2001: SE(X̂) = SE(41.0) = (41.7 − 41.0)/1.65 = 0.4
Year 2002: SE(X̂) = SE(41.8) = (42.8 − 41.8)/1.65 = 0.6

Comparison:
DIFF = 41.8 − 41.0 = 0.8 minutes
SE(DIFF) = √[0.4² + 0.6²] = 0.7 minutes
ME(DIFF) = 1.65 × 0.7 = 1.2 minutes
Lower bound = 0.8 minutes − 1.2 minutes = −0.4 minutes
Upper bound = 0.8 minutes + 1.2 minutes = 2.0 minutes

Discussion. Since the lower bound and upper bound have different signs, the year-to-year difference is not significant at the 90 percent confidence level. With standard errors of 0.4 minutes for 2001 and 0.6 minutes for 2002, the difference in the mean travel time would have had to be more than 1.2 minutes to be statistically significant.

Alternatively, if the 2002 standard error were 0.27 minutes, the difference of 0.8 minutes would have been statistically significant at the 90 percent confidence level:

DIFF = 41.8 − 41.0 = 0.8 minutes
SE(DIFF) = √[0.4² + 0.27²] = 0.48 minutes
ME(DIFF) = 1.65 × 0.48 = 0.79 minutes
Lower bound = 0.8 minutes − 0.79 minutes = 0.01 minutes
Upper bound = 0.8 minutes + 0.79 minutes = 1.59 minutes

An analyst is not restricted to using the 90 percent confidence level even though the Census Bureau reports the data at this level. If one wanted to compare the mean travel times for the different years using a confidence level of 80 percent, the calculations could be accomplished as shown below:
Because the Census Bureau's published upper and lower bounds use a 90 percent confidence level, 1.65 is still used as the divisor when recovering the standard errors:

Year 2001: SE(X̂) = SE(41.0) = (41.7 − 41.0)/1.65 = 0.4
Year 2002: SE(X̂) = SE(41.8) = (42.8 − 41.8)/1.65 = 0.6

For the comparison, the critical value associated with the 80 percent confidence level, 1.28, is used to calculate the margin of error of the difference. (Table 4.10 showed factors associated with different confidence levels, and a statistics textbook would include others.)

DIFF = 41.8 − 41.0 = 0.8 minutes
ME(DIFF) = 1.28 × 0.7 = 0.9 minutes
Lower bound = 0.8 minutes − 0.9 minutes = −0.1 minutes
Upper bound = 0.8 minutes + 0.9 minutes = 1.7 minutes

Therefore, even at the 80 percent confidence level, the lower and upper bounds of the difference have opposite signs, indicating that the difference is not statistically significant. In practice, it is not likely that an analyst would be interested in making comparisons with confidence levels that are lower than the Census Bureau's 90 percent. It is more likely that if one were using a different confidence level, it would be the 95 percent confidence level (for which one would use a critical value factor of 1.96).

4.6.3 Multiyear Averaging/Analysis of Overlapping Averages

The main advantage of moving averages, as compared to annual estimates, is that moving averages smooth the data and are thus more reliable (lower standard errors, less year-to-year variation). Since moving averages smooth out the random fluctuations in the data, they can provide a clearer visual picture of the overall trend in a certain variable of interest. The main disadvantage of moving averages is the lag time associated with them. If conditions are relatively stable across the years over which data are averaged, multiyear average estimates will be close to the annual estimates.
However, if conditions change dramatically in a given year, the annual estimate reflects the change in a more timely manner than does the multiyear average.

There are two issues to consider with regard to the use of ACS multiyear averages. The first issue is related to the comparison of two moving averages that include overlapping years. It is important to note that statistically valid annual estimates of change cannot be computed from the difference of two moving averages if the two moving averages are based on data from overlapping years, such as a moving average of years 1996-1998 and a moving average of years 1997-1999. This is because when standard statistical procedures are used to test for significant differences between estimates over time, it is assumed that the two estimates are drawn from independent samples. This assumption is violated in the case of overlapping moving averages.

One tempting way to look at the comparison of two consecutive overlapping moving averages, say 2003-2005 and 2004-2006, is that it is in essence a comparison of the difference between 2006
(which is only in the second multiyear period) and 2003 (which is only in the first multiyear period). Unfortunately, the fact that the Census Bureau has released these data as multiyear averages is a recognition that a direct comparison between 2003 by itself and 2006 by itself for this geography is not valid, because the individual-year sample sizes will not support the comparison. When an analyst uses the multiyear overlapping estimates to make conclusions about single years, he or she is, in effect, cheating by using an artificially high number of data records that include the overlapping years (2004 and 2005, in this case). The analyst is claiming the reduced sampling error that comes with more data records, but in reality, only a portion (a third, in this case) of each sample's records actually contribute to the comparison the analyst is making.

This is not to say that one should not get a qualitative idea of the pattern of change from examining these overlapping moving averages, especially as they accumulate over time. Such time series will be very informative to data users as they try to capture what is happening in a region over time. However, there is a need to be cautious about making definitive conclusions about the differences of the overlapping estimates.

As the combination of multi- and single-year averages accumulates for geographic areas of different sizes, it is likely that transportation planners and other ACS data users will commonly develop factoring methods and iterative proportional fitting methods that combine multiyear average estimates for smaller geographic areas with single-year estimates for corresponding larger geographic areas, to synthesize single-year estimates for geographic areas that do not support this level of ACS reporting. For more homogeneous areas and ACS characteristics, these methods will provide reasonable small-area estimates.
However, analysts will need to remember that the ACS sample sizes do not really support such analyses, and therefore any conclusions drawn from these synthesized data are speculative.

The second issue related to multiyear estimates is that moving averages also present problems when used as dependent variables in statistical models (such as time series models) and regression models, since the statistical properties of the data (such as autocorrelations) are affected by the averaging. Users should understand the implicit statistical assumptions in their analyses and be sure that the ACS data comply with these assumptions. For instance, if an analyst wanted to test the effect of gasoline prices on commuting modes for a small area (requiring multiyear averaging), he or she will not be able to use monthly, or even annual, gas price data effectively. The analyst will need to develop estimates of the independent variable over the same timeframe for which the ACS data are available.

4.6.4 Seasonality Analysis Using ACS

ACS data are collected throughout the year, as opposed to at a single point in time like the census Long Form data, so it will be important for data users to remember that analyses of other data in conjunction with ACS data will need to reflect the full year.

Because seasonality is very interesting from a transportation planning perspective, as travel patterns can vary significantly throughout the year, U.S. DOT has sponsored an analysis of seasonality using Hampden County, Massachusetts, data. For this guidebook, seasonality in two other ACS test counties, Broward County, Florida, and Pima County, Arizona, was analyzed using the evaluation datasets provided by FHWA and the Census Bureau. These datasets also were used for the comparisons described in Appendix I. The seasonality analyses that were performed with these data relied on information about the quarter of the year in which the data were collected.
Since quarter data generally will not be available to ACS users, the results of these analyses are included in Appendix J. The key lesson from the analysis is that for some locations, seasonality will have an important effect on ACS results but, unfortunately, without information
on the time of year that the responses were obtained, we will not have much opportunity to address the issue.

4.6.5 ACS Continuity

An important concern about replacing the census Long Form with the continuous ACS is that, by separating the sample data collection from the constitutionally mandated census count, ACS is more likely than the Long Form to be cut back or eliminated during the government's budgeting process.

Effect of Missing Data. Given the large standard errors of the ACS estimates, any further reduction in sample size will adversely impact the quality of its estimates, which will be reflected in larger standard errors. The relationship between the sampling rate and the resulting standard error of the estimates is shown by the following equation:

SE(Ŷ) = √[ S × Ŷ × (1 − Ŷ/N) ]

where S is the inverse of the sampling rate minus 1, Ŷ is the estimate, N is the total count of people or housing units, and SE(Ŷ) is the standard error of Ŷ. For example, if the sampling rate is cut by half, S becomes 2S + 1, and the resulting standard error is equal to √[(2S + 1)/S] ≈ √2 = 1.41 times the original standard error.

Sample size reduction due to potential budget cuts could have an effect on different phases of the ACS data collection program. The effect of eliminating the data collection, even for a single year, will be severe and, because of the multiyear averaging, long lasting.

Effect of Making ACS Voluntary. The Census Bureau evaluated the effects of making participation in ACS voluntary, rather than mandatory. In two data quality reports that analyzed this issue, the Census Bureau responded to questionnaire content and increasing public privacy concerns by evaluating the potential effects of having ACS implemented as a voluntary survey. These reports are

• Report 3: Testing the Use of Voluntary Methods and
• Report 11: Testing Voluntary Methods—Additional Results.

To analyze the potential effects of this change, the Census Bureau performed a test using the March and April 2003 ACS sample. Four experimental mail treatments were used, as follows:

• A mandatory treatment identical to the mandatory treatment that had been used previously,
• An alternative mandatory treatment that attempted to improve the user-friendliness of the mail survey,
• A standard voluntary approach similar to that used for other voluntary Census Bureau surveys, and
• A voluntary treatment that explicitly told respondents that the survey was voluntary.

Voluntary methods also were applied to the telephone and in-person surveys. The responses to the different treatments were then compared with each other and with the year 2002 mandatory treatment.
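The sample-size relationship given under Effect of Missing Data, SE(Ŷ) = √[S·Ŷ·(1 − Ŷ/N)] with S equal to the inverse of the sampling rate minus 1, can be illustrated numerically. The area totals and sampling rates below are hypothetical.

```python
# Effect of a reduced sampling rate on the standard error of an ACS
# estimate, using SE(Y) = sqrt(S * Y * (1 - Y/N)) with S = 1/rate - 1.

import math

def standard_error(estimate, total, rate):
    """Approximate SE of an estimated total under sampling rate `rate`."""
    s = 1.0 / rate - 1.0
    return math.sqrt(s * estimate * (1.0 - estimate / total))

# Hypothetical area: 40,000 transit commuters out of 200,000 workers,
# at the full sampling rate (1-in-40) and at half that rate (1-in-80).
full = standard_error(40_000, 200_000, 1 / 40)
half = standard_error(40_000, 200_000, 1 / 80)

print(f"SE at full rate: {full:.0f}")
print(f"SE at half rate: {half:.0f}  (ratio {half / full:.2f})")
```

For a 1-in-40 base rate the ratio is √(79/39) ≈ 1.42, close to the √2 ≈ 1.41 limit cited in the text for small sampling rates.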
Based on their analyses of the data, the analysts drew the following conclusions:

• A dramatic decrease (more than 20 percentage points) occurred in mail response when the standard survey was voluntary.
• The reliability of estimates was adversely impacted by the reduction in the total number of completed interviews; producing reliable results with voluntary methods would require an increased initial sample size.
• The decrease in cooperation across all three modes of data collection resulted in a noteworthy, but not critical, drop in the weighted survey response rate.
• The estimated annual cost of implementing the ACS would increase by at least $59.2 million if the survey were voluntary and reliability were maintained.
• Levels of item non-response for the data collected under voluntary and mandatory methods were very similar. Although the differences in item non-response at the topic level were statistically significant, the item non-response rates were very similar.
• The use of voluntary methods had a negative impact on traditionally low-response areas that will compromise our ability to produce reliable data for these areas and for small population groups such as blacks, Hispanics, Asians, American Indians, and Alaska Natives.
• The change to voluntary methods had the greatest impact on areas that have traditionally high levels of cooperation and on white and non-Hispanic households.
• Compared to a standard voluntary survey, the use of a more direct presentation of the voluntary message (1) resulted in an additional decrease of four percentage points in mail response and (2) had only a minor additional impact on data quality, with an additional 1.6 percent decrease in the interview rate and an additional 0.4 percent decrease in the survey response rate.
• Compared to the current mandatory treatment, the revised mandatory treatment, which was intended to be more user-friendly, resulted in only a slight increase in mail cooperation (an increase of 1.9 percentage points).
• Although the mail check-in rates were much higher for the mandatory treatments than for the voluntary treatments, the overall patterns of mail responses over time were remarkably similar across all four treatments.