
The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.


OCR for page 44
A Guidebook for Using American Community Survey Data for Transportation Planning

many analyses, because the data are more relevant to the current period than the five-year estimate. For the towns, the five-year average estimates are the only available choice.

There may be some instances where analysts would be willing to sacrifice currency of the estimates for the greater precision offered by the multiyear averages' larger sample sizes and for the lesser volatility in the average estimates. The three- and five-year estimates tend to dampen the effect of year-to-year changes, so using the averages can help analysts avoid worrying about what may simply amount to random year-to-year noise. On the other hand, the averaged estimates do not pick up real trends as strongly or as quickly as single-year estimates do. As can be seen by comparing the hypothetical data releases for the commuting issue at the town level to the top part of the table, the multiyear averages lag behind in identifying the increasing trend.

Decisions about which estimates to use become more complicated when the analyst needs to examine the variable across different geographic levels. Comparisons between one-year estimates and multiyear estimates will be problematic if the variable of interest is trending one way or another during the multiyear period or if the variable in the most current year differs from its value in the previous years.

Suppose the analyst wanted to know the percentage of long-distance commuters in the county that lived in one of the smaller towns. Dividing the five-year average estimates for the towns by the single-year estimate for the county is not a valid approach, since the two measure the variable over different periods. The more appealing approach would be to use the five-year averages for both the towns and the county for this analysis.
This "least common denominator" approach ensures that any regional changes during the averaging period are captured in all of the estimates. As a practical point, and as in this example, for most variables and geographic areas the differences between the estimates from the different averages will not be so large that they would materially affect policy decisions. One could easily think of examples in rapidly changing areas, however, where differences among the single-year, three-year, and five-year averages could affect the results of analyses in meaningful ways.

To summarize the elements of choosing the particular ACS estimate to use in analyses, data users should consider the following:

Is the anticipated analysis related to understanding the most recent conditions and identifying potential recent shifts in the population?
To what level does the analysis need to be protected from potential random year-to-year noise in the estimates?
Have there been any significant regional changes in the past few years that might make estimates that include both pre- and post-change ACS data less useful?
Will the analyses involve multiple geographic levels for which the same types of ACS estimates might not be available?

4.3 Data Disclosure Limitations

As noted in Section 2, before releasing any ACS data, the Census Bureau first edits the database to ensure it is in compliance with disclosure rules. The Census Bureau's Disclosure Review Board (DRB) governs the release of census data as described below:

Title 13 of the United States Code authorizes the Census Bureau to conduct censuses and surveys. Section 9 of the same Title requires that any information collected from the public under the authority of Title 13 be maintained as confidential. . . . The Census Bureau's internal Disclosure Review Board (DRB) sets the confidentiality rules for all data releases.[23]

[23] See

Using ACS Data

4.3.1 Data Disclosure Avoidance

Three types of data disclosure avoidance procedures are expected to be applied to the ACS data with varying effects on data utility: imputation, rounding, and data suppression.

Imputation

The confidentiality edit is implemented by selecting a small subset of individual households from the internal sample data files and blanking a subset of the data items on these household records. Responses to those data items are then imputed using the same imputation procedures used for non-response. A larger subset of households is selected for the confidentiality edit for small areas to provide greater protection for these areas. The editing process is implemented in such a way that the quality and usefulness of the data are preserved.[24]

Rounding

For the most common decennial census sample data products, a small amount of uncertainty was introduced into the estimates of census characteristics. The sample itself provided adequate protection for most areas for which sample data are published since the resulting data are estimates of the actual counts; however, small areas required more protection.

For CTPP 2000 and other similar projects for which detailed cross-tabulation data for small geographic areas are reported, the Census Bureau enhances confidentiality further by rounding the reported estimates and by establishing minimum response thresholds. The DRB issued a memorandum on December 11, 2001 stating the following rules:

For Part 1 data (place of residence) and Part 2 data (place of work), all published values will be rounded as follows:
Zero rounds to zero,
One through seven rounds to four, and
All other numbers round to the nearest multiple of five. Numbers ending in zero and five are not rounded.

For Part 3 (journey-to-work flows), the DRB allows two tables to be published with no record threshold.
These include the following:
Table 3.1 (Total Workers); and
Table 3.2 (Vehicles Available by Means of Transportation to Work).

Added to this set were tables of aggregates, means, and medians that were rounded according to specifications used for all sample data products. For all other Part 3 tables (Tables 3.3 through 3.7 in CTPP 2000), the DRB set a threshold of three unweighted records.

A key issue to note in analyzing the effects of disclosure rules is the application of independent rounding. Total columns in each table may not match the sum of the categories because totals are rounded independently of the cells, as shown in the rounding example in Table 4.6. However, because some variables (e.g., travel mode to work) are classified in more than one way depending on which table one uses, the number of possible answers is higher than just two. Up to 15 estimates for the number of transit commuters may be possible[25] when different tables and different geographies are analyzed, as shown in the example in Table 4.7.

While there are very few one-dimensional tables in CTPP 2000, having one-dimensional unrounded tables in ACS would be of great use in establishing control totals as checks for analysts and as inputs to iterative proportional fitting processes.

[24] See
[25] Chuck Purvis, Metropolitan Transportation Commission, Oakland, California, e-mail posted to the CTPP news listserve on February 19, 2004.
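
The rounding rules quoted from the DRB memorandum, and the effect of rounding totals independently of the cells, can be sketched in a few lines of Python. This is an illustration only: the function name ctpp_round is hypothetical and is not a Census Bureau routine, and the cell values are the sample household counts that appear in Table 4.6.

```python
def ctpp_round(value: int) -> int:
    """Round a non-negative integer count using the CTPP 2000 rules quoted above.

    Hypothetical helper for illustration; not an official Census Bureau function.
    """
    if value < 0:
        raise ValueError("counts must be non-negative")
    if value == 0:
        return 0          # zero rounds to zero
    if value <= 7:
        return 4          # one through seven rounds to four
    # All other numbers round to the nearest multiple of five.
    # Integers never fall exactly halfway between two multiples of five,
    # and values already ending in 0 or 5 are returned unchanged.
    return 5 * ((value + 2) // 5)


# Sample cell values from Table 4.6 (households by vehicles available).
cells = [6, 14, 8, 8, 3]

rounded_cells = [ctpp_round(v) for v in cells]

# Summing the rounded cells does not reproduce the published total ...
total_of_rounded_cells = sum(rounded_cells)

# ... because the total is rounded independently from the unrounded sum.
published_total = ctpp_round(sum(cells))

print("rounded cells:         ", rounded_cells)
print("sum of rounded cells:  ", total_of_rounded_cells)
print("independently rounded: ", published_total)
```

The gap between the sum of the rounded cells and the independently rounded total is the mismatch an analyst sees when adding up a published table by hand.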

Table 4.6. CTPP's independent rounding of table cells and totals.

Sample Table Data             Rounded Value Using Rounding Rules
0 vehicle households = 6       4
1 vehicle households = 14     15
2 vehicle households = 8      10
3 vehicle households = 8      10
4 vehicle households = 3       4

Incorrect Total Rounded Value = 4 + 15 + 10 + 10 + 4 = 43, which is rounded to 45.
Correct Total Rounded Value = 6 + 14 + 8 + 8 + 3 = 39, which is rounded to 40.

Source: U.S. Census Bureau CTPP 2000 data.

Table 4.7. CTPP rounding: estimates of transit commuters for different geographic summary levels and CTPP Tables.

Summary Level    Number of    Table 2-2       Table 2-12      Table 2-27
Category         Levels       Transit =       Transit =       Transit =
                              5 categories    3 categories    2 categories
TAZ              4,031        319,345         319,553         319,600
Block Group      4,384        319,433         319,521         319,541
Tract            1,403        319,717         319,780         319,836
County           9            320,116         320,129         320,125
MPO              1            320,125         320,120         320,120

Source: U.S. Census Bureau CTPP 2000 data.

A comparison of CTPP 2000 data and Summary File 3 (SF 3)[26] data shows that the CTPP estimates were more likely to be lower than the SF 3 values.[27] The effect of rounding a value from one through seven to a value of four generally provided a lower estimate than the actual value.

Rounding as conducted for CTPP 2000 does not affect the statistical significance of the data. However, it does cause a number of distortions while aggregating geography. It is important to minimize the number of geographies that are combined. For example, if the CBD can be defined using tract geography, tracts should be used rather than the more finely defined TAZs.

For both the Census 2000 and ACS datasets supplied by the Census Bureau for the research discussed in Appendix I and Appendix J, the data were subject to further disclosure scrutiny.[28] All estimates were rounded by intervals of 10, rather than intervals of 5.
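
The advice to combine as few geographies as possible can be illustrated with a small sketch. It assumes the CTPP 2000 rounding rules described earlier (zero stays zero, one through seven become four, everything else goes to the nearest multiple of five). The TAZ counts below are hypothetical, chosen so that every cell falls in the upper part of the one-through-seven range, and ctpp_round is an illustrative helper rather than an official routine.

```python
def ctpp_round(value: int) -> int:
    """Hypothetical helper applying the CTPP 2000 rounding rules to one count."""
    if value < 0:
        raise ValueError("counts must be non-negative")
    if value == 0:
        return 0
    if value <= 7:
        return 4
    return 5 * ((value + 2) // 5)  # nearest multiple of five


# Hypothetical unrounded worker counts for ten TAZs nested in one tract.
taz_counts = [6, 7, 5, 6, 7, 6, 5, 7, 6, 5]

# Building the area from published TAZ values: each small cell becomes 4.
sum_of_rounded_tazs = sum(ctpp_round(c) for c in taz_counts)

# Using the coarser tract: the tract total is rounded only once.
rounded_tract_total = ctpp_round(sum(taz_counts))

true_total = sum(taz_counts)

print("true total:               ", true_total)
print("sum of rounded TAZ values:", sum_of_rounded_tazs)
print("rounded tract value:      ", rounded_tract_total)
```

In this constructed case the error accumulates with every small cell that is added, while the single tract value carries at most one rounding error. Real data will mix cells that round up and cells that round down, so the size and direction of the distortion will vary.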
The new rounding rules applied to the ACS research data compound the problems seen for CTPP 2000 and resulted in a significant loss of journey-to-work trip flow data.

Data Suppression

In addition, for a CTPP-like product from the ACS, the Census Bureau would establish minimum response thresholds for some of the flow tables. Although rounding is a significant data issue, applying thresholds to journey-to-work flow data will almost certainly eliminate journey-to-work flows for small geography. Table 4.8 shows pairs of geographies tabulated with and without the disclosure rules for Hampden County, Massachusetts, for both the ACS and Census 2000 at the census tract level.

[26] Summary File 3 consists of 813 detailed tables of Census 2000 social, economic, and housing characteristics compiled from a sample of approximately 19 million housing units (about 1 in 6 households) that received the Census 2000 Long Form questionnaire.
[27] Nanda Srinivasan, Cambridge Systematics Inc., "Data Rounding in CTPP 2000," CTPP Status Report, April 2004.
[28] Correspondence with Phillip Salopek, U.S. Census Bureau, on July 23, 2004.

Table 4.8. Disclosure effect on Census 2000 versus ACS, Hampden County.

               Part 3: Without Thresholds             Part 3: With Thresholds                Part 1
               Total Geographic     Total Workers     Total Geographic     Total Workers     Total
               Pairs with Reported  with Reported     Pairs with Reported  with Reported     Workers
Data           Work Flows           Work Flows        Work Flows           Work Flows
Census 2000    8,228                207,120           2,644                147,080           199,220
ACS            6,368                181,563           1,673                118,234           202,024

Source: FHWA CTPP Status Report, April 2004.

It can be seen that without thresholds, and allowing for a 15 percent sampling rate, the ACS data produce about three-fourths (6,368/8,228) of the number of origin-destination pairs produced by the Long Form. The new rounding rules would affect ACS data more significantly, with only 90 percent (181,563/202,024) of total workers being reported due to rounding.

For tables subject to thresholds, applying the same rules to both the ACS and the Census 2000 data, the number of pairs in the ACS falls to about 63 percent (1,673/2,644) of the origin-destination pairs shown in the census Long Form. However, the number of workers in the ACS drops further, to about 60 percent (118,234/202,024) of the total workers in Hampden County.

There is a growing concern within the U.S. transportation community that the Census Bureau will continue to use the same rounding and threshold rules for all future origin-destination tables produced from ACS. The effect of these rules would be significant at the census tract level. In the example, they produce 1,673 origin-destination pairs where the census Long Form (without disclosure rules) would have produced 8,228, and those pairs account for only 60 percent of workers living in the county. It also is expected that a similar or even more severe loss of flow data will occur at the TAZ level.
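
The shares discussed above follow from simple division of the Table 4.8 entries. The short sketch below recomputes them so that readers can check the ratios for themselves. The dictionary keys and variable names are labels chosen here for illustration; only the numbers come from Table 4.8.

```python
# Values transcribed from Table 4.8 (Hampden County, census tract level).
TABLE_4_8 = {
    "Census 2000": {
        "pairs_without_thresholds": 8_228,
        "workers_without_thresholds": 207_120,
        "pairs_with_thresholds": 2_644,
        "workers_with_thresholds": 147_080,
        "part1_total_workers": 199_220,
    },
    "ACS": {
        "pairs_without_thresholds": 6_368,
        "workers_without_thresholds": 181_563,
        "pairs_with_thresholds": 1_673,
        "workers_with_thresholds": 118_234,
        "part1_total_workers": 202_024,
    },
}

acs = TABLE_4_8["ACS"]
census = TABLE_4_8["Census 2000"]

# ACS origin-destination pairs as a share of Census 2000 pairs.
pair_share_without = acs["pairs_without_thresholds"] / census["pairs_without_thresholds"]
pair_share_with = acs["pairs_with_thresholds"] / census["pairs_with_thresholds"]

# ACS workers with reported flows as a share of ACS Part 1 total workers.
worker_share_without = acs["workers_without_thresholds"] / acs["part1_total_workers"]
worker_share_with = acs["workers_with_thresholds"] / acs["part1_total_workers"]

for label, value in [
    ("ACS/Census pairs, without thresholds", pair_share_without),
    ("ACS workers reported, without thresholds", worker_share_without),
    ("ACS/Census pairs, with thresholds", pair_share_with),
    ("ACS workers reported, with thresholds", worker_share_with),
]:
    print(f"{label}: {value:.1%}")
```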
One implication of this is that transportation analysts might have to resort to aggregating their TAZs into larger geographies for which sufficient flow data are available. Some researchers have pointed out that if blocks are aggregated so that they are larger than walking distance to a bus stop, for example, then the aggregation makes the survey data less valuable for bus route planning. Similarly, if the aggregate geography is larger than the distance between highway exits, then the survey data cannot be used for highway corridor analysis.

Because the ACS is sampled over time, and at any point in time has a smaller sample than Census 2000, it may be desirable to have less stringent disclosure rules for ACS. The United Kingdom (UK), for example, has a higher sampling rate and the data are subject to fewer disclosure rules. The census in the UK is conducted solely via the Long Form with about 24 million households surveyed, almost equivalent to the Census 2000 Long Form in the United States. The suite of standard data products from the UK is quite extensive.[29]

Table Variable Collapsing

As for the 2000 CTPP, the Census Bureau expects to apply quality control measures on ACS data products for geographic areas for which categorized tables could be misinterpreted. Any table whose median distribution of covariance for individual cell values is greater than 61 percent will be modified or suppressed for that geography. For example, for a given geography for County X, for a table with 18 means of transportation, individual covariances are calculated for estimates of workers who drove alone, carpooled, etc. If the median of these covariances is

[29]