

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 48
A Guidebook for Using American Community Survey Data for Transportation Planning
Using ACS Data

greater than 61 percent, then these modes would be collapsed to fewer categories according to predefined collapsed table definitions. If the median of covariances for the collapsed table still exceeds 61 percent, the table will be suppressed for County X.

4.4 Understanding, Working with, and Reporting Sample Data

4.4.1 ACS Sample Size

The ACS questionnaire is sent to 250,000 housing units every month, or equivalently to 3 million housing units annually, drawn from all counties in the U.S. To allow data users to better analyze smaller areas, the Census Bureau applies differential sampling rates based on the area type. The 2005 sampling rates are shown in Table 4.9. In contrast, the decennial census Long Form was sent to about one of every six addresses. Since both the Long Form and ACS data represent samples of the overall population, they include some imprecision, or margin of error, in their estimates.

Table 4.9. ACS sampling rates, 2005.

Area Type Sampling Rate Category: 2005 Final Sampling Rate
Blocks in smallest sampling entities (estimated occupied housing units in block < 200): 10.0%
Blocks in smaller sampling entities (estimated occupied housing units in block ≥ 200 and < 800): 6.9%
Blocks in small sampling entities (estimated occupied housing units in block ≥ 800 and ≤ 1200): 3.6%
Blocks in large tracts (estimated occupied housing units in block > 1200 and estimated occupied housing units in tract > 2000):
    Mailable addresses ≥ 75% and predicted levels of completed interviews prior to subsampling > 60%: 1.6%
    Mailable addresses < 75% and/or predicted levels of completed interviews prior to subsampling ≤ 60%: 1.7%
All other blocks (estimated occupied housing units in block > 1200 and estimated occupied housing units in tract ≤ 2000):
    Mailable addresses ≥ 75% and predicted levels of completed interviews prior to subsampling > 60%: 2.1%
    Mailable addresses < 75% and/or predicted levels of completed interviews prior to subsampling ≤ 60%: 2.3%

Source: United States Census Bureau, Design and Methodology: American Community Survey, Technical Paper 67 (May 2006) U.S. Government Printing Office, Washington, D.C.

4.4.2 Sampling Error

Sampling error is the term given to the error associated with deriving an estimate from a sample rather than an entire population. ACS data are estimates of actual numbers or percentages in the population, but because the data are not collected from the whole population, random sampling error will be present. The larger the sample size is, the smaller the sampling error will be, but the specific amount of error in an estimate could be known only if information from the true population were available.

Sampling error is most commonly estimated through the calculation of the standard error associated with the estimate. Standard error is a measure of the deviation of a sample estimate from the average of all possible similar samples. It is an indication of the precision with which a sample estimate approximates the population value. Formulas for calculating standard errors associated with sample estimates are straightforward, but since the Census Bureau will calculate and report the standard errors, the reader is referred to any standard statistics textbook for more details on these calculations.

The sampling error of an estimate is usually summarized as a combination of a confidence level and a confidence interval. The confidence level is the percentage of times that drawing a sample of a particular size from a certain population will result in the actual (but unknown) parameter of interest falling within a certain confidence interval. For instance, a surveyor might report that based on survey results, sample size, and variance levels, the percent of households with zero vehicles for a certain population of households is 10 percent plus or minus 3 percent at the 95 percent confidence level. This means that if we performed a survey with the same sample size 100 times, then in about 95 of them the estimate we determine in the survey, plus or minus 3 percent, would include the true percentage of zero-vehicle households. For this example, the confidence level is 95 percent, the margin of error is 3 percent, and the confidence interval is 6 percentage points wide (7 percent to 13 percent).

It is common for analysts to establish a confidence level for reporting and then to calculate the margin of error for the survey-derived estimates associated with that confidence level. The confidence levels selected are generally related to how much uncertainty researchers are able to accept in particular estimates. Medical and scientific researchers sometimes will specify 99 percent confidence levels or higher. Political polls usually report margins of error assuming confidence levels of 95 percent or 90 percent. For a particular sample population and sample size, as confidence levels are increased, the corresponding margins of error around the sample estimates widen.
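The relationship among standard error, confidence level, and margin of error can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the multipliers are the rounded large-sample values that the guidebook lists in Table 4.10. It takes the zero-vehicle household example (10 percent plus or minus 3 points at the 95 percent level) and shows how the margin would change at other confidence levels.

```python
# Rounded large-sample multipliers, as listed in the guidebook's Table 4.10.
Z_MULTIPLIER = {80: 1.28, 90: 1.65, 95: 1.96, 99: 2.58}


def rescale_margin(margin, from_level, to_level):
    """Convert a margin of error from one confidence level to another.

    The margin is first reduced to a standard error, then scaled by the
    multiplier for the target confidence level.
    """
    standard_error = margin / Z_MULTIPLIER[from_level]
    return standard_error * Z_MULTIPLIER[to_level]


def confidence_interval(estimate, margin):
    """Return the (low, high) bounds implied by an estimate and its margin."""
    return estimate - margin, estimate + margin


# Zero-vehicle household example: 10 percent +/- 3 points at 95 percent.
for level in (80, 90, 95, 99):
    moe = rescale_margin(3.0, 95, level)
    low, high = confidence_interval(10.0, moe)
    print(f"{level} percent level: +/- {moe:.2f} points ({low:.2f} to {high:.2f})")
```

The loop makes the closing point of the paragraph concrete: for the same sample, a higher confidence level yields a wider margin of error.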
Suppose a sample parameter is measured from a large sample to have a mean value of X and, based on the variation in the sample, the standard error is computed to be Y. The confidence intervals for different confidence levels are shown in Table 4.10.

Table 4.10. Confidence intervals for a large sample parameter with a mean value X and a standard error Y.

Confidence Level    Confidence Interval Low    Confidence Interval High
80 percent          X - 1.28 * Y               X + 1.28 * Y
90 percent          X - 1.65 * Y               X + 1.65 * Y
95 percent          X - 1.96 * Y               X + 1.96 * Y
99 percent          X - 2.58 * Y               X + 2.58 * Y

Both the decennial census Long Form and ACS are sample datasets, so sampling error will be present in estimates from either source. Despite this fact, one almost never sees precision levels reported for census Long Form estimates. Analysts generally report census Long Form estimates as single numbers. The Census Bureau does make the precision levels available to users, but most data users choose not to work with them. Not incorporating the uncertainty levels into analyses simplifies analyses, some of which are already fairly complicated. However, in practical application, this also has the effect that many users of the analyses do not understand the nature of these data. A common misconception of many consumers and users of these data is that they are census data and therefore are actually based on a 100 percent sample of the population (like the decennial census Short Form data).

Because the ACS sample sizes are smaller than those of the Long Form, the sampling errors will be more significant for ACS, and the misconception that the estimates are completely precise is more likely to lead to erroneous conclusions. For this reason, the Census Bureau is making a concerted effort to stress that ACS estimates are just that, statistical estimates, and not counts.

The Census Bureau calculates the standard errors for all estimates reported in ACS data products using procedures that account for the sample design and estimation methods. These procedures are described in the Census Bureau's Accuracy of the Data reports, which are updated annually (available at www.census.gov/acs/www/UseData/Accuracy/Accuracy1.htm). All ACS estimates are reported with margins of error or confidence intervals corresponding to the 90 percent confidence level. Using the reported estimates and upper and lower bounds, data users are able to incorporate ACS's sampling error into their analyses and data presentations.

Example Calculations for Incorporating Sampling Error into ACS Analyses

To help analysts use and interpret the margin of error provided with the ACS estimates, the Census Bureau provides formulas and some example calculations to guide data users in the Accuracy of the Data reports. There are four example calculations from this source presented and annotated below.

1. Calculation of the standard error of an ACS estimate,
2. Calculation of the standard error of the sum (or difference) of ACS estimates,
3. Calculation of the standard error of the ratio of two ACS estimates, and
4. Calculation of the standard error of the proportion of an ACS total estimate in an ACS subtotal estimate.

Although these examples are for a generic analysis for a wider audience, the same procedures will be used by transportation planners in their most common analyses, as is demonstrated by the case study sections that follow in this guidebook.

Example Calculation 1
Determine the standard error of a reported ACS estimate.
Problem
The ACS estimates the number of males in the United States that have never married to be 33,290,195. The reported lower bound of the estimate is 33,166,192, and the reported upper bound is 33,414,198. What is the standard error of the estimate of the number of males who have never married?

Relevant Equations

$$\text{Standard error} = \frac{\text{90 percent confidence margin of error}}{1.65}$$

$$\text{Margin of error} = \max(\text{upper bound} - \text{estimate},\ \text{estimate} - \text{lower bound})$$

Note: Many, but not all, ACS intervals are symmetrical around the reported estimate, so choosing the maximum interval is the conservative approach to establishing the margin of error.

Calculations

Margin of error = max((33,414,198 - 33,290,195), (33,290,195 - 33,166,192)) = 124,003
Standard error = 124,003 / 1.65 = 75,153

Discussion
The standard error calculation, in and of itself, may not be particularly edifying, but it is a first step that allows users to perform other calculations, like those shown below. Also, by knowing the standard error, analysts can establish upper and lower bound estimates for other confidence levels. For instance, the 95 percent margin of error is 1.96 * 75,153 = 147,300.

Example Calculation 2
Determine the standard error of a sum of reported ACS estimates.

Problem
As noted in the previous example calculation, the number of males that have never been married is estimated to be 33,290,195, with upper and lower bounds of 33,414,198 and 33,166,192. The estimate of the number of females that have never married is 29,204,857 with a reported lower bound of 29,090,048, and a reported upper bound of 29,319,666. What is the estimated number of all people who have never married?

Relevant Equations
Standard error (SE) of a sum:

$$SE(\hat{X} + \hat{Y}) = \sqrt{[SE(\hat{X})]^2 + [SE(\hat{Y})]^2}$$

Notes: The Census Bureau states that this method will underestimate the standard error if the items in a sum are highly positively correlated, and will overestimate the standard error if the items in the sum are highly negatively correlated. This equation also is valid for the standard error of the difference of ACS reported estimates:

$$SE(\hat{X} - \hat{Y}) = SE(\hat{X} + \hat{Y})$$

Calculations
The point estimate of the number of people who have never married is 33,290,195 + 29,204,857 = 62,495,052.

From the previous example, the standard error of the estimate for males is 75,153. Application of the same equation for females yields a standard error of 69,581. Therefore, the standard error of the sum is

$$SE(62{,}495{,}052) = \sqrt{(75{,}153)^2 + (69{,}581)^2} = 102{,}418$$

Once the standard error of the sum has been calculated, analysts can calculate and report associated confidence intervals. The 90 percent confidence interval for the total number of people who have never married (based on the equation in the first example) is (62,495,052 - 1.65(102,418)) to (62,495,052 + 1.65(102,418)), or 62,326,062 to 62,664,042 people.

Discussion
The summation of estimates propagates the sampling error inherent in the individual addend estimates, so the importance of evaluating and reporting the uncertainty in estimates derived in this manner is increased.

Many census data users, including transportation planners, will frequently need to combine individual census estimates in this way to address their specific analysis needs. The detailed delineations in several of the transportation-related ACS tabulations will frequently require analysts to sum individual estimates.
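Example Calculations 1 and 2 can be reproduced with a short Python sketch. The function names are hypothetical and chosen only for illustration; the 1.65 multiplier and the input figures come from the examples above. Because the sketch carries unrounded standard errors through the calculation, its results may differ from the guidebook's figures by about one unit.

```python
import math

Z90 = 1.65  # multiplier the guidebook uses for the 90 percent confidence level


def se_from_bounds(estimate, lower, upper, z=Z90):
    """Standard error of a reported estimate, taken from its published bounds.

    Uses the larger of the two half-widths as the margin of error, which is
    the conservative choice when an interval is not symmetrical.
    """
    margin = max(upper - estimate, estimate - lower)
    return margin / z


def se_of_sum(*standard_errors):
    """Standard error of a sum (or difference) of estimates.

    Ignores any correlation between the estimates, as the formula in the
    text does.
    """
    return math.sqrt(sum(se ** 2 for se in standard_errors))


# Never-married males and females, from Example Calculations 1 and 2.
se_males = se_from_bounds(33_290_195, 33_166_192, 33_414_198)
se_females = se_from_bounds(29_204_857, 29_090_048, 29_319_666)

total = 33_290_195 + 29_204_857
se_total = se_of_sum(se_males, se_females)
low, high = total - Z90 * se_total, total + Z90 * se_total

print(f"SE males: {se_males:,.0f}  SE females: {se_females:,.0f}")
print(f"Total: {total:,}  SE of total: {se_total:,.0f}")
print(f"90 percent interval: {low:,.0f} to {high:,.0f}")
```

The same two functions would apply when summing detailed commuting categories, provided the caveat about correlated estimates is kept in mind.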
For instance, ACS tabulations of commuting time of day break the day into very detailed day parts. To analyze longer periods, such as peak periods as opposed to peak hours, analysts will need to sum the time period components.

Example Calculation 3
Determine the standard error of a ratio of reported ACS estimates.

Problem
Suppose the statistic of interest is the ratio of the number of women who have never married to the number of men who have never married. What is the ratio and the standard error of the ratio of females who have never married to males who have never married?

Relevant Equations
Standard error of a ratio:

$$SE\left(\frac{\hat{X}}{\hat{Y}}\right) = \frac{1}{\hat{Y}}\sqrt{[SE(\hat{X})]^2 + \frac{\hat{X}^2}{\hat{Y}^2}[SE(\hat{Y})]^2}$$

Note: This approximation is valid for ratios of two estimates where the numerator is not a subset of the denominator.

Calculations
The equation inputs are calculated as shown above.

$$SE\left(\frac{29{,}204{,}857}{33{,}290{,}195}\right) = \frac{1}{33{,}290{,}195}\sqrt{(69{,}581)^2 + \frac{(29{,}204{,}857)^2}{(33{,}290{,}195)^2}(75{,}153)^2} = 0.29\ \text{percent}$$

The ratio of the two estimates is (29,204,857 / 33,290,195) = 87.73 percent, and the upper and lower bounds for the 90 percent confidence level are

87.73% ± 1.65 * 0.29% = 87.25% to 88.21%

Discussion
This example demonstrates a technique for evaluating how the sampling errors affect the calculation of ratios between two parallel estimates. A transportation-based example of this type of comparison would be if an analyst wanted to make a statement such as, "there are X times more two-vehicle households than zero-vehicle households in geographic area Y." These comparisons are not usually that useful for single-variable tables, but are very common and useful when analyzing cross-tabulations, where an analyst might want to say something like, "workers in zero-vehicle households are X times more likely to commute by transit than workers in two-vehicle households."

The more common comparison between a subtotal estimate and its corresponding total estimate (e.g., "X percent of the households have zero vehicles") is covered in the next example calculation.

Example Calculation 4
Determine the standard error of a percentage.

Problem
Now, suppose the statistic of interest is the percentage of females who have never married in relation to the total number of people who have never been married. What is the percentage of people who have never married that are women, and what is the standard error of the percentage?

Relevant Equations
Standard error of a proportion, where the proportion is $\hat{p} = \hat{X}/\hat{Y}$:

$$SE(\hat{p}) = \frac{1}{\hat{Y}}\sqrt{[SE(\hat{X})]^2 - \frac{\hat{X}^2}{\hat{Y}^2}[SE(\hat{Y})]^2}$$

Note: This approximation is valid for proportions of two estimates where the numerator (X) is a subset of the denominator (Y).
Calculations
The point estimate for the proportion of the total that are female is (29,204,857 / 62,495,052) * 100% = 46.73%.

From the previous calculations, we know the standard error of the number of females who have never married is 69,581. The standard error for all people who have never married is 102,418. The standard error of the proportion is

$$SE\left(\frac{29{,}204{,}857}{62{,}495{,}052}\right) = \frac{1}{62{,}495{,}052}\sqrt{(69{,}581)^2 - \frac{(29{,}204{,}857)^2}{(62{,}495{,}052)^2}(102{,}418)^2} = 0.08\ \text{percent}$$

The proportion is 46.73 percent, and the upper and lower bounds for the 90 percent confidence level are as follows:

46.73% ± 1.65 * 0.08% = 46.60% to 46.86%

Discussion
Determining the percentage that an ACS estimate makes up of an ACS estimated total will be a very common procedure for transportation planners and other census data users. For example, to calculate mode shares for different commuting modes, analysts will apply this procedure.
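The ratio and proportion formulas from Example Calculations 3 and 4 differ only in the sign inside the square root, which a brief Python sketch makes easy to see. The function names are hypothetical, and raising an error when the quantity under the square root is negative is a choice made for this sketch rather than something the guidebook specifies. The inputs are the rounded estimates and standard errors used in the examples.

```python
import math


def se_of_ratio(x, y, se_x, se_y):
    """Standard error of X/Y when X is NOT a subset of Y (Example 3)."""
    return math.sqrt(se_x ** 2 + (x / y) ** 2 * se_y ** 2) / y


def se_of_proportion(x, y, se_x, se_y):
    """Standard error of X/Y when X IS a subset of Y (Example 4)."""
    radicand = se_x ** 2 - (x / y) ** 2 * se_y ** 2
    if radicand < 0:
        # Sketch-level choice: refuse to continue instead of guessing a fallback.
        raise ValueError("quantity under the square root is negative")
    return math.sqrt(radicand) / y


females, males, total = 29_204_857, 33_290_195, 62_495_052
se_females, se_males, se_total = 69_581, 75_153, 102_418

ratio = females / males
se_ratio = se_of_ratio(females, males, se_females, se_males)

share = females / total
se_share = se_of_proportion(females, total, se_females, se_total)

print(f"Ratio: {ratio:.2%}, SE: {se_ratio:.2%}")
print(f"Share: {share:.2%}, SE: {se_share:.2%}")
```

A mode-share calculation would follow the same pattern as the share computed here, with the mode's worker estimate as the subtotal and total workers as the denominator.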