PART II
Technical Issues







OCR for page 139
Using the American Community Survey: Benefits and Challenges PART II Technical Issues


4 Sample Design and Survey Operations

The panel is impressed by the extent of research and development that the Census Bureau has devoted to the design and operation of the American Community Survey (ACS) throughout 10 years of testing and partial implementation. Given that the ACS has just been fully implemented and given its complex design, it will continue to require a high level of research, evaluation, and experimentation that can not only inform users and ACS managers, but also lead, as appropriate, to modifications that increase the quality and usefulness of the data and the efficiency of the survey operations. Such research needs to systematically evaluate various aspects of the survey in the context of full implementation and also to address unforeseen problems that may arise in data collection and processing.

The sections of this chapter address the following specific aspects of the ACS sample design and operations that, in the panel’s judgment, require continued research, evaluation, and experimentation by the Census Bureau:

  Sampling operations for housing units, including initial sampling using the Master Address File (MAF) as the sampling frame and subsampling for nonresponding housing units;
  Data collection for housing units, including mode of data collection and residence rules;
  Sampling and data collection for group quarters; and
  Data preparation, including confidentiality protection, collapsing of tables for large sampling errors, inflation adjustments, tabulation specifications with respect to the universe and geographic areas for which various estimates are provided, and data quality review.

In each section, descriptive information precedes a discussion of issues and the panel’s assessment. Weighting procedures are discussed tangentially; for a detailed discussion, see Chapter 5, which reviews the construction and interpretation of the ACS estimates for 12 months (1-year period estimates), and Chapter 6, which reviews the construction and interpretation of the ACS estimates for 36 months (3-year period estimates) and 60 months (5-year period estimates).

A report from the U.S. Government Accountability Office (2004) discusses some of the same ACS issues as this report, including residence rules, methods for deriving independent population and housing controls, inflation adjustments for dollar amounts, and understanding the ACS multiyear period estimates.

Chapters 4, 5, and 6 necessarily emphasize aspects of the ACS that appear to be or may be problematic and hence require continued research and evaluation. Readers should keep in mind the substantial benefits of the ACS in comparison with the 2000 long-form sample that are spelled out in Chapters 2 and 3. These benefits include timeliness, frequency of updating, improved data quality in terms of completeness of response, and consistency of measurement with the long-form sample for most items.

4-A SAMPLING OPERATIONS FOR HOUSING UNITS

This section briefly describes the development of the initial ACS sample of housing units from the MAF (4-A.1), sampling rates for the initial sample (4-A.2), and subsampling rates for nonresponse follow-up (4-A.3). It then outlines the panel’s concerns and recommendations for the MAF (4-A.4) and the ACS sample size and design (4-A.5).

4-A.1 Developing the Initial Sample

The initial ACS sample of housing unit addresses in the 50 states and the District of Columbia for 2005 and subsequent years consists of approximately 250,000 housing units per month and approximately 3 million housing unit addresses for the year (about 2.3 percent of 129.5 million housing units on the MAF in 2005).1 The initial sample (that is, the sample before subsampling for nonresponse follow-up by computer-assisted personal interviewing, or CAPI) is selected using systematic sampling from the MAF so that each monthly sample is spread throughout the United States in an unclustered way.

1 Refer back to Box 2-1 for a brief description of sampling and other procedures in the Puerto Rico Community Survey; for further information about the housing unit sampling procedures in the United States and Puerto Rico, see Asiala (2004, 2005); Hefter (2005a); U.S. Census Bureau (2006:Ch. 4).

The initial sampling occurs in two phases (see Box 4-1): subdivision of the MAF into yearly segments (first phase) and selection of addresses for the ACS sample for each data collection year from the applicable segment (second phase).

The first phase is designed to allocate housing unit addresses on the MAF to five equal segments, each of which is assigned to year t1 through year t5 of specified 5-year periods, which are 2005–2009, 2010–2014, 2015–2019, and so on. The addresses in each segment are eligible to be selected for the ACS sample only for the years to which they are assigned (for example, 2005, 2010, 2015, and so on for t1 addresses; 2006, 2011, 2016, and so on for t2 addresses). This segmentation procedure ensures that no address will be included in the ACS sample more than once every 5 years.

The first-phase segmentation of MAF addresses proceeds on a continuous basis in two waves each August and January. The process began in August 2004 when all of the housing unit addresses on the MAF at that time were assigned to equal segments for years 2005–2009. Then, each January and August, newly added addresses are assigned equally to one of the five segments for the period then in progress: for example, new addresses identified in January 2006 were assigned equally to years 2005–2009. In August 2009, all addresses assigned to segments for 2005–2009 that still exist as housing unit addresses on the MAF at that time will be reassigned to the same segments for 2010–2014, and the process of assigning newly added addresses to segments for these 5 years will proceed each January and August until August 2014, when the process will begin anew. The assignment to the five yearly segments is carried out using systematic sampling of addresses, which are arranged in each county by sampling rate stratum (see below) and geographical location.

The second-phase sampling is designed to select ACS sample addresses from a given year’s first-phase segment to meet specified sampling rates that are chosen to improve the precision of estimates for small governmental units. The second-phase sampling proceeds in two stages, corresponding to the stages of first-phase sampling. In August of year t – 1, a main sample is selected from the segment of MAF addresses assigned to year t; then, in January of year t, a supplemental sample is selected from newly added MAF addresses assigned to year t’s segment. The main sample addresses are assigned equally to the 12 months of year t for data collection, while the supplemental sample addresses are assigned equally to months April–December of year t for data collection.
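
The arithmetic of the two phases can be checked against the Area X illustration in Box 4-1. The following is a minimal sketch under the box's own simplifying assumptions (an equal five-way split, no demolitions or ineligible units, and a flat second-phase rate of 11.5 percent). The names `updates`, `draw`, and `acs_sample` are illustrative only and are not Census Bureau terminology; rounding half up to whole units is an assumption chosen to match the box's figures.

```python
# Sketch of the Box 4-1 arithmetic for hypothetical Area X.
# Assumptions: equal five-way split, no demolitions or ineligible units,
# flat second-phase rate of 11.5 percent (2.3 percent times 5).

SEGMENTS = 5
RATE_PER_THOUSAND = 115  # 11.5 percent, kept as an integer to avoid float error

# (update date, addresses newly on the MAF at that date), in chronological order
updates = [
    ("Aug 2004", 20_000),
    ("Jan 2005", 500),
    ("Aug 2005", 625),
    ("Jan 2006", 250),
    ("Aug 2006", 100),
    ("Jan 2007", 100),
]


def draw(addresses):
    """Expected sample size at the flat rate, rounded half up (an assumption)."""
    return (addresses * RATE_PER_THOUSAND + 500) // 1000


def acs_sample(year):
    """Return (main, supplemental, total) sample sizes for one ACS year."""
    main_label = f"Aug {year - 1}"
    supplemental_label = f"Jan {year}"

    # Phase one: the year's segment accumulates one-fifth of every update
    # up to and including the August before the data collection year.
    segment_size = 0
    for label, added in updates:
        segment_size += added // SEGMENTS
        if label == main_label:
            break

    # Phase two: main sample from the accumulated segment, supplemental
    # sample from the segment's share of the January additions.
    main = draw(segment_size)
    supplemental = draw(dict(updates)[supplemental_label] // SEGMENTS)
    return main, supplemental, main + supplemental


for year in (2005, 2006, 2007):
    print(year, acs_sample(year))
```

Under these assumptions the totals for 2005, 2006, and 2007 agree with the 472, 492, and 496 units shown in the box.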

BOX 4-1
Developing the Initial ACS Sample, Phases One and Two, Area X with 20,000 Housing Units (50,000 People)

Phase One: Allocate Master Address File (MAF) Housing Unit Addresses to Five Segments

August 2004:
  1. MAF addresses for Area X: 20,000 housing unit addresses
  1a. Divide into 5 equal segments: 4,000 addresses each
      Segment 1: 2005
      Segment 2: 2006
      Segment 3: 2007
      Segment 4: 2008
      Segment 5: 2009
January 2005:
  2. Newly added MAF addresses for Area X: 500 housing unit addresses
  2a. Divide into 5 equal segments as above: 100 addresses each
August 2005:
  3. Newly added MAF addresses for Area X: 625 housing unit addresses
  3a. Divide into 5 equal segments as above: 125 addresses each
January 2006:
  4. Newly added MAF addresses for Area X: 250 housing unit addresses
  4a. Divide into 5 equal segments as above: 50 addresses each
August 2006:
  5. Newly added MAF addresses for Area X: 100 housing unit addresses
  5a. Divide into 5 equal segments as above: 20 addresses each
January 2007:
  6. Newly added MAF addresses for Area X: 100 housing unit addresses
  6a. Divide into 5 equal segments as above: 20 addresses each
And so on … until:
August 2009:
  MAF addresses for Area X, including all additions, demolitions, and ineligible units:
  Divide into 5 equal segments
      Segment 1: 2010 (addresses previously assigned to 2005)
      Segment 2: 2011 (addresses previously assigned to 2006)
      Segment 3: 2012 (addresses previously assigned to 2007)
      Segment 4: 2013 (addresses previously assigned to 2008)
      Segment 5: 2014 (addresses previously assigned to 2009)
January 2010:
  Newly added MAF addresses for Area X
  Divide into 5 equal segments as above, and so on …

MAF Addresses in Segments 1-3 After Phase One, August 2004, 2005, 2006
Assumes no demolitions or ineligible units.
  7. Segment 1 - August 2004 (line 1a): 4,000 addresses
  8. Segment 2 - August 2005 (lines 1a + 2a + 3a): 4,225 addresses
  9. Segment 3 - August 2006 (lines 1a + 2a + 3a + 4a + 5a): 4,295 addresses

Phase Two: Select Housing Unit Addresses for Each Year’s ACS Sample from Applicable Segment
Assume sampling rate is 11.5 percent (2.3 percent times 5; see Section A.2).

2005 ACS:
  August 2004: Draw main sample from Segment 1 (line 7): 460 units
  January 2005: Draw supplemental sample from Segment 1 (line 2a): 12 units
  TOTAL ACS sample for 2005: 472 units
2006 ACS:
  August 2005: Draw main sample from Segment 2 (line 8): 486 units
  January 2006: Draw supplemental sample from Segment 2 (line 4a): 6 units
  TOTAL ACS sample for 2006: 492 units
2007 ACS:
  August 2006: Draw main sample from Segment 3 (line 9): 494 units
  January 2007: Draw supplemental sample from Segment 3 (line 6a): 2 units
  TOTAL ACS sample for 2007: 496 units
And so on …

NOTE: Phase Two initial sampling rates will be reduced as the size of the MAF grows to maintain the overall ACS initial sample size of about 3 million housing unit addresses.

4-A.2 Initial Sampling Rates

Initial sampling of housing unit addresses from the applicable segment for a data collection year each August and January (prior to nonresponse follow-up subsampling) uses one of five different sampling rates for the addresses within each geographic block (an area of, on average, about 15–20 housing units). The five sampling rates pertain to five strata containing the following kinds of blocks:

  Blocks in the smallest governmental units that are eligible for oversampling (refer back to Tables 2-3 and 2-4), defined as eligible governments with an estimated fewer than 200 occupied housing units;
  Blocks in smaller governmental units, defined as eligible governments with an estimated 200 to fewer than 800 occupied housing units;

  Blocks in small governmental units, defined as eligible governments with an estimated 800 to fewer than 1,200 occupied housing units;
  Blocks in large census tracts in large governmental units, defined as census tracts with an estimated more than 2,000 occupied housing units; and
  All other blocks.

The designation of initial sampling rates is based on estimates of occupied rather than total housing units because blocks in governmental units or census tracts with large numbers of seasonally vacant housing units would be undersampled if total housing units were the criterion. The estimates of occupied housing units are obtained from the current MAF address count times an estimate from the 2000 census of the proportion of occupied housing units for blocks in the governmental unit or census tract. These estimated proportions will presumably be updated at each census.2

2 In Alaska Native and American Indian areas, blocks are assigned to a stratum by applying the estimated percentage of the population that is Alaska Native and American Indian to the estimate of occupied units for the block; the purpose of this procedure is to boost the sample in Alaska Native and American Indian areas.

Initial sampling rates are calculated for each of the five strata that will produce approximately equal precision for estimates of a given characteristic for small governmental units and large census tracts outside these units and keep the overall initial ACS sample at about 3 million housing unit addresses each year.

A budget constraint necessitates that the initial sampling rate be reduced for some census tracts in order to pay for a higher level of CAPI nonresponse follow-up in tracts with lower-than-average response by mail and computer-assisted telephone interviewing (CATI). For this purpose, the initial sampling rate is reduced by 8 percent for census tracts in strata 4 and 5 (see above) for which at least 75 percent of addresses are mailable and it is projected that at least 60 percent of responses will be obtained by mail or CATI.

For 2005, the initial (and reduced initial) overall sampling rates for the five strata were as follows:

  1. blocks in the smallest governmental units eligible for oversampling: 10 percent;
  2. blocks in smaller governmental units: 6.9 percent;
  3. blocks in small governmental units: 3.5 percent;
  4a. blocks in large census tracts in which sample reduction not made: 1.7 percent;
  4b. blocks in large census tracts in which sample reduction made as above: 1.6 percent;
  5a. all other blocks in tracts in which sample reduction not made: 2.3 percent; and
  5b. all other blocks in tracts in which sample reduction made as above: 2.1 percent.

The second-phase sampling of housing unit addresses for a data collection year uses the above sampling rates multiplied by 5 to allow for the fact that only one-fifth of the addresses in the MAF are included in the first-phase segment for that year. For example, 50 percent of the addresses in the first-phase segment for blocks in category 1 above will be sampled (10 percent multiplied by 5), as will 34.5 percent for blocks in category 2, 17.5 percent for blocks in category 3, and so on. In years after 2005, the 2005 initial sampling rates will be reduced as necessary to maintain an overall initial sample size of about 3 million housing unit addresses. The exception is that no reduction will be made in the sampling rate for stratum 1.

4-A.3 Subsampling for CAPI Follow-up

Even though response to the ACS, like the decennial census, is mandatory, the Census Bureau has never expected that as high a proportion of housing units sampled in the ACS would return their questionnaires by mail as occurs in the publicity-rich environment of the census. In order to reduce costs, the Census Bureau planned from the beginning to use CAPI to collect data for a subsample of nonresponding ACS sampled housing units instead of all of them. Before drawing the subsample, the Census Bureau planned to try to collect data by telephone using CATI for as many as possible of the sampled housing units not responding by mail.
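
As a rough check on the second-phase rates described in Section 4-A.2, the sketch below multiplies the 2005 overall rates by 5 and confirms that an 8 percent reduction of the stratum 4 and 5 rates is consistent with the published reduced rates. The dictionary and function names are illustrative assumptions, and rounding the reduced rates to one decimal place is an assumption about how the published figures were derived, not something the text states.

```python
from fractions import Fraction

# 2005 overall initial sampling rates by stratum, in percent (Section 4-A.2).
OVERALL_RATE_PCT = {
    "1": "10", "2": "6.9", "3": "3.5",
    "4a": "1.7", "4b": "1.6", "5a": "2.3", "5b": "2.1",
}
SEGMENTS = 5  # only one-fifth of MAF addresses sit in a given year's segment


def second_phase_rate_pct(stratum):
    """Rate applied within the year's segment: the overall rate times 5."""
    return Fraction(OVERALL_RATE_PCT[stratum]) * SEGMENTS


def reduced_rate_pct(stratum):
    """Overall rate after an 8 percent cut, rounded to one decimal (assumed)."""
    reduced = Fraction(OVERALL_RATE_PCT[stratum]) * Fraction(92, 100)
    return round(float(reduced), 1)


for stratum in OVERALL_RATE_PCT:
    print(stratum, float(second_phase_rate_pct(stratum)))
```

Exact fractions are used so that, for example, 6.9 times 5 comes out as exactly 34.5 rather than a nearby floating-point value.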
The Census Bureau specified three CAPI subsampling rates to apply to housing unit addresses that were mailed a questionnaire but did not respond by mail or CATI (see Hefter, 2005a, for details):

  addresses in census tracts with predicted levels of mail and CATI responses between 0 and 35 percent: 50.0 percent (1 in 2);
  addresses in census tracts with predicted levels of mail and CATI responses between 36 and 51 percent: 40.0 percent (2 in 5); and
  addresses in other census tracts: 33.3 percent (1 in 3).

In addition, two-thirds (66.7 percent) of nonmailable addresses and addresses in remote Alaska are followed up in the CAPI operation.

The higher (lower) rates are used to subsample nonresponding housing units in census tracts with predicted lower (higher) levels of mail and CATI response in order to roughly equalize the precision of the estimates for areas with differing levels of predicted mail and CATI response rates. The predicted levels for the 2005 subsampling operation were developed from mail response rate information from the Census 2000 Supplementary Survey and the 2001–2003 ACS test surveys when available and otherwise from a model that included data from the 2000 census; ACS mail and CATI response rate information will be used for all areas in the future.

4-A.4 MAF Concerns and Recommendations

The MAF plays a critical role as the sampling frame for the ACS. It is the Census Bureau’s inventory of known residential addresses (housing units and group quarters) and selected nonresidential units in the United States and Puerto Rico. It contains mailing and location address information and other attribute information about each address. It also contains geographic codes, such as county and place codes, obtained by linking to the Census Bureau’s Topologically Integrated Geographic Encoding and Referencing (TIGER) database.

For purposes of sampling housing unit addresses for the ACS, the following types of housing unit records are currently included in the ACS version of the MAF (see U.S. Census Bureau, 2006:Ch. 3):

  housing units in existence in the 2000 census and those added from the postcensus program to resolve challenges by localities to their population counts (the count question resolution program);
  new housing units added from semiannual updates of the U.S. Postal Service’s (USPS) Delivery Sequence File (DSF), along with housing units that were deleted in the 2000 census but continue to appear on the DSF;3
  new housing units added from ongoing listings of addresses in areas of new construction that are conducted for the Census Bureau’s other household surveys; and
  new housing units added from the Community Address Updating System (CAUS), which annually lists addresses in about 20,000 blocks, out of a total of 750,000 largely rural blocks, where use of the DSF does not provide adequate coverage.

3 To the extent that demolished housing units are not systematically deleted from the DSF, then the retention of housing units that remain on the DSF but were deleted from the 2000 census MAF may result in unnecessary follow-up costs in areas with heavy demolitions.

Corrections to housing unit addresses are obtained from all of the above updating programs and from ACS interviewers.

Because the ACS is a continuous monthly survey nationwide, it is essential that its sampling frame, the MAF, be as complete and accurate as possible and that it be updated on a continuous basis in all areas of the country. The panel is concerned about the quality of the MAF updating, not only in areas with city-style addresses (house number and street name; see Section A.4.a below), but also in rural areas (see Section A.4.b; see also National Research Council, 2004a, which raises many of the same points).

4-A.4.a The MAF in Urban Areas

The MAF updating for city-style-address areas between censuses depends almost entirely on the USPS DSF, for which the Census Bureau receives updated versions every 6 months. The DSF is a mail delivery file and is not meant to be a complete address list. Research conducted prior to the 2000 census indicated that the DSF is deficient as a source for the MAF in urban areas in at least three respects (see U.S. General Accounting Office, 1998:17-18):

  The DSF misses many addresses in new construction areas, where it takes time to establish separate mailboxes and mailing addresses.
  Portions of the DSF are not updated at the same rate all around the country.
  The DSF often does not clearly identify addresses in small multi-unit structures; in many of these units, mail may be delivered to a central hall or desk and not to the individual apartments.

These deficiencies in the DSF led to a decision by the Census Bureau for the 2000 census to conduct a complete canvass of all 8.2 million blocks in 1999 in order to bolster the completeness of the 2000 Decennial Master Address File. Previously, the Census Bureau had planned to conduct a complete canvass only in rural areas and to spot-check addresses in urban areas.
For the 2010 census, the Panel on Research on Future Census Methods (National Research Council, 2004a) recommended partnerships with state, local, and tribal governments to collect address list and geographic information throughout the decade in order to reduce the need for block canvassing in 2009, but such partnerships were not developed. Instead, the Census Bureau plans to repeat the very costly complete block canvass operation in 2009. It also plans to conduct a Local Update of Census Addresses (LUCA) program in 2008, in which local governments are given the opportunity to review and update the residential address listings for their jurisdiction, similar to a LUCA program conducted just prior to the 2000 census.4

4 The 2000 LUCA program experienced scheduling and communication problems, and participation was spotty across the country (National Research Council, 2004b:145).

ponent of the ACS (for example, the possible deletion of institutions from the ACS universe) that would be cost-beneficial for users and stakeholders.

4-D DATA PREPARATION

This section briefly describes key procedures to prepare the ACS data products, including confidentiality protection measures (4-D.1), the collapsing of tables because of large sampling errors (4-D.2), inflation adjustments of income and housing value and costs (4-D.3), tabulation specifications with respect to the population universe and geographic areas for which various estimates are provided (4-D.4), and data quality review (4-D.5). Recommendations for research and development on these topics are contained within the applicable subsection.

4-D.1 Confidentiality Protection

4-D.1.a Confidentiality Protection Procedures

The Census Bureau uses three primary methods of disclosure avoidance to minimize the risk that someone could identify an individual respondent in the ACS data products: data swapping, categorizing variables, and top-coding. The first two methods are used for tabulations; all three methods are used for the ACS public use microdata sample (PUMS) files. The PUMS files also protect confidentiality by deleting names and addresses from the individual records, limiting geographic identification to large areas containing about 100,000 people called public use microdata areas (PUMAs), and perturbing the ages of people in households with 10 or more members. In addition, the subsampling for generating the PUMS files affords protection even if one knows a person who was in the full ACS sample because one does not know whether the person is in the PUMS subsample.

Data swapping occurs when a household has rare characteristics (such as being the only minority household in a block group). In such instances, the entire household may be swapped with a demographically similar household in a different geographic region. Only a small percentage of households are ever swapped, and they are never identified. The purpose of swapping is to ensure that users will not be able to identify a household with certainty. All data products are created from the ACS records after swapping.

Categorizing variables refers to collapsing categories of a variable within a table, or on the PUMS records, to avoid small cell sizes. For example, a table may combine some race categories, such as races other than white and black, into a single category, or a table may combine income amounts into intervals of $10,000 or more, with a wide top category, such as $100,000 or more.

Top-coding refers to assigning a value to an individual record that is the same as that assigned to other individuals, all of whom have actual values above a specified limit. For example, all individuals with wages and salaries of $100,000 or more might be assigned the value of $100,000. Top codes for the ACS PUMS files are developed separately according to the distribution of responses by state.9

4-D.1.b Confidentiality Protection Concerns

The panel strongly supports the protection of respondents’ individual information, because a breach of confidentiality would not only undercut the Census Bureau’s ability to collect information, but also break trust with respondents. At the same time, the panel is concerned that confidentiality protection not be ratcheted up without careful consideration of the need not only to minimize disclosure risk, but also to provide useful information for public- and private-sector decision making, research, and analysis. It is not possible to reduce the risk of disclosure to zero; the goal instead must be to minimize risks while not unduly suppressing valuable information.

Microdata Products

A recent report of a panel of the Committee on National Statistics, Expanding Access to Research Data (National Research Council, 2005), addresses issues in balancing confidentiality and privacy protection with obtaining an adequate return on taxpayers’ investment through providing users with access to rich microdata sets from government surveys. The report recommends research on techniques for providing useful, innovative public-use microdata sets that increase informational utility without increasing disclosure risk.
In the context of ACS microdata, the panel encourages the Census Bureau to revisit its decision not to include month of data collection on the PUMS as a confidentiality protection measure. Given that individual PUMS records are not identified geographically for areas with fewer than 100,000 people, it could be argued that omitting month of data collection is not necessary to protect confidentiality. Including this variable on the PUMS files would be immensely valuable for analytical purposes in light of the moving ACS reference period. For instance, knowing the month of data collection would permit data users to make their own adjustments for inflation for income amounts (see Section 4-D.3 below). It would also facilitate research on seasonal variations in population. If, upon investigation, it appears too risky to include the exact month of interview, then perhaps the value could be perturbed within a range of plus or minus a month (for example, a month of interview labeled as “March” might actually have occurred in February or April).

9 See, for example, http://www.census.gov/acs/www/Products/PUMS/C2SS/minmaxval4.htm.

Multiyear Estimates

The panel thinks that the continuous design of the ACS affords a measure of protection for respondents that the Census Bureau should take into account when considering appropriate confidentiality protections for multiyear estimates for small areas. The U.S. population is highly mobile with respect to geographic location, employment, family composition, commuting patterns, and other characteristics within and across years. Thus, the fact that 60 months of data are averaged to provide 5-year period estimates for block groups, census tracts, and small governmental jurisdictions should go a long way toward protecting individual respondents, even without additional steps to protect confidentiality. The Census Bureau, of course, will not, and should not, rely solely on averaging as a protection, but it should carefully consider the costs and benefits of each additional protection procedure and conduct research to identify the most useful protection techniques.

In this regard, the Census Bureau should consider developing selected tables with reasonably precise estimates for seasonal populations (for example, winter and summer residents) for geographic areas that experience seasonal population changes. Thought would need to be given to whether appropriate population controls can be developed for such tables or whether to use controls at all. In addition, the Census Bureau should conduct research to determine an appropriate number of cases that need to be in the sample for a table or table cell to be released.
To date, the Census Bureau appears to be using rule-based procedures for determining which tables must be deleted from publication in order to protect confidentiality. For example, the Census Bureau has developed rules for publication of worker and journey to work tabulations for traffic analysis zones and other geographic areas (Zayatz, 2005). Some of these rules appear to be reasonable, but others appear to lack a rationale.

One of these rules is that an area must have at least 10 unweighted or 60 weighted cases of workers in the sample over the year for 1-year workplace tables to be published. For 3-year and 5-year workplace tables, the corresponding minimums are, respectively, 30 unweighted or 180 weighted cases of workers in sample over the last 3 years and 50 unweighted or 300 weighted cases of workers in sample over the last 5 years. In other words, the average minimums, year by year, are the same, namely 10 unweighted or 60 weighted cases. Assuming the 1-year period estimate minimums are reasonable, then having the same average yearly minimums for the 3-year and 5-year period estimates makes sense: even though the 3-year and 5-year period estimates are published for smaller geographic areas than the 1-year period estimates, they represent averages over longer periods of time.

A second rule is that 5-year period estimates of mode of transportation to work cross-tabulated by another variable will not be published for an area for a particular mode unless it has at least 3 unweighted workers in the sample. If it does not, then the mode must be collapsed with other modes to reach the minimum sample size requirement. Such a restriction is not imposed on the 1-year or 3-year period estimates. Given the skewed distribution of mode of transportation in the United States, whereby three-fourths of the population drives alone to work, another 10 percent carpools, and very small percentages take public transit, bicycle, walk, or work at home, this restriction may curtail the publication of needed information on transportation to work in many areas. In turn, such curtailment will handicap users who want to aggregate data for traffic analysis zones into larger areas of their own definition.

The reason for the restriction for 5-year period estimates is not clear. Mode of transportation to work is highly variable: the same individual may decide to walk to work in the summer and drive in the winter or may walk to work for 4 years and then decide to switch to a new bus line or vice versa. Collectively, the workers in a traffic analysis zone are unlikely to include the same individuals over the 5-year period because of changes in residence and employment.

The Census Bureau has time before 5-year period estimates become available in which to develop appropriate confidentiality protection strategies and techniques for transportation tables and other data products.
Such strategies should seek to minimize disclosure risk in ways that recognize the protection afforded by averaging over 60 months of data. When developing confidentiality protection procedures for cross-tabulations, the Census Bureau should also, whenever possible, prefer procedures that make it possible to aggregate the data for smaller units into larger units. Thus, instead of suppressing cells of a cross-tabulation, it might be possible to use techniques that perturb the data for individual cells while preserving the marginal totals for each variable.

4-D.1.c Confidentiality Protection Recommendations

Recommendation 4-8: Because of the potential value of month of data collection for analysis of the ACS PUMS, the Census Bureau should revisit its decision to omit this variable as a confidentiality protection measure. If further research determines that including exact month of data collection would significantly increase disclosure risk, the Census Bureau might consider perturbing the month of data collection or taking other steps to protect confidentiality. Similarly, the Census Bureau should consider developing selected summary tables that identify the season of collection (such as winter and summer) for geographic areas for which such information would be useful.

Recommendation 4-9: The Census Bureau should undertake research to develop confidentiality protection rules and procedures for tabulations from the ACS that recognize the protection afforded to respondents by pooling the data over many months. Whenever possible, the Census Bureau should prefer confidentiality protection procedures that preserve the ability to aggregate smaller geographic areas into larger, user-defined areas.

4-D.2 Collapsing Tables for Large Sampling Errors

In addition to procedures to protect confidentiality, the Census Bureau applies collapsing (or suppression) rules to the ACS 1-year and 3-year period standard tabulations that are designed to reduce the dimensions of tables, or to eliminate whole tables, that do not meet minimum standards for precision of the estimates. These collapsing rules are not applied to the 5-year period tabulations, even though the estimates will be very imprecise for small areas, because the small areas are intended to be building blocks for larger, user-defined areas.

The rules for determining which tables, or categories of tables, need to be suppressed involve examining the standard errors of every cell of a tabulation for individual tabulation areas (U.S. Census Bureau, 2006:13-10 to 13-11). For a specified table and area, the coefficient of variation (CV, the standard error of an estimate as a percentage of the estimate; see Box 2-5) is calculated for each cell of the table. If the cell entry is zero, the CV is set to 100 percent.
The CV values are arrayed from high to low, and if the median CV value (the value that divides the distribution into equal halves) is greater than 61 percent, then the full table cannot be released. The categories of the table are then combined into fewer categories, the median CV for the new table is calculated, and the test is reapplied. If the median CV is still greater than 61 percent, then even the simpler table cannot be released (see Box 4-4 for an example). It is difficult to evaluate this rule, but it could lead to anomalous situations that make the data harder to use. For example, a table could be completely or partially suppressed one year and not the next year for the same geographic area, or a table could be suppressed for some, but not all, of the component areas of a large city or county. The suppression will affect small areas and minority population groups disproportionately.
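The median-CV test just described can be sketched in a few lines. This is a minimal illustration rather than the Census Bureau's production logic: the function names are hypothetical, and the CV values in the example are the ones shown in Box 4-4.

```python
from statistics import median

MEDIAN_CV_LIMIT = 61.0  # percent, the cutoff stated in the text


def cell_cv(estimate, standard_error):
    """CV as a percentage of the estimate; zero cells are assigned 100 percent."""
    if estimate == 0:
        return 100.0
    return 100.0 * standard_error / abs(estimate)


def table_releasable(cvs):
    """A table fails only when its median cell CV is greater than the limit."""
    return median(cvs) <= MEDIAN_CV_LIMIT


def release_decision(full_cvs, collapsed_cvs):
    """Apply the test to the full table, then to the collapsed table."""
    if table_releasable(full_cvs):
        return "full"
    if table_releasable(collapsed_cvs):
        return "collapsed"
    return "suppressed"


# Cell CVs from Box 4-4: first pass (7 categories) and second pass (5 categories).
first_pass = [60.4, 76.1, 76.1, 76.1, 76.1, 50.7, 43.9]
second_pass = [60.4, 50.7, 50.7, 50.7, 43.9]
print(median(first_pass), median(second_pass))    # 76.1 50.7
print(release_decision(first_pass, second_pass))  # collapsed
```

Because the decision rests on a single median, a small change in one or two cell CVs from one year to the next can flip the outcome, which is how the anomalies noted above can arise.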

BOX 4-4
Illustrative Calculation for Suppressing Table Cells with Large Sampling Error, 1-Year ACS Period Estimates

Assume a city of population 100,000, with 2,000 school-age children in a particular population group (e.g., Hispanic).

First Pass of Table

Ratio of Family Income            Percent Children    Coefficient of
to Poverty Threshold              in Category         Variation (CV)
Below poverty threshold           15.0                60.4
100–149 percent of poverty        10.0                76.1
150–199 percent of poverty        10.0                76.1
200–249 percent of poverty        10.0                76.1
250–299 percent of poverty        10.0                76.1
300–349 percent of poverty        20.0                50.7
350 percent or more of poverty    25.0                43.9

What is the result? Median CV is 76.1. The table may not be released because the median CV is greater than 61.0.

Second Pass of Table after Combining Categories

Ratio of Family Income            Percent Children    Coefficient of
to Poverty Threshold              in Category         Variation (CV)
Below poverty threshold           15.0                60.4
100–199 percent of poverty        20.0                50.7
200–299 percent of poverty        20.0                50.7
300–349 percent of poverty        20.0                50.7
350 percent or more of poverty    25.0                43.9

What is the result? Median CV is 50.7. The table may be released because the median CV is less than 61.0.

Recommendation 4-10: The Census Bureau should monitor the extent of collapsing of cells that is performed in different tables to meet minimum precision standards of 1-year and 3-year period tabulations from the ACS and assess the implications for comparisons among geographic areas and over time. After sufficient information has been gleaned about the extent of data collapsing, the Census Bureau, in consultation with data users, should assess whether its collapsing rules are sound or should be modified for one or more subject areas.

4-D.3 Inflation Adjustments

Chapter 3 discussed the procedures used by the Census Bureau to adjust income amounts for the 1-year, 3-year, and 5-year period estimates and housing value and cost amounts for the 3-year and 5-year period estimates to reflect changes in the national all-item consumer price index (CPI) over the period (see Section 3-A.2.c and Table 3-1). The discussion underlined the importance of users understanding the resulting estimates: for example, a 5-year period estimate of income or housing value is the average of all of the reported amounts over the 5 years expressed in dollars for the latest year using a national CPI adjustment. Moreover, as with any period estimate, the same inflation-adjusted average dollar amount for two areas may reflect different underlying patterns: for example, average income for 2005–2009 expressed in 2009 dollars of, say, $40,000 could result from income growth, stability, or decline over the 5-year period.

For many applications, users may prefer the Census Bureau’s adjustment to latest-year dollars using the national CPI over some other inflation adjustment or no inflation adjustment at all. One advantage is that 1-year, 3-year, and 5-year period estimates for a large city or county will all be expressed in dollars for the same (latest) year, for example, 2009 dollars in the case of estimates for 2009, 2007–2009, and 2005–2009.

For some applications, however, users might prefer an inflation adjustment that is specific by geographic area. The problem is that area price data are limited. Currently, the Bureau of Labor Statistics (BLS) collects price data for over 100 specific areas, but it publishes price indexes for only the four regions (Northeast, Midwest, South, and West), population size classes of metropolitan statistical areas (MSAs), and the 20 largest MSAs.
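The point that identical inflation-adjusted averages can hide different trends can be made concrete with a small sketch. It is a simplification under stated assumptions: the function name is hypothetical, the CPI values and incomes are invented round numbers, a 3-year period is used for brevity, and each year is represented by a single mean amount weighted equally, whereas the actual procedure adjusts individual reported amounts.

```python
def period_average_in_latest_dollars(nominal_by_year, cpi_by_year):
    """Average yearly amounts after expressing each in latest-year dollars.

    Assumption: one mean amount per year, with all years weighted equally.
    """
    latest_year = max(nominal_by_year)
    cpi_latest = cpi_by_year[latest_year]
    adjusted = [
        amount * cpi_latest / cpi_by_year[year]
        for year, amount in nominal_by_year.items()
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical national CPI values (not actual BLS figures).
cpi = {2007: 100.0, 2008: 110.0, 2009: 121.0}

# Area A: real income grows. Area B: real income declines.
area_a = {2007: 30000, 2008: 36000, 2009: 44100}
area_b = {2007: 36000, 2008: 36000, 2009: 36840}

print(period_average_in_latest_dollars(area_a, cpi))  # 40000.0
print(period_average_in_latest_dollars(area_b, cpi))  # 40000.0
```

In 2009 dollars, area A moves from 36,300 to 44,100 while area B moves from 43,560 to 36,840, yet both produce the same period average of $40,000.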
No price data are collected for rural areas (see www.bls.gov/cpi/cpi/faq.htm). Moreover, variation in price changes may be as great within areas for which price indexes are available as among them: for example, prices for housing and other goods may increase at a very different rate in the central city and suburbs, let alone individual neighborhoods, of an MSA. Finally, area-specific price indexes are less precise than the national all-item CPI.

For still other applications, users may require latest-year estimates for income, housing costs, or housing value. Averages of reported amounts over 3 or 5 years adjusted for inflation to the latest year are not likely to be the same as latest-year amounts. For income, this is true even for the 1-year period estimates: inflation-adjusted averages of reported income over the 23 months covered in 1-year period estimates are not likely to be the same as latest-year income estimates. For estimating latest-year housing amounts from multiyear averages, the problem is a lack of subnational price indexes for specific items, such as housing value, rent, and different utilities or other housing costs. For income amounts, the problem is that incomes are not prices: income (in total and by component, such as wages or pension income) may increase faster (or slower) than inflation.

A possibility to investigate in this context is to use estimation methods that are appropriate by type of income. For Social Security and Supplemental Security Income, it would be appropriate to use the applicable national CPI to which these payments are indexed by law. For property and self-employment income, it might be more appropriate to use an average interest rate, whereas, arguably, some types of income (specifically, public assistance and other retirement income) should not be inflated at all unless it is known that a jurisdiction has increased such payments. For wages, it could be possible to use changes at the county level in average quarterly wages for employees covered under state and federal unemployment insurance programs. These data, which are part of the BLS Quarterly Census of Employment and Wages, are released each quarter about 6–7 months after data collection.

It might be possible to develop simpler models to estimate latest-year amounts by using the published multiyear estimates. For example, by examining how well the trends in BLS county wage data estimate 1-year period income from the 5-year period estimates for large counties, a user might be able to develop a procedure for estimating latest-year incomes from the 5-year period estimates for small counties.

To determine how to produce the most helpful data on income, housing costs, and housing value, the Census Bureau should initiate a two-part discussion with users. The Census Bureau should first clearly illustrate to users the nature of the current inflation adjustment procedures.
Then it should ascertain users’ needs for income and housing amount information, the resultant implications for what adjustment procedures can best serve most users, and what steps to take to assist users whose needs are not satisfied by the standard procedures. Finally, the Census Bureau should consider providing tables that reflect unadjusted dollar amounts whenever it provides adjusted amounts. Doing so will make the effects of inflation clearer to users and enable them to determine whether another kind of adjustment would better suit their purposes.

Recommendation 4-11: The Census Bureau should provide users with a full explanation of its inflation adjustment procedures and their effects on multiyear ACS estimates of income, housing costs, and housing value. It should consult with users about other kinds of income and housing amount adjustments they may need and conduct research on appropriate estimation methods (for example, methods to produce latest-year amounts from multiyear averages). It should consider publishing selected multiyear averages in nominal dollars as well as inflation-adjusted dollars.

4-D.4 Tabulation Specifications

The long-standing release plan for tabulations from the ACS includes two major elements: (1) the universe or population covered and (2) the geographic areas for which tabulations are produced. The full universe for ACS data products, beginning in 2006, will include the housing unit and GQ populations, although some tables may be published for subuniverses, such as households or the noninstitutional population. (Prior to 2006, tabulations included just the housing unit population.) For geographic areas, the available products (1-year, 3-year, and 5-year period estimates) will depend on the population size of the geographic area (refer back to Tables 2-4 and 2-5).

The Census Bureau will need to follow its plan for a number of years, not only to allow time for collection of sufficient data to begin release of 3-year period estimates in 2008 and 5-year period estimates in 2010, but also to allow both the Census Bureau and the data user community sufficient opportunity to gain experience with the various sets of tabulations. Yet the Census Bureau should not neglect to consult with users to determine if the population universe and the geographic area specifications are optimal or might be modified to produce more useful information.

With regard to population coverage, the key question is the role of GQ residents, particularly those in institutions. The Census Bureau will need to consult with users regarding appropriate universe definitions for ACS tabulations; for example, employment and income tabulations may be most useful if they are restricted to the noninstitutional population. In 2000, confidentiality concerns sometimes precluded the publication of the same tabulations separately for households and GQ residents in very small areas.
Because the ACS estimates for small areas are averages over multiyear periods, confidentiality concerns could be less of a problem in this regard. Ideally, consultation with users on the most useful tabulation universes would precede and feed into the production of tables for 2006 (for release in summer 2007), which will be the first year to include GQ residents.

For the geographic area release schedule, one issue is the population size cutoff for publication of 1-year period estimates, for which the Census Bureau might consider the usefulness of lowering the current threshold of 65,000 residents to one of, say, 50,000 residents. The discussions in Chapters 2 and 3 emphasize the large sampling errors of 1-year period estimates for a small population group (such as school-age children in poverty) for geographic areas with fewer than 250,000 people, so lowering the threshold might appear to be deleterious. However, estimates for major population groups will often meet common standards of precision for areas of 50,000 population (see Table 2-8). Moreover, 50,000 is a common threshold for allocation of various types of federal assistance. Yet another advantage of lowering the threshold to provide 1-year period estimates for additional areas is that users would have more flexibility in combining the data and, consequently, would less often have to request special tabulations from the Census Bureau. For example, users could average two 1-year period estimates for a small city or county to obtain a 2-year period estimate that was more precise than the individual 1-year period estimates.

A second issue for release of geographic area tabulations concerns the feasibility of producing 3-year period estimates for user-defined statistical subareas of large cities and counties. Such subareas could be a set of aggregations of census tracts or block groups in cities and of places and towns in counties, where the city or county has at least 40,000 people (so that, at a minimum, there are two subareas, each with at least 20,000 people). If the city or county is large enough to have more than one PUMA, then the subareas could usefully nest within a PUMA to maximize the ability to relate the data for the PUMA and its subareas. (PUMA boundaries may need to be redrawn in some areas to achieve the most useful designation of subareas within PUMAs.) Finally, it may be possible to produce 1-year period estimates for large statistical subareas of PUMAs, particularly if the threshold for 1-year period estimates is lowered to 50,000 people. The Census Bureau will need to explore with users the desirability of providing additional estimates for statistical subareas of large cities and counties and weigh user needs against the feasibility of increasing the production workload for the ACS.
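The gain in precision from averaging two 1-year period estimates, mentioned above, can be sketched as follows. The function name and the numbers are hypothetical, and the standard error formula assumes that the two annual samples are independent, which is an approximation adopted here for illustration.

```python
import math


def average_two_years(est1, se1, est2, se2):
    """Average two 1-year estimates and approximate the standard error.

    Assumption: the two annual samples are independent, so the variance
    of the average is (se1^2 + se2^2) / 4.
    """
    average = (est1 + est2) / 2
    se_average = math.sqrt(se1 ** 2 + se2 ** 2) / 2
    return average, se_average


# Hypothetical estimates of a population count for a small city.
avg, se = average_two_years(1000, 300, 1200, 400)
print(avg, se)  # 1100.0 250.0
```

With equal standard errors in the two years, the standard error of the average is the single-year value divided by the square root of 2, a reduction of about 29 percent.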
Recommendation 4-12: If some or all GQ residents continue to be included in the ACS, the Census Bureau should consult with users regarding the most useful population universe for tabulations, which, depending on the table, could be the entire population, the household and GQ populations separately, or the noninstitutional and institutional populations separately.

Recommendation 4-13: The Census Bureau should consider expanding the geographic areas for ACS tabulations in order to afford users greater flexibility for aggregating small areas into larger user-defined areas. Two possibilities to investigate are to lower the population threshold for 1-year period estimates to, say, 50,000, and to produce 3-year (and possibly 1-year) period estimates for user-defined statistical subareas of large cities (aggregations of census tracts or block groups) and counties (aggregations of places and towns).

4-D.5 Data Quality Review

The final step in the production and release of tabulations and other ACS data products is review by subject matter analysts to be sure there are no obvious errors or anomalies in the data. Each year the ACS processing staff and subject matter analysts must complete the entire process of preparing and reviewing data products within the span of a few months. In contrast, the preparation and review of data products from the long-form sample typically required well over a year to complete.

The volume of estimates to be reviewed each year led the Census Bureau to develop automated tools to facilitate the work of the staff. One tool, ART II, was developed in 2005 as an improved version of a similar tool (ART) used in 2003–2004. This tool automates the process of identifying statistically significant differences in estimates from one year to the next and facilitates other aspects of the review process. Other tools enable analysts and managers to track the process of review for tabulations and PUMS (U.S. Census Bureau, 2006:13-11).

We support continued efforts by the Census Bureau to automate and standardize the review process for ACS products, including not only the final review, but also review at earlier stages, such as when imputations for missing data and weighting adjustments are applied to the data records. As the time approaches when 1-year, 3-year, and 5-year period estimates must be provided for thousands of geographic areas every year (including 5-year estimates for over 200,000 individual block groups), the immensity of the review task threatens to overwhelm the analyst staff. They will run the risk of inadvertently releasing poor-quality data unless they receive a high level of technical assistance.
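The kind of year-to-year comparison that ART II is described as automating can be illustrated with a generic significance test. The text does not document the tool's actual method, so everything here is an assumption: the function name is hypothetical, the two annual samples are treated as independent, and a 90 percent confidence level is assumed.

```python
import math

Z_90 = 1.645  # two-sided critical value at the 90 percent level (assumed here)


def significant_change(est_prev, se_prev, est_curr, se_curr, z_crit=Z_90):
    """Flag a year-to-year difference that exceeds sampling variability.

    Assumption: the two estimates come from independent samples, so the
    standard error of the difference is sqrt(se_prev^2 + se_curr^2).
    """
    diff = est_curr - est_prev
    se_diff = math.sqrt(se_prev ** 2 + se_curr ** 2)
    if se_diff == 0:
        return diff != 0
    return abs(diff) / se_diff > z_crit


# Hypothetical estimates for the same area in consecutive years.
print(significant_change(1000, 50, 1200, 50))    # True  (z is about 2.83)
print(significant_change(1000, 100, 1100, 100))  # False (z is about 0.71)
```

A reviewer could run such a check across every cell of every table and examine only the flagged cells, which is one way automated screening reduces the volume of manual review.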
The Census Bureau recently identified prerelease review of demographic data, including from the ACS and other household surveys, as an important problem that merits research (Bell, 2006:10). The panel urges the Census Bureau not only to continue but also to step up its investment of resources for automated tools, standardized protocols, and other means to facilitate an appropriate level of review of ACS data products that will ensure a high standard of quality before they are released each year. Consulting with computer software development firms and with computer scientists in academia may generate useful ideas and identify existing automated tools that are relevant to the Census Bureau’s needs (see National Research Council, 2003b).

Recommendation 4-14: The Census Bureau should increase its research and development on automated tools and standardized procedures to facilitate timely review and quality control of the large volume of ACS data products.