5

Sample Allocation and Selection

The goal of the American Community Survey (ACS) is to provide estimates of detailed characteristics of the total population of the United States at levels of geography as small as census block groups, replacing data that were previously obtained through the census long form. The current ACS sample design is optimized to produce substate-level estimates of characteristics of the household population. For the group quarters (GQ) population, however, the design supports only state-level estimates of the overall GQ population. The sample design is not adequate for substate GQ estimates, and because total population estimates combine the household and GQ populations, this shortcoming also limits the usability of total population estimates for smaller geographic areas.

As discussed in Chapter 2, the GQ sample consists of two strata, small and large. The small stratum includes facilities with 15 or fewer residents according to the sampling frame, as well as facilities whose expected number of residents is unknown, either because the facility was closed on Census Day or because it was recently added to the sampling frame without information about its expected population count. The large stratum includes group quarters with expected populations of more than 15 residents.
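
The stratum rule can be written as a short function. This is a minimal sketch, assuming the frame stores the expected number of residents as an integer and uses `None` when that count is unknown (closed on Census Day or recently added); the function name and this encoding are hypothetical, not Census Bureau code.

```python
from typing import Optional

SMALL_STRATUM_MAX = 15  # facilities with 15 or fewer expected residents are "small"


def gq_stratum(expected_residents: Optional[int]) -> str:
    """Return "small" or "large" for a GQ facility on the frame.

    `expected_residents` is the count shown on the frame, or None when the
    expected count is unknown (hypothetical encoding for facilities closed on
    Census Day or recently added without a population count).
    """
    if expected_residents is None:
        return "small"  # unknown counts are placed in the small stratum
    if expected_residents <= SMALL_STRATUM_MAX:
        return "small"
    return "large"


# Example usage on a few hypothetical frame records.
for count in (None, 8, 15, 16, 250):
    print(count, gq_stratum(count))
```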

The approach to sampling the small stratum is similar to the household sampling method (U.S. Census Bureau, 2009). First, each small facility is randomly assigned to one of five subframes, and a sample is selected from each subframe once every 5 years. The facilities in a state are then sorted by status (small versus closed on Census Day), GQ type, and geographic order, and a systematic sample is selected. In most states, the overall annual facility sampling rate is approximately 1 in 40, or 2.5 percent. Some of the less populated states have higher target sampling rates to boost the precision of their state-level estimates.
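
The selection steps for the small stratum can be sketched as follows. This is an illustrative sketch under stated assumptions, not the Census Bureau's production procedure: the `SmallFacility` record, its `status` coding, and the function names are hypothetical, and the within-subframe interval assumes the five subframes are of roughly equal size, so that sampling the eligible subframe at five times the annual rate (1 in 8) yields an overall annual rate of about 1 in 40.

```python
import random
from dataclasses import dataclass


@dataclass
class SmallFacility:
    facility_id: str
    status: str      # hypothetical coding: "closed" (closed on Census Day) or "small"
    gq_type: str
    geo_order: int   # position in a geographic sort order (hypothetical)


def assign_subframes(facilities, n_subframes=5, seed=None):
    """Randomly assign each facility, once, to one of n_subframes subframes."""
    rng = random.Random(seed)
    return {f.facility_id: rng.randrange(n_subframes) for f in facilities}


def systematic_sample(units, interval, rng):
    """Take every `interval`-th unit (interval may be fractional) after a random start."""
    selected = []
    point = rng.random() * interval  # random start in [0, interval)
    while point < len(units):
        selected.append(units[int(point)])
        point += interval
    return selected


def select_small_stratum(facilities, subframes, year_index,
                         annual_rate=1 / 40, n_subframes=5, seed=None):
    """Draw one year's small-stratum sample for a single state (sketch)."""
    rng = random.Random(seed)
    # Only the subframe whose turn it is this year is eligible.
    turn = year_index % n_subframes
    eligible = [f for f in facilities if subframes[f.facility_id] == turn]
    # Sort by status, GQ type, and geography before the systematic draw.
    eligible.sort(key=lambda f: (f.status, f.gq_type, f.geo_order))
    # Assumption: equal-sized subframes, so sampling the eligible subframe at
    # n_subframes * annual_rate gives the target overall annual rate.
    interval = 1 / (annual_rate * n_subframes)
    return systematic_sample(eligible, interval, rng)


# Hypothetical frame of 4,000 small facilities in one state.
frame = [
    SmallFacility(f"F{i}", "closed" if i % 10 == 0 else "small", f"type{i % 3}", i)
    for i in range(4000)
]
subframes = assign_subframes(frame, seed=1)
sample = select_small_stratum(frame, subframes, year_index=2, seed=7)
print(len(sample), "facilities selected from", len(frame))
```

Because the subframe assignment is made once and kept, each facility is eligible in only one year of a 5-year cycle; raising `annual_rate` mimics the higher target rates used in less populated states.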



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

SMALL POPULATIONS, LARGE EFFECTS

For example, the target sampling rate is 7.11 percent for Wyoming and 4.95 percent for Vermont (see Table 5-1). All residents of the selected small facilities are eligible to be interviewed unless the actual number of residents exceeds 15, in which case a subsample of 10 residents is selected when the field representative visits the facility (a process similar to that for the large facilities).

TABLE 5-1 GQ Facility Annual Sampling Rates and Sample Sizes by State, 2009

State                 Facility Sampling    Facility
                      Rate (percentage)    Sample Size
Alabama               2.50                 2,688
Alaska                5.53                 1,033
Arizona               2.50                 3,024
Arkansas              2.50                 1,861
California            2.50                 20,205
Colorado              2.50                 2,591
Connecticut           2.50                 2,805
Delaware              4.86                 1,036
District of Columbia  3.08                 1,019
Florida               2.50                 9,649
Georgia               2.50                 5,626
Hawaii                3.24                 983
Idaho                 3.48                 945
Illinois              2.50                 7,432
Indiana               2.50                 4,439
Iowa                  2.50                 2,468
Kansas                2.50                 2,002
Kentucky              2.50                 3,049
Louisiana             2.50                 3,400
Maine                 3.14                 1,159
Maryland              2.50                 3,376
Massachusetts         2.50                 5,494
Michigan              2.50                 6,796
Minnesota             2.50                 3,591
Mississippi           2.50                 2,309
Missouri              2.50                 4,093
Montana               4.38                 1,100
Nebraska              2.50                 1,334
Nevada                3.36                 1,148
New Hampshire         3.17                 1,176
New Jersey            2.50                 5,184
New Mexico            3.06                 1,033
New York              2.50                 13,948
North Carolina        2.50                 6,363
North Dakota          4.59                 1,103
Ohio                  2.50                 7,678
Oklahoma              2.50                 2,687
Oregon                2.50                 2,193
Pennsylvania          2.50                 11,073
Rhode Island          2.75                 1,095
South Carolina        2.50                 3,488
South Dakota          3.91                 1,128
Tennessee             2.50                 3,615
Texas                 2.50                 13,001
Utah                  2.79                 1,061
Vermont               4.95                 1,054
Virginia              2.50                 5,853
Washington            2.50                 3,287
West Virginia         2.50                 1,174
Wisconsin             2.50                 3,958
Wyoming               7.11                 1,001
Puerto Rico           2.50                 960

SOURCE: Based on tabulations provided by the Census Bureau on July 30, 2010.

The sampling units for the large facilities are clusters of GQ residents, who are selected in groups of 10. As a result, a large GQ facility is sampled indirectly with probability proportional to size (PPS), where size is measured by the facility's anticipated number of groups of 10 residents, and larger facilities can have several groups of 10 residents represented in the sample. Specifically, the groups of residents of large group quarters are sorted by GQ type and geographic order and then systematically sampled at a rate of approximately 1 in 40 (again, with some exceptions). Consequently, only group quarters with 40 or more groups of 10 residents (that is, roughly 400 or more expected residents) are guaranteed to have at least 1 group of residents represented in a particular sample.

As described above, the list of residents eligible to be interviewed is determined during the field representative’s visit to the facility, when an algorithm with a random start is applied to the actual roster of residents. If multiple groups of 10 are selected from the same facility, the groups are assigned to be interviewed in different months (with some exceptions for GQ types in which data collection is concentrated in a shorter period of time for logistical reasons).

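
The group-based PPS selection for large facilities, and the later within-facility draw from the actual roster, can be sketched as follows. The record layout, the rule that anticipated groups equal the expected count divided by 10 and rounded up, and the use of systematic selection for the roster step are assumptions for illustration; the text says only that an algorithm with a random start is applied to the roster.

```python
import math
import random

GROUP_SIZE = 10      # residents per sampled group
GROUP_INTERVAL = 40  # about 1 in 40 groups is selected


def expand_to_groups(facilities, group_size=GROUP_SIZE):
    """Return one list entry per anticipated group, in sort order.

    `facilities` holds (facility_id, gq_type, geo_order, expected_residents).
    Assumption: anticipated groups = ceil(expected_residents / group_size).
    """
    groups = []
    for fid, _gq_type, _geo, expected in sorted(facilities, key=lambda f: (f[1], f[2])):
        groups.extend([fid] * max(1, math.ceil(expected / group_size)))
    return groups


def systematic_group_sample(groups, interval=GROUP_INTERVAL, rng=None):
    """Select groups systematically; return facility_id -> groups selected."""
    rng = rng or random.Random()
    hits = {}
    point = rng.random() * interval  # random start in [0, interval)
    while point < len(groups):
        fid = groups[int(point)]
        hits[fid] = hits.get(fid, 0) + 1
        point += interval
    return hits


def facility_inclusion_prob(expected, interval=GROUP_INTERVAL, group_size=GROUP_SIZE):
    """Probability that a facility has at least one group in the sample."""
    return min(1.0, math.ceil(expected / group_size) / interval)


def select_residents(roster, k=GROUP_SIZE, rng=None):
    """Pick k residents from the actual roster with a random start (sketch)."""
    rng = rng or random.Random()
    if len(roster) <= k:
        return list(roster)
    step = len(roster) / k
    start = rng.random() * step
    # min() guards against a floating-point index landing exactly on len(roster).
    return [roster[min(int(start + i * step), len(roster) - 1)] for i in range(k)]


# Hypothetical large facilities: (id, GQ type, geographic order, expected residents).
large_frame = [("A", "prison", 1, 400), ("B", "dorm", 2, 850), ("C", "nursing", 3, 16)]
hits = systematic_group_sample(expand_to_groups(large_frame), rng=random.Random(1))
print(hits)
print(select_residents([f"resident{i}" for i in range(95)], rng=random.Random(2)))
```

Because each facility's groups are contiguous in the sorted list and the interval is 40, a facility with exactly 40 anticipated groups always receives one hit, while a facility with g < 40 groups is hit with probability g/40, which is the PPS property described above.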
When the ACS sample design was first developed, the sampling rate for group quarters was 3 percent of addresses annually, or 15 percent over 5 years, but budget constraints have lowered the sampling rates over the years. A careful look at the sample design is therefore warranted to identify possible opportunities for increased efficiency.

STATE-LEVEL ALLOCATION

As discussed above, the sample size of small group quarters in a state is proportional to the number of small group quarters on the frame for that state, and the sample size of large group quarters residents is proportional to the expected number of residents in large group quarters in the state. Because the GQ sample is not currently controlled at substate geographies, substate estimates of the combined household and GQ population may be highly variable, a problem that is discussed in more detail in the next section.

To address this, the sample design could be modified to better manage (control) the sample allocation rates at the substate level and over time. For 3- and 5-year estimates, the sample could be required to include a minimum number of group quarters in each county over the course of the 5-year period. Some states have a large number of small counties, which makes this challenging, but the change could improve the quality of the data available for small areas.

Another approach would be to tailor the sample further to the characteristics of small jurisdictions. For example, the lack of control over the allocation rates for smaller geographies may have a large effect on the estimates produced for a community that has 1,000 persons living in households and a correctional facility with 100 residents. According to counts from the 2000 census, places that have 10 percent or more of their population residing in group quarters represent less than 5 percent of all places in the United States. These may be the cases that need individualized attention.

Additional control over the allocation to substate areas may be facilitated by switching from a PPS design for large group quarters to one in which strata are created on the basis of size and substate area and an equal probability sample is selected within strata. This would permit the allocation to substate areas to be better controlled over time. This type of design would also simplify variance estimation, which appears to be a problem with the current design (Keathley, Navarro, and Asiala, 2010). To determine whether any efficiency would be lost by such a design, the Census Bureau could undertake a study of the effectiveness of the current PPS methods. The expected population numbers in the frame are often incorrect, which reduces the efficiency of PPS sampling. Consequently, the loss in precision from moving from PPS to stratified, equal probability sampling within strata may not be serious.

Recommendation 5-1: The Census Bureau should conduct a formal evaluation of sample redesign strategies that would make it possible to control the American Community Survey group quarters sample allocation at the substate level. The evaluation should focus on identifying options that can improve the precision of the estimates at the state and substate levels, without substantially increasing the costs of the data collection.
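
A stratified, equal-probability alternative of the kind discussed in this section might look like the following sketch. The size-class cutoffs, the use of county as the substate area, the per-stratum minimum (standing in for the suggested minimum number of group quarters per county), and all names are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

SIZE_CUTOFFS = (15, 100, 500)  # hypothetical size-class boundaries (expected residents)


def size_class(expected, cutoffs=SIZE_CUTOFFS):
    """Index of the first size class whose upper bound covers `expected`."""
    for i, upper in enumerate(cutoffs):
        if expected <= upper:
            return i
    return len(cutoffs)


def stratified_equal_prob_sample(facilities, rate, min_per_stratum=1, seed=None):
    """Equal-probability sample within (county, size class) strata.

    `facilities` holds (facility_id, county, expected_residents). Returns a
    dict facility_id -> base weight (stratum size / stratum sample size).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for fid, county, expected in facilities:
        strata[(county, size_class(expected))].append(fid)

    weights = {}
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation with a floor, capped at the stratum size.
        n_h = min(len(members), max(min_per_stratum, round(rate * len(members))))
        for fid in rng.sample(members, n_h):
            weights[fid] = len(members) / n_h
    return weights


# Hypothetical frame: two counties, each with small (5) and larger (300) facilities.
frame = [(f"F{i}", "Alpha" if i < 200 else "Beta", 5 if i % 2 else 300) for i in range(240)]
weights = stratified_equal_prob_sample(frame, rate=0.05, seed=3)
print(len(weights), "facilities selected")
```

Because every facility in a stratum receives the same base weight, variance estimation could rely on standard stratified-sampling formulas, and the per-stratum floor makes the substate allocation explicit rather than an indirect consequence of the PPS measures of size.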

As discussed, there are concerns that the sampling frame is outdated for many of the GQ types, including the number of expected residents in a GQ facility, which is the information used in the PPS sample selection. Table 5-2 shows the differences between the GQ population expected from information on the sampling frame and the GQ population observed in the 2008 data collection, by survey month. Discussions with the Census Bureau indicated that the quality of the sampling frame varies by GQ type. Seasonality could also play a role in the discrepancies for some GQ types. The Census Bureau has been researching the discrepancies between the expected and actual GQ sizes, and this research should continue in order to better understand the causes of the discrepancies, how they differ by GQ type, and how they affect the PPS sample design.

Recommendation 5-2: The Census Bureau should monitor the accuracy of the measures of size used in the probability proportional to size group quarters (GQ) sample design in the American Community Survey and should assess the resources allocated for updating the GQ sampling frame in the context of how the measures-of-size information available from the sampling frame affects the effectiveness of the sample design.

TABLE 5-2 Differences in 2008 Expected Population and Observed Population

Month      Sum of      Sum of      Difference in     Percentage Difference
           Expected    Observed    Sum of Expected   in Sum of Expected
           Population  Population  and Observed      and Observed
                                   Population        Population
January    287,797     270,204     17,593            6.1
February   288,046     278,728     9,318             3.2
March      270,439     252,692     17,747            6.6
April      301,963     287,009     14,954            5.0
May        294,135     265,257     28,878            10.0
June       275,424     235,003     40,421            14.7
July       298,281     248,065     50,216            16.8
August     288,238     250,172     38,066            13.2
September  438,497     266,501     171,996           39.2
October    287,672     276,236     11,436            4.0
November   279,735     264,367     15,368            5.5
December   301,329     282,236     19,093            6.0
Total      3,611,556   3,176,470   435,086           12.0

SOURCE: U.S. Census Bureau (2010).
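
The difference and percentage columns of Table 5-2 can be recomputed from the two sums. This short check assumes that the percentage difference is expressed relative to the expected population, an assumption consistent with the January and Total rows; the function name is illustrative.

```python
# (month, sum of expected population, sum of observed population) from Table 5-2
TABLE_5_2 = [
    ("January", 287_797, 270_204),
    ("February", 288_046, 278_728),
    ("March", 270_439, 252_692),
    ("April", 301_963, 287_009),
    ("May", 294_135, 265_257),
    ("June", 275_424, 235_003),
    ("July", 298_281, 248_065),
    ("August", 288_238, 250_172),
    ("September", 438_497, 266_501),
    ("October", 287_672, 276_236),
    ("November", 279_735, 264_367),
    ("December", 301_329, 282_236),
]


def percentage_differences(rows):
    """Return month -> (expected - observed, difference as % of expected)."""
    result = {}
    for month, expected, observed in rows:
        diff = expected - observed
        result[month] = (diff, 100 * diff / expected)
    total_expected = sum(expected for _, expected, _ in rows)
    total_observed = sum(observed for _, _, observed in rows)
    total_diff = total_expected - total_observed
    result["Total"] = (total_diff, 100 * total_diff / total_expected)
    return result


results = percentage_differences(TABLE_5_2)
for month, (diff, pct) in results.items():
    print(f"{month:<10} {diff:>8,} {pct:5.1f}")
```

Recomputing this way reproduces the printed differences for every month and the printed percentages for most months and for the total (12.0). For May and December, the ratio of the printed sums gives 9.8 and 6.3 rather than the printed 10.0 and 6.0, so those two cells may reflect a rounding or transcription discrepancy.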

Given that some GQ facilities can be very large relative to the size of the household population in a geographic area, capturing them in the sample with certainty may be important. In some respects, data collection from GQ facilities resembles surveys of business populations, which often include a stratum of “must-take” units in the sample. Statistical strategies developed for business surveys, including methods to identify units to be in the sample with certainty, may therefore be useful to consider for the ACS.

Clearly, if small-area estimates are important, then, in principle, a sampling design more suitable to address that need would be ideal. However, a must-take approach is often justified when, for example, local experts can identify (domain) estimates that are not reasonable, or when there are reasonably foreseeable uses that the survey design either did not or could not account for. Typically, the development of a must-take stratum is guided by data use considerations, including the data needs of subject-matter specialists who can provide input on whether specific locally significant units must be included in the sample. It will be important, however, to carefully monitor the impact of including must-take units on the sample design, to ensure that a large number of must-take cases does not excessively pull the design away from a more efficient use of resources.

Recommendation 5-3: The Census Bureau should assess whether useful strategies could be learned from other surveys that incorporate a must-take stratum of large units in the sample design and evaluate these strategies for possible use in the sample design for group quarters in the American Community Survey.

SUBSAMPLING WITHIN LARGE GQ FACILITIES

The residents of large group quarters are subsampled in groups of 10, and some group quarters can have multiple groups of 10 in the sample.
Given that group quarters provide housing and services to people with similar needs and circumstances, intraclass correlations within group quarters are naturally high for many variables. Thus, while cost-effective, subsampling a large number of residents in a facility may be statistically inefficient. Reducing the number of persons subsampled per facility and increasing the number of sampled group quarters could improve the reliability of the estimates. This would, however, increase field costs if more group quarters must be sampled to achieve the same level of precision. It is also possible that the subsample sizes could be reduced without a substantial loss in precision, in which case there may be no need to increase the number of sampled group quarters. The balance between data quality and cost would have to be evaluated to determine the optimal subsample size. A recent Census Bureau project calculated the optimal subsample size for residents of GQ facilities to be around four, after averaging the results of calculations based on two different sets of assumptions about travel costs (Sommers and Hefter, 2010). The question can be approached in a variety of ways, particularly in terms of calculating cost savings, which is one reason why pursuing this research further is important. Future research could also take into account possible differences among the intraclass correlations that characterize different GQ types, given that the correlations are presumably not equally high for all of them.

Recommendation 5-4: The Census Bureau should expand on the research it initiated to determine the optimal cluster size for subsampling residents in large group quarters (GQ) in the American Community Survey, estimating intraclass correlations for different variables and factoring in facility-level and person-level costs using a variety of approaches. The analysis should address whether the same subsample size is efficient for each GQ type and whether the size of the subsample per facility should be reduced.
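
The trade-off between intraclass correlation and cost can be illustrated with the textbook optimum for two-stage cluster samples. This is a standard formula offered as a sketch, not necessarily the method used by Sommers and Hefter (2010): it assumes a simple cost model with a fixed cost per sampled facility (`c_facility`) plus a cost per interviewed resident (`c_person`), a single intraclass correlation `rho` between 0 and 1, and the Kish design effect 1 + (m - 1)rho. The input values below are hypothetical.

```python
import math


def design_effect(m, rho):
    """Kish design effect for clusters of m sampled persons with ICC rho."""
    return 1 + (m - 1) * rho


def variance_cost_product(m, c_facility, c_person, rho):
    """Relative variance times cost for a fixed budget (constants dropped).

    With n facilities and m persons each, cost = n * (c_facility + c_person * m)
    and the variance is proportional to design_effect(m, rho) / (n * m).
    """
    return design_effect(m, rho) * (c_facility + c_person * m) / m


def optimal_cluster_size(c_facility, c_person, rho):
    """Textbook optimum m* = sqrt((c_facility / c_person) * (1 - rho) / rho)."""
    return math.sqrt((c_facility / c_person) * (1 - rho) / rho)


# Hypothetical inputs, chosen only for illustration.
c1, c2, rho = 100.0, 25.0, 0.2
print(round(optimal_cluster_size(c1, c2, rho), 2))
for m in (2, 4, 10):
    print(m, round(variance_cost_product(m, c1, c2, rho), 1))
```

Under this model, larger values of `rho` push the optimum toward smaller clusters, so estimating `rho` separately by GQ type could yield different subsample sizes for different GQ types.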
