Sample Allocation and Selection
The goal of the American Community Survey (ACS) is to provide estimates of detailed characteristics of the total population of the United States at levels of geography as small as census block groups, replacing data that were previously obtained through the census long form. The current ACS sample design is optimized to produce substate-level estimates of characteristics of the household population. However, for the group quarters (GQ) population, the design accommodates only state-level estimates of the overall GQ population. The sample design is not adequate for substate GQ estimates, and this limitation also reduces the usability of the total population estimates for smaller geographic areas, because those totals include the GQ population.
The GQ sample consists of two strata, small and large. As discussed in Chapter 2, the small stratum includes facilities with 15 or fewer residents, as shown on the frame, as well as facilities for which the expected number of residents is unknown, either because the facility was closed on Census Day or because it was recently added to the sampling frame without information about the expected population count. The large stratum includes group quarters with expected populations of more than 15 residents.
The approach to sampling the small stratum is similar to the household sampling method (U.S. Census Bureau 2009). First, each small facility is randomly assigned to a subframe associated with one of five data collection years. The facilities in a state are then sorted by small versus closed on Census Day, GQ type, and geographical order (county, tract, block, street name, and GQ identifier), and a systematic sample is selected. In most states, the systematic sample selects 1 in 8 group quarters from the subframe for the given year. Combined with the 1-in-5 assignment to subframes, this results in an overall facility sampling rate of 1 in 40, or 2.5 percent, in a given year. Some of the less populated states have higher target sampling rates to boost the precision of the estimates. For example, the target sampling rate is 7.11 percent for Wyoming and 4.95 percent for Vermont. All residents of the selected small facilities are eligible to be interviewed unless the actual number of residents exceeds 15. In such instances, a subsample of 10 residents is selected when the field representative visits the facility (a process similar to that for the large facilities).
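The small-stratum steps described above can be sketched in a few lines of Python. This is an illustration only, not the Census Bureau's production procedure: the function name, the dictionary field names, and the use of a seeded pseudorandom generator are all assumptions made for the example, and the sort key simply follows the order listed in the text.

```python
import random


def select_small_gq_sample(facilities, year, interval=8, n_years=5, seed=0):
    """Sketch of the small-stratum selection for one state and one year.

    `facilities` is a list of dicts with hypothetical keys:
    closed_on_census_day, gq_type, county, tract, block, street, gq_id.
    `year` is a subframe index from 0 to n_years - 1.
    """
    rng = random.Random(seed)

    # Step 1: randomly assign each facility to one of the yearly subframes.
    assigned = [(rng.randrange(n_years), f) for f in facilities]
    subframe = [f for y, f in assigned if y == year]

    # Step 2: sort by small versus closed on Census Day, GQ type,
    # and geographical order.
    subframe.sort(key=lambda f: (f["closed_on_census_day"], f["gq_type"],
                                 f["county"], f["tract"], f["block"],
                                 f["street"], f["gq_id"]))

    # Step 3: systematic sample, every `interval`-th facility
    # after a random start.
    start = rng.randrange(interval)
    return subframe[start::interval]
```

With five subframes and an interval of 8, roughly 1 facility in 40 is selected for a given year. A state with a higher target rate would use a smaller interval, which in this sketch is just a different `interval` argument.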
The sampling units for the large facilities are the GQ residents, who are selected in groups of 10. This means that a GQ facility is indirectly sampled with probability proportional to its number of anticipated groups of 10 residents. Larger facilities can have several groups of 10 represented in the sample. Specifically, large GQ groups are sorted by GQ type and geographical order and the groups are then systematically sampled at a 1 in 40 rate (again, with some exceptions). This means that only group quarters with 40 or more groups of 10 are guaranteed to have at least 1 group represented in a particular sample. As described above, the list of residents eligible to be interviewed is determined during the field representative’s visit to the facility. During the visit, an algorithm with a random start is applied to the actual roster of residents. If multiple groups of 10 are selected, the groups are assigned to be interviewed during different months (with some exceptions, which are discussed in subsequent sections).
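To make the large-stratum mechanics concrete, the following Python sketch lists one sampling unit per anticipated group of 10 residents and then takes every 40th unit. It is a simplified illustration under stated assumptions: the function and field names are hypothetical, a single `geo` value stands in for the full geographical sort, and the rule for converting an expected population into a number of groups (integer division here) is an assumption, since the text does not specify how partial groups are handled.

```python
import random


def select_large_gq_groups(facilities, interval=40, group_size=10, seed=0):
    """Return {gq_id: number of groups of 10 selected} for one state.

    `facilities` is a list of dicts with hypothetical keys:
    gq_id, gq_type, geo, expected_pop (more than 15 for this stratum).
    """
    rng = random.Random(seed)

    # Build the list of sampling units: one entry per anticipated group
    # of 10, with facilities in GQ-type and geographical order.
    units = []
    for f in sorted(facilities, key=lambda f: (f["gq_type"], f["geo"])):
        # Assumption: partial groups are dropped (floor division).
        n_groups = f["expected_pop"] // group_size
        units.extend([f["gq_id"]] * n_groups)

    # Systematic sample of groups: every `interval`-th unit after a
    # random start.
    start = rng.randrange(interval)
    hits = {}
    for gq_id in units[start::interval]:
        hits[gq_id] = hits.get(gq_id, 0) + 1
    return hits
```

Because a facility's groups occupy consecutive positions in the list, a facility with 40 groups is hit exactly once whatever the random start, one with 80 groups is hit exactly twice, and one with fewer than 40 groups is hit at most once. This is the sense in which the facility is sampled with probability proportional to its anticipated number of groups.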
When the ACS sample design was first developed, the sampling rate was 3 percent of addresses annually, translating into 15 percent over 5 years. Due to budget constraints, the current annual sampling rate is around 2.2 percent (about 11 percent over 5 years), resulting in decreased reliability of the estimates. For fiscal year 2011, the Census Bureau has requested a budget increase that would allow for an increase in the sampling rate. However, because the primary concern is the reliability of the household estimates, any budget increase is not expected to address challenges related to the reliability of the GQ estimates. This means that a careful look at the sample design is warranted to identify possible opportunities for increased efficiency.
As discussed above, the sample of small group quarters in a state is proportional to the number of small group quarters on the frame for that state. The sample of large group quarters is proportional to the expected number of residents in large group quarters in the state.
Because the GQ sample is not currently controlled at substate geographies, substate estimates may be highly variable, a problem that is discussed in more detail in the next section. To address this, the sampling design could be modified to exercise more control over the allocation rates at the substate level and over time. For 3-year and 5-year estimates, the sample could be required to have a minimum number of group quarters in each county over the course of the 5-year period.
Another approach would be to individualize the sample further, depending on the characteristics of small jurisdictions. For example, the lack of control over the allocation rates for smaller geographies may have a large effect on the estimates produced for a community that has 1,000 households and a correctional facility with 100 residents. According to counts from the 2000 census, places that have 10 percent or more of their population residing in group quarters represent less than 5 percent of all places in the United States. These may be the cases that would need individualized attention.
Additional control over the allocation to substate areas may be facilitated by switching from a probability proportional to size (pps) design for large group quarters to one in which strata are created on the basis of size and substate area and an equal probability sample is selected within each stratum. This would permit the allocation to substate areas to be better controlled over time. This type of design would also simplify variance estimation, which appears to be a problem with the current design (Keathley, Navarro, and Asiala, 2010). To determine whether any efficiency would be lost by such a design, the Census Bureau could undertake a study of the effectiveness of the current pps methods. The expected population numbers in the frame are often incorrect, which reduces the efficiency of pps sampling. Consequently, the loss in precision from moving from pps to stratified, equal probability sampling may not be serious.
Recommendation 3-1: The Census Bureau should investigate the implications of controlling the ACS group quarters sample allocation at the substate level and over time to better understand how these changes would impact the precision of the estimates and the costs of the data collection at the state and substate levels.
Recommendation 3-2: The accuracy of the measures of size used in the probability proportional to size ACS group quarters sample design should be studied. If the measures of size are seriously out of date, methods should be considered for updating the frame, as suggested in Recommendations 2-1 through 2-4.
SUBSAMPLING WITHIN GQ FACILITIES
The residents of large group quarters are subsampled in groups of 10, and some group quarters can have multiple groups of 10 in the sample. Given that group quarters provide housing and services to people with similar needs and circumstances, the intraclass correlations within group quarters are naturally high for many variables. Thus, while cost-effective, subsampling a large number of residents in a facility may be statistically inefficient. Reducing the number of persons subsampled in a facility and increasing the number of sample group quarters could improve the reliability of the estimates. This would also mean increased field costs if the number of sampled group quarters has to be increased to achieve the same level of precision of estimates. However, it is also possible that the subsample sizes could be reduced without a substantial loss in precision. If so, there may be no need to increase the number of sampled group quarters. The balance between cost and variance would have to be evaluated to determine the optimal subsample size.
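The tradeoff described above can be illustrated with the standard approximation for the design effect of a clustered sample, 1 + (m - 1)ρ, where m is the number of residents subsampled per facility and ρ is the intraclass correlation. The Python sketch below assumes equal subsample sizes in every facility; the function names are hypothetical, and the value of ρ in the example that follows is chosen for illustration and is not an ACS estimate.

```python
def design_effect(m, rho):
    """Approximate design effect for m residents per facility and
    intraclass correlation rho (equal cluster sizes assumed)."""
    return 1.0 + (m - 1) * rho


def effective_sample_size(n_facilities, m, rho):
    """Number of interviews divided by the design effect."""
    return n_facilities * m / design_effect(m, rho)
```

For example, with an illustrative ρ of 0.2, 1,000 interviews collected as 100 facilities with 10 residents each have a design effect of 2.8 and an effective sample size of about 357. The same 1,000 interviews collected as 250 facilities with 4 residents each have a design effect of 1.6 and an effective sample size of 625. The second design is more precise but requires visits to 150 more facilities, which is the cost side of the balance.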
A recent Census Bureau project calculated the optimal subsample size to be around four after averaging the results of calculations based on two different sets of assumptions about travel costs (Sommers and Hefter, 2010). The question can be approached in a variety of ways, particularly in terms of calculating cost savings. This is one reason why pursuing this research further is important. Future research could also take into consideration possible differences among the intraclass correlations that characterize different GQ types, given that the correlations are presumably not equally high among all of them.
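One common way to frame such a calculation is the textbook optimum for two-stage sampling under a simple linear cost model, in which the total cost is a fixed cost per facility plus a cost per interviewed resident. The sketch below uses that textbook formula. It is not necessarily the method used in the Census Bureau project cited above, and the function name, cost inputs, and ρ value are assumptions invented for the example.

```python
import math


def optimal_subsample_size(cost_per_facility, cost_per_person, rho):
    """Textbook optimal number of residents per facility:
    sqrt((cost_per_facility / cost_per_person) * (1 - rho) / rho).

    Assumes a linear cost model and an intraclass correlation rho
    strictly between 0 and 1.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must be strictly between 0 and 1")
    if cost_per_facility <= 0 or cost_per_person <= 0:
        raise ValueError("costs must be positive")
    ratio = cost_per_facility / cost_per_person
    return math.sqrt(ratio * (1 - rho) / rho)


# Hypothetical inputs: a facility visit costs 4 times as much as one
# additional interview, and rho is 0.2.
m_opt = optimal_subsample_size(100.0, 25.0, 0.2)
```

Under this model the optimum falls as ρ rises and grows as facility-level costs rise relative to person-level costs, so GQ types with different intraclass correlations or travel costs would generally have different optimal subsample sizes.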
Recommendation 3-3: Research on the optimal cluster size for subsampling residents in large group quarters should continue, estimating intraclass correlations for different variables and factoring in facility-level and person-level costs using a variety of approaches. The analysis should address whether the same subsample size is efficient for each GQ type and whether the size of the subsample per facility should be reduced.