6 Using ACS for the NSCG Sample Frame

As discussed above, the idea of using the American Community Survey (ACS) as a sample frame for the National Survey of College Graduates (NSCG) was born of necessity. In this chapter we discuss the use of the ACS for drawing and maintaining the NSCG sample. We begin with the requirements and constraints that drive our consideration of alternative sample designs as they relate to the ACS as the sampling frame. We then discuss a number of design issues that are important in setting criteria for our recommendations. Next, we identify and discuss the various sample design features and approaches that have been discussed during the panel's deliberations. We offer our recommended design and close with the important issue of the transition from the current design to the new one.

REQUIREMENTS AND CONSTRAINTS

Requirements

In adopting the ACS as the NSCG sampling frame, some aspects of the NSCG sample design, such as the weighting of women, minorities, and other population groups, may not change much, but other aspects may be markedly affected by the ACS design. As noted above, the ACS is a continuous monthly sample, in contrast to the long form, which used point-in-time sampling on Census Day (April 1). Consequently, the ACS reference period for questions on education and occupation rotates throughout the year. Although the use of a question on field of bachelor's degree in the ACS sample design will be highly beneficial for targeting potential sample members, it will also complicate the integration of the new variable with other variables previously used in the design. The continuous nature of the ACS also raises questions about how frequently the NSCG should be conducted and the sample refreshed, either for the entire college graduate population or for subgroups, such as immigrants and other new populations or those with low response rates.

The use of the ACS as a sampling frame for the NSCG and other National Science Foundation (NSF) surveys raises several technical issues. The continuous nature of the ACS offers opportunities for frequent updating of the NSCG sample frame, while the limited size of 1 year's ACS sample (relative to the long-form sample) requires accumulation over several survey rounds to provide a frame of suitable size for oversampling rare populations, such as minority college graduates by field of science and engineering (S&E) degree. These issues must be addressed in the development of an implementation plan to begin in fiscal year 2009.

The conversion to the ACS opens the possibility of reconsidering the target population for the survey. The fact that the questions on the ACS are much like the questions on the census long form militates against major changes, but the addition of the field-of-degree question (in either format) permits a rethinking of the target population. The current surveys in the Scientists and Engineers Statistical Data System (SESTAT) cover U.S.
residents with bachelor's and higher degrees in science and engineering, including:

• recent (past 2 academic years) U.S.-earned-S&E-degree recipients, a population that is currently identified in the National Survey of Recent College Graduates (NSRCG) and the Survey of Earned Doctorates (SED);
• not-recent U.S.-earned-S&E-degree recipients (those tracked in the NSCG);
• U.S. residents without S&E degrees who work in S&E occupations (also tracked in the NSCG); and
• new immigrants to the United States with all S&E bachelor's and higher degrees earned outside the United States (currently obtained only through the initial postcensal NSCG).

In SESTAT, special attention is paid to minority populations, with separate estimation capability by race and ethnicity, gender, disability status, and U.S. or foreign citizenship.
52 USING THE AMERICAN COMMUNITY SURVEY

Converting to an ACS-based sampling frame also provides the opportunity to rethink the NSCG sample size as it relates to the targeted precision for population subgroups of interest. The 2003 postcensal NSCG sampled more than 170,000 cases, plus about 40,000 respondents who were carried forward into the 2003 sample from the 1999 NSCG or the 2001 NSRCG. The 1999 NSCG cases were surveyed for methodological reasons and were not included in SESTAT. Because the census long form did not include a field-of-degree question, the remaining 2003 NSCG sample had to be large enough to yield the sample of scientists and engineers needed to achieve the targeted precision for estimates of characteristics of interest. For the NSCG, the targeted precision levels were expressed in terms of generalized variance parameters for the different degree fields and population subgroups. With the inclusion of a field-of-degree question in the ACS, the screening sample size requirements can be reduced, although the screening sample would still be expected to exceed the 68,000 cases used for the 2006 NSCG. That 2006 sample followed the 2003 cases derived from the census long-form portion of the NSCG, together with a subsample of recent graduates from the 2001 and 2003 NSRCG surveys.

Constraints

(1) Sample Size

In a presentation to the panel at its October 2007 workshop, Stephen H. Cohen of NSF identified some possible drawbacks to using the ACS as a sample frame, associated with the need to accumulate a sufficiently large sample to meet specific objectives. The ACS sample over one annual cycle does not capture enough of the rare populations for NSCG needs. Although most cells have adequate population counts after two cycles of the ACS, some rare populations would require up to five ACS cycles to produce a sample equivalent to the 2003 postcensal sample. The ACS surveys 250,000 addresses a month.
Thus, most uses of the ACS for the NSCG sampling frame will require aggregating 1 or 2 years of the ACS.

The largest sample that could be needed from the ACS would occur if the entire NSCG sample were replenished in a single draw (see the section on options, below). In 1993 and 2003, the requisite sample sizes were 215,000 and 171,000, respectively. However, much smaller samples are required at any one time for design options that move away from a large once-a-decade sample (see below).

The ACS annual sample selection includes approximately 3 million housing units and 7.8 million people. In 2005, after mail responses, computer-assisted telephone interviews (CATI), and a subsample collected by computer-assisted personal interviews (CAPI), the ACS had a completion rate of 66 percent (National Science Foundation, 2007, p. 15). This completion rate reflects the design of the ACS: by design, only one-third of the addresses that do not respond by mail or CATI are followed up with CAPI. Although the completion rate is important for sampling purposes, it should not be confused with the response rate. The weighted response rate for the 2005 ACS (weighted for the CAPI subsampling) was 95 percent. Assuming a similar completion rate in the future, the ACS would yield data for some 2 million housing units and about 5.2 million people annually. Approximately 19 percent of these people will fit the SESTAT population definition (i.e., have a bachelor's degree or higher and be aged 75 years or under). Thus, fewer than 1 million cases (about 978,640) would be eligible for the NSCG each year. In comparison, 6.4 million cases were eligible from the 2000 census long-form sample for the 2003 NSCG.

On the basis of an analysis of the full-year 2005 ACS data, NSF has determined that 1 year of ACS samples (January to December) may contain enough cases to equal or surpass the size of past NSCG postcensal samples for some populations, but it is unlikely to contain enough to equal the previous NSCG cell sizes for the rarer populations (such as minority groups). At least 2 years of monthly samples are necessary to provide sufficient coverage of many of these small population groups. Because the Census Bureau processes the ACS monthly samples on a calendar-year basis (12 months of sample are processed together after data collection has closed), NSCG samples may require 2 (or more) years of ACS data if a completely new sample is drawn.
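The yield arithmetic in the preceding paragraph can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not an NSF or Census Bureau procedure: the constants are the rounded figures quoted in the text, so the result lands within about 0.1 percent of the 978,640 cited (the small gap presumably reflects rounding in the published inputs). The second function uses invented counts to show why a completion rate near two-thirds can coexist with a weighted response rate near 95 percent when only one-third of nonrespondents are followed up by CAPI. It is a simplified stand-in that ignores vacant and ineligible addresses, not the Census Bureau's official response-rate formula, and both function names are hypothetical.

```python
# Back-of-the-envelope sketch using the rounded figures quoted in the text.
ANNUAL_HOUSING_UNITS = 3_000_000  # approximate ACS annual sample selection
COMPLETION_RATE = 0.66            # 2005 completion rate (mail + CATI + subsampled CAPI)
PERSONS_PER_HOUSEHOLD = 2.6       # 2005 ACS average household size
SESTAT_ELIGIBLE_SHARE = 0.19      # bachelor's degree or higher, aged 75 or under


def eligible_yield(housing_units: float = ANNUAL_HOUSING_UNITS) -> dict:
    """Expected completed units, persons, and NSCG-eligible persons from one ACS year."""
    completed_units = housing_units * COMPLETION_RATE
    persons = completed_units * PERSONS_PER_HOUSEHOLD
    eligible = persons * SESTAT_ELIGIBLE_SHARE
    return {
        "completed_units": round(completed_units),
        "persons": round(persons),
        "eligible": round(eligible),
    }


def completion_vs_weighted_response(
    mail_cati_completes: int,
    nonrespondents: int,
    capi_followed_up: int,
    capi_completes: int,
    capi_weight: float = 3.0,  # one-in-three CAPI subsample, so each CAPI case counts 3 times
) -> tuple:
    """Simplified illustration: unweighted completion rate vs. subsampling-weighted response rate."""
    sampled = mail_cati_completes + nonrespondents
    completion_rate = (mail_cati_completes + capi_completes) / sampled
    weighted_response_rate = (mail_cati_completes + capi_weight * capi_completes) / (
        mail_cati_completes + capi_weight * capi_followed_up
    )
    return completion_rate, weighted_response_rate


if __name__ == "__main__":
    print(eligible_yield())  # roughly 1.98 million units, 5.15 million persons, 978 thousand eligible
    # Invented counts: 52 mail/CATI completes, 48 nonrespondents, 16 sent to CAPI, 15 completed.
    print(completion_vs_weighted_response(52, 48, 16, 15))
```

With the invented counts, the completion rate is 0.67 while the weighted response rate is 0.97, mirroring the gap between the 66 and 95 percent figures above.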
If NSF phases in the use of the ACS (e.g., by continuing to use some of the current 2000 decennial sample until the ACS provides sufficient sample for NSCG sampling), it may be possible to initially use 1 year of ACS sample.

Note: This number is based on an average household size of 2.6; average household size was determined from the Census Bureau's American FactFinder with data from the ACS for 2005.

Note: It is unlikely that 12 months of ACS data would be sufficient for approximately one-third of the aggregate sampling cells that NSF has tested. These aggregate cells combined minority groups and used fewer occupational categories than have been used in the past. Using the current sampling cells, several more years of ACS samples may be required to produce sample sizes similar to those achieved with the 2003 NSCG design. The aggregate cells that NSF tested are important because they form the basis for many of the domains for which estimates have been produced in the past. It is possible that they can be achieved with 2 years of ACS samples.
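The accumulation problem described above can be sketched as a simple calculation: given a target sample size for a sampling cell and the number of eligible cases one ACS year is expected to yield for that cell, how many calendar years must be pooled? The cell names, targets, and yields below are invented for illustration, and the five-year ceiling mirrors the "up to five ACS cycles" cited earlier. A real assessment would work from NSF's own sampling cells and precision targets.

```python
import math
from typing import Optional


def years_needed(target_cases: int, expected_per_year: float, max_years: int = 5) -> Optional[int]:
    """Smallest number of pooled ACS years whose expected yield meets the target.

    Returns None when the target cannot be met within max_years.
    """
    if expected_per_year <= 0:
        return None
    years = math.ceil(target_cases / expected_per_year)
    return years if years <= max_years else None


# Hypothetical cells: (target sample size, expected eligible cases per ACS year)
cells = {
    "large field, all groups": (2_000, 3_500),
    "mid-size field, aggregated minorities": (2_000, 1_200),
    "rare field-by-race cell": (2_000, 450),
    "very rare cell": (2_000, 300),
}

for name, (target, per_year) in cells.items():
    print(name, years_needed(target, per_year))
```

In this invented example the four cells need 1, 2, and 5 years, and the last cannot be filled within five years, which is the pattern the text describes: most cells are adequate after 2 years, while the rarest need far longer.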
(2) Timing

There are several issues with respect to timing, all of which require some new flexibility in the design of the NSCG. One is the timing of the NSCG reference period. In the past, NSF determined that the NSCG reference period must be consistent with those of the other two SESTAT surveys. Throughout the 1990s, the reference date for all SESTAT surveys was April 15. However, for the 2003 surveys, the reference date was changed to October 1. The change was made to improve population coverage and the precision of the estimates, to improve locating operations, and to provide sufficient time for enhancing survey operations. The first reason is the most important: moving to an October date enhanced NSRCG and SDR survey operations by allowing new S&E and health graduates to be sampled after the respective frames were finalized. With an April reference date, sampling must occur early in the year, when the frame information from input sources for the most recent graduates is not yet final, so the information available for sampling is missing, incomplete, or out of date. Moving the reference date to later in the year resolves these issues, resulting in more accurate sampling and estimation.

One of the principal goals of the SESTAT program is to provide accurate employment data on scientists and engineers in the United States. With an April reference date, employment data may be misleading for recent graduates, who may still be in transition to employment from their most recent enrollment. Pushing the reference date to later in the year may capture a more stable employment profile for these individuals, because data will be collected from them after some have completed temporary or summertime employment transitions.
The 2003 NSCG data (which used an October reference date) do show such an effect: there was a lower unemployment rate than with an April reference date, and there were fewer individuals in temporary employment positions, such as postdoctoral positions. This result is similar to trends observed in previous decades of the surveys when the data were collected in the fall.

The schedule for processing ACS data also has implications for the reference date for the NSCG and thus for the other two SESTAT surveys. A full calendar year (or years) of ACS data needs to be available sufficiently in advance of the NSCG reference date to allow the Census Bureau time to clean and weight the ACS data, as well as to allow sufficient time to select and prepare the NSCG sample for the field. To have ACS frame data that are as "fresh" as possible at the time the NSCG goes into the field, the ACS collection year would need to end about 8 to 10 months before the NSCG survey reference date. A fall NSCG reference date would accomplish this, and the reference date for the 2008 and 2010 SESTAT surveys is currently planned as October 1. According to the Census Bureau, the 12-month calendar-year ACS data would be ready for use in sampling sometime before the end of June of the following year. The October SESTAT reference date allows several months to process the files, stratify the frame, select the sample, and create the mailing records.

Such a time schedule has advantages in terms of the age of the data. Typically, there has been about a 3-year lag between the reference date of the decennial long form and the NSCG postcensal survey. With an October reference date and a sample based on ACS monthly samples for the previous calendar year, some contact data would be less than 12 months old and none would be older than 22 months. (If 2 years of the ACS sample are used, only the oldest data would be similar in age to the long-form data.) Some sample cases will move between the time they were surveyed in the ACS and the NSCG data collection, but many fewer than in the postcensal surveys.

Pooling the monthly ACS samples potentially creates some issues in estimation and in the determination of NSCG and SESTAT eligibility (e.g., determining whether or not an individual holds a bachelor's degree). In the past, postcensal NSCG eligibility was based on a sample with the single reference date of the decennial census. In the ACS, each monthly sample has a different reference date; moreover, degree data are reported as of the interview date. This difference will require the NSCG to use a different strategy for determining eligibility. For example, degrees are conferred at many points during a year. For those receiving a bachelor's degree during a particular ACS calendar year, NSCG eligibility could depend on the month in which they were interviewed for the ACS sample; that is, the ACS could be administered before or after degree receipt.
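Because each ACS monthly record carries its own interview date, that date drives both how old the contact data are at the NSCG reference date and whether a degree appears on the frame at all. The sketch below computes data age for a frame built from one ACS calendar year with an October 1 reference date the following year, and classifies cases under a fixed cutoff at the start of the ACS year (the rule proposed in the paragraphs that follow). The years, function names, and status labels are hypothetical. Age is counted from the first day of the interview month, so it slightly overstates the age of cases interviewed later in a month.

```python
from datetime import date

ACS_YEAR = 2009                     # hypothetical calendar year of pooled ACS monthly samples
NSCG_REFERENCE = date(2010, 10, 1)  # hypothetical October 1 reference date the following year
CUTOFF = date(ACS_YEAR, 1, 1)       # degree must be held before the ACS year begins


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (days ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def contact_data_age(interview_month: int) -> int:
    """Approximate age, in months, of ACS contact data at the NSCG reference date."""
    return months_between(date(ACS_YEAR, interview_month, 1), NSCG_REFERENCE)


def nscg_status(degree_date: date, interview_date: date) -> str:
    """Classify a person under a fixed cutoff at the start of the ACS year."""
    # The ACS records educational attainment as of the interview date.
    if degree_date > interview_date:
        return "not on frame"  # degree not yet received when interviewed
    if degree_date < CUTOFF:
        return "eligible"
    # Degree reported on the ACS but earned after the cutoff: removed at the NSCG interview.
    return "screened out"


ages = [contact_data_age(m) for m in range(1, 13)]
print(min(ages), max(ages))  # December interviews are newest, January oldest

# A May 2009 graduate is excluded either way, whatever the interview month.
print(nscg_status(date(2009, 5, 15), date(2009, 3, 10)))
print(nscg_status(date(2009, 5, 15), date(2009, 8, 10)))
print(nscg_status(date(2008, 5, 15), date(2009, 3, 10)))
```

Under these assumptions the data are 10 to 21 months old at the reference date, and a graduate who earned a degree during the ACS year is excluded consistently, whether the ACS interview came before graduation (never on the frame) or after it (screened out during the NSCG).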
The target population could be defined as those who earned a bachelor's degree before the first month of the pooled ACS samples that make up the NSCG sampling frame. People recorded as having a bachelor's degree but who turn out to have earned that degree after the beginning of that ACS year would be identified during the NSCG interview and removed from the NSCG sample. Such a procedure would result in only a very small proportion of sample members being "screened out" as ineligible during the NSCG. Using ACS data from a calendar year and a cutoff at the end of the preceding December, only a small number of sample cases would have received their first bachelor's degree after that December but before the month in which they were interviewed for the ACS. A similar approach might be considered for immigrants, for whom the target population could be defined as those who had immigrated to the United States as of a specified cutoff date.

Note: For NSCG sampling purposes, it would be desirable to have the ACS sampling data before June of the year following the reference year. However, if an option that requires more than 1 year of sampling data is selected, the timing of receipt of the data can be relaxed. NSF has developed scenarios based on 2 years of ACS sample units, suggesting the possibility of sampling and fielding the NSCG in two waves: one based on the first of the 2 ACS years, which could be processed well in advance of the survey date, and a second, fielded slightly later, based on the second ACS year. In 2006, both the NSRCG and the Survey of Doctorate Recipients (SDR) were fielded in two waves for a similar reason, the late availability of the frame for part of the sample; see National Science Foundation (2007).

Recommendation 6.1: The National Science Foundation should stipulate that the target population of people with bachelor's degrees be defined as of the beginning of the American Community Survey year.

(3) Cost

Being able to draw a sample and field the NSCG closer to the time the frame data were collected could reduce costs in several ways. A shorter period between the frame and NSCG data collection reduces the likelihood of changes in eligibility status between the two dates, such as moving abroad or earning another degree, and should improve the ability to locate individuals for participation. With a shorter gap between the ACS frame data and the NSCG reference date for all or most of the sample, a smaller fraction of NSCG sample cases will have moved from where they were living at the time of the ACS than with the long-form frame. Additionally, it may be easier to locate individuals who have moved within the United States when they have been gone from their previous addresses for a shorter time. Such factors may reduce the cost of locating, which would cut survey cost and possibly reduce time in the field.

Cost savings could also be expected from an improved ability to identify people who have changed status during the decade. The NSCG historically has provided the "stock" of scientists and engineers near the beginning of the decade, while the NSRCG and SDR have captured the new flows of those receiving S&E degrees during the decade after the postcensal NSCG.
To keep the frames for the three surveys mutually exclusive and to eliminate the possibility of double counting these populations, all NSCG and NSRCG cases involving people who earned another eligible degree after they were originally sampled in one of the surveys are considered out-of-scope cases for the integrated SESTAT dataset. Reducing the number of such sample cases that are excluded from the integrated database will increase the effective sample size and thus improve statistical precision.

Note: A person who was sampled in the NSCG (or the NSRCG) but subsequently earned another degree (bachelor's, master's, or doctoral) in a science, engineering, or health field is eligible for inclusion in the NSRCG or SDR by virtue of that additional degree.

While not necessarily a cost-saving measure, a design that would result in taking several samples from the ACS over the decade would smooth out the "peaks and valleys" spending pattern associated with the present long-form-based design. NSF now has to obtain a large increase in resources just after the decennial census to cover sample design costs and the cost of the large screening sample needed to identify the S&E population.

THE ACS AS A SAMPLE FRAME

The NSCG has evolved over the years into a two-tiered program: a baseline postcensal NSCG followed by subsequent panel follow-up surveys. As described in Chapter 2, the NSCG surveys are complemented by other SESTAT surveys that provide some of the data on new flows of U.S.-educated scientists and engineers into the overall population, including new bachelor's and master's science, engineering, and health graduates from the NSRCG and new doctorates in these fields from the SED.

This practice of a baseline postcensal NSCG with subsequent follow-up surveys has been used for the NSCG for a variety of reasons. First and foremost, identifying and then locating the stock of scientists and engineers of interest is both difficult and expensive. Once they have been identified through the initial baseline NSCG, it is cost-efficient to keep them in the NSCG throughout the decade rather than trying to identify others. Additionally, the use of follow-up surveys provided some stability to the estimates. The only alternative to maintaining the NSCG postcensal sample throughout the decade was to draw a brand new sample every 2-3 years, but additional screening surveys with large samples would have been very expensive, and there would have been no improvement in the coverage of the population because the sample frame (the decennial long form) remained the same.
Freshly selected samples would not suffer from attrition losses due to panel fatigue, but they would suffer from greater levels of nonresponse due to addresses that become progressively out of date.

The ACS as a sample frame is an attractive replacement for the census long-form-based sample frame. Like the long-form records, its records can be stratified by households or people with specific characteristics. Thus, the ACS can provide an efficient frame for follow-on surveys. The ACS provides a means to include in the NSCG frame those immigrant scientists and engineers who earn all their degrees abroad and then come to the United States and enter the labor force. Similarly, it provides improved coverage throughout the decade of non-S&E graduates working in S&E or S&E-related occupations, a shortcoming of the present long-form-based sample frame.

Finally, the ACS can provide more than identification of people who are in the S&E workforce. Through its paradata, the ACS can also inform the subsequent survey process in ways that would improve the efficiency and quality of the data. For example, ACS mode, number of calls, contact information, and other data about the process could be valuable to the NSCG or other SESTAT surveys that might use the ACS as a sample frame. As the use of the ACS as a sampling frame matures, NSF and the Census Bureau may wish to consider how ACS paradata could be used to improve S&E workforce data collection and analysis.

Even without a change in survey content, the use of the ACS opens the possibility of changing the design of the sample frame for the NSCG in some exciting ways. When the field-of-degree question is added and current data become available throughout the decade, both the range of options and the flexibility in NSCG designs expand.

OPTIONS

NSF identified four primary options (combinations of the options are also possible) that are made possible by the ACS continuous survey approach (National Science Foundation, 2007): the current approach, selective updates, continuous updating, and a rotating sample. The panel also discusses a hybrid approach that was offered during the workshop: a rotating design for rare populations. The advantages and disadvantages of each NSF option and the hybrid approach are discussed in this section.

(1) Current Approach

ACS data could be used once a decade to draw a new panel for the NSCG. The existing ACS questions are nearly identical to those found on the decennial census long form, and they could serve as a screening survey for the NSCG, as the census long form did. The survey procedures could then follow those previously used in the postcensal NSCG.
The advantage of this option is that it requires the least amount of organizational change, which makes for an easier transition. However, there are several disadvantages. One is that it fails to take advantage of the yearly accumulation of ACS cases: five years of the ACS yield as many cases as the decennial census long form. Unless multiple years of the ACS are used, the Census Bureau cannot provide the oversample of rare groups (e.g., minorities) that was available from the long-form census samples, and the reliability of estimates for these groups of interest would suffer. This option would also continue the current peaks and valleys in the funding pattern: costs would be high for one cycle per decade instead of similar in size for each survey cycle.

Conclusion: Replicating the current design is not an efficient way to use the ACS.

(2) Selective Updating

Design option 1 could be modified by using the ACS in later years of a decade as a frame to update the sample for certain domains whose coverage becomes problematic as the decade progresses (e.g., recent immigrants) or for populations of emerging interest. Data items in the ACS could be used to extract subsets of ACS respondents into a frame for the targeted group(s). For example, the question on when a person came to the United States could be used to create a subset that contains recent immigrants. The ACS could also be used as a frame to examine "real-time" events (e.g., the rise and fall of technology and information technology firms and the impact on information technology occupational employment). Such supplemental frames for special domains could be sampled during any survey cycle rather than once a decade. ACS data could be analyzed for indicators of meaningful change in categories of interest, such as large increases or decreases in a field, occupation, or immigration status. Frame updates could be implemented whenever the ACS data signaled a change significant enough to warrant an update.

The selective updating approach has its downsides. First, it requires a periodic major redesign (such as every 10 years). Second, it requires drawing a very large sample periodically from the ACS, so it would compete for ACS resources over the decade. Third, it opens the possibility of discontinuity in the data series, because there will be a break in series whenever the entire sample is redrawn.

There are some advantages to this option over the once-each-decade option.
It allows updating in each survey cycle and thus prevents coverage losses associated with an out-of-date frame. It also allows updating to gather data on emerging issues. On the negative side, it retains the serious cost disadvantages of option 1, and, as an operational drawback, it requires continuous access to the ACS as a sampling frame.

Conclusion: The disadvantages of the selective updating design outweigh the potential advantages. However, selective subsamples could be considered to supplement another design to enable the study of subpopulations of emerging interest.
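The two mechanisms behind selective updating (carving a supplemental frame out of ACS records using the year-of-entry question, and flagging domains whose ACS estimates have shifted enough to justify an update) can be sketched as follows. Everything here is a hypothetical illustration: the record layout, the 5-year "recent immigrant" window, the 20 percent trigger, and the domain figures are assumptions, not NSF rules. A production trigger would also compare each change with its ACS standard error rather than use a fixed relative threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AcsPerson:
    person_id: int
    year_of_entry: Optional[int]  # year came to the United States; None if U.S.-born


def recent_immigrant_frame(records: List[AcsPerson], acs_year: int, window: int = 5) -> List[AcsPerson]:
    """Supplemental frame: people who came to the United States within `window` years of the ACS year."""
    return [
        r for r in records
        if r.year_of_entry is not None and acs_year - r.year_of_entry <= window
    ]


def flag_changed_domains(prev: Dict[str, float], curr: Dict[str, float], threshold: float = 0.20) -> List[str]:
    """Domains whose estimate moved by at least `threshold` (relative) between two ACS years."""
    flagged = []
    for domain, before in prev.items():
        after = curr.get(domain, 0.0)
        if before > 0 and abs(after - before) / before >= threshold:
            flagged.append(domain)
    return flagged


people = [AcsPerson(1, None), AcsPerson(2, 2006), AcsPerson(3, 1995)]
print([p.person_id for p in recent_immigrant_frame(people, acs_year=2009)])

prev_year = {"IT occupations": 1000.0, "civil engineering": 500.0, "chemistry": 300.0}
curr_year = {"IT occupations": 1300.0, "civil engineering": 510.0, "chemistry": 220.0}
print(flag_changed_domains(prev_year, curr_year))
```

Here only the 2006 entrant qualifies for the recent-immigrant frame, and the invented IT and chemistry domains are flagged (a 30 percent rise and a 27 percent fall), while civil engineering's 2 percent change is not.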
(3) Continuous Sample Updating

A fresh sample could be selected from the ACS for each cycle of the NSCG, or at least more frequently than once a decade. With a freshly drawn sample, coverage of the full population of interest would be more current than at present, reducing or eliminating the coverage problems that develop over the decade under the once-a-decade approach, particularly for immigrant scientists and engineers and for nondegreed workers in S&E occupations. This approach would involve screening for eligible scientists and engineers each time a new sample is drawn from the ACS and would require large sample sizes (and thus higher costs) for each survey cycle. This option would pose some operational issues, such as procedures for phasing in any new sample. Total replacement of the sample might not be feasible from a cost standpoint; it might be necessary to phase in the new sample over one or two collection cycles.

The advantages of this option make it extremely attractive. It would maintain the currency of the NSCG sample, permit oversampling of emerging or special-interest populations during the decade, prevent discontinuities in the estimates, support trend analysis, and smooth out the NSF budget cycle.

A serious disadvantage of this approach is that it would likely require continuous access to the entire ACS sample for all years to derive the desired sample sizes for rare populations. The Census Bureau cannot commit to providing that level of access to the ACS on an ongoing basis. Total sample replacement in each survey cycle might also not be cost-efficient, because the NSCG incurs its highest per-unit costs in its first data collection, due to higher tracing and locating costs and the need to screen out people who are ineligible for the study. Drawing new samples more frequently than once a decade would also reduce (or eliminate) the longitudinal feature of the NSCG.
If the sample were redrawn every survey cycle, the NSCG would become a series of cross-sectional surveys. One result would be considerably more variation in the estimates from cycle to cycle than with the current longitudinal design. This phenomenon would be especially noticeable in important small-domain estimates, such as estimates by field by race and ethnic group.

It should be recognized that committing to this option carries considerable risk for NSF. As discussed in Chapter 3, there is no firm guarantee that the ACS would be made available for such sampling. The overall costs would likely be higher than at present, because data collection and data processing operations would be more expensive due to the need to locate and screen the freshly selected sample.
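The claim that a series of cross-sectional samples produces more cycle-to-cycle variation can be made concrete with the textbook formula for the variance of a change between two means. Under simplifying assumptions (simple random samples of equal size in both cycles, equal variance V of each cycle's mean, a fraction f of units common to both cycles, and a unit-level correlation rho between cycles), Var(mean_2 - mean_1) = 2V(1 - f * rho). A fresh sample each cycle has f = 0, so the overlap term vanishes. The correlation of 0.8 below is hypothetical, and the sketch ignores attrition, nonresponse, and the complex NSCG design.

```python
def variance_of_change(v: float, overlap: float, rho: float) -> float:
    """Var(mean_2 - mean_1) = 2V(1 - f*rho) for equal-size SRS samples sharing fraction f of units."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    return 2.0 * v * (1.0 - overlap * rho)


RHO = 0.8  # hypothetical correlation of a person's value across cycles
for label, f in [("fresh cross-sectional sample", 0.0),
                 ("four rotating panels (3/4 retained)", 0.75),
                 ("full panel", 1.0)]:
    print(f"{label}: {variance_of_change(1.0, f, RHO):.2f} x V")
```

Under these assumptions the variance of change is 2.00V for a fresh sample, 0.80V with three-quarters overlap, and 0.40V for a full panel. A rotating design sits between the extremes, which helps explain why it retains much of a panel's stability while still refreshing the sample.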
Conclusion: A freshly selected sample from the ACS in each cycle of the NSCG is not an efficient design, particularly for small populations. If rare populations were to be studied effectively, extensive and continuous use of the ACS sample would be required, which might preclude use of the ACS for other survey purposes.

(4) Rotating Sample

The ACS affords the opportunity to convert the NSCG to a rotating sample design. Rotation designs are often recommended in longitudinal surveys when there is a problem with sample attrition due to respondent fatigue. With three survey cycles per decade, the NSCG has experienced declining response rates as each decade progressed, as well as increasing refusal rates. The rotating sample approach would offer virtually all of the advantages associated with continuous sample updating, plus some additional advantages.

For example, the 2003 NSCG sample could initially be divided into several equal-sized panels. A new panel would be drawn from the ACS to replace one of these NSCG panels. In each survey cycle, a new ACS panel could replace an old NSCG panel until the entire 2003 NSCG sample was rotated out. Then the oldest ACS sample panel could be replaced by a draw from the most recent ACS year(s), one in each NSCG survey cycle. This approach incorporates all of the coverage advantages of options 2 and 3, plus the additional advantage that the process of screening to identify scientists and engineers would be spread more evenly over time.

The duration of the transition process of phasing in ACS panels could be lengthened or shortened depending on the size of the NSCG panels to be replaced (or by replacing multiple 2003 NSCG panels in one or more survey cycles). During the transition phase, a larger draw might be taken the first time. The rotation schedule for the transition to the ACS need not be the same as that established for the longer term, once the NSCG consists entirely of ACS panels.
To use this design, NSF would need to negotiate with the Census Bureau for assured continuous access to the ACS sample for NSCG frame building. With a sufficient number of panels (say, four to five rotating panels) and biennial data collection, NSF should be able to build the frame for selecting each cycle's incoming panel from a random subsample of about 20 to 25 percent of the ACS sample (corresponding to four or five rotations), which would enable other studies to build valid sampling frames from the remainder.

Note: For a discussion of using multiple frames for the NSCG design, see Fecso et al. (2007a).
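The rotation described above can be sketched as a queue: each survey cycle retires the oldest panel and appends one drawn from the most recent ACS calendar year. The sketch assumes four panels, a biennial cycle starting in a hypothetical 2011 survey year, a frame year one year before each survey year, and exactly one panel replaced per cycle. As the text notes, a real transition might replace several 2003 panels at once or take a larger first draw. The function name and panel labels are illustrative.

```python
from collections import deque


def rotation_schedule(n_panels: int, first_survey_year: int, n_cycles: int, cycle_years: int = 2):
    """Panel composition of the NSCG at each survey cycle under a simple one-in, one-out rotation."""
    # Hypothetical starting point: the legacy 2003 NSCG sample split into equal panels.
    panels = deque(f"NSCG2003-{i + 1}" for i in range(n_panels))
    schedule = []
    for cycle in range(n_cycles):
        survey_year = first_survey_year + cycle * cycle_years
        panels.popleft()                        # retire the oldest panel
        panels.append(f"ACS{survey_year - 1}")  # new panel from the previous ACS calendar year
        schedule.append((survey_year, list(panels)))
    return schedule


for year, composition in rotation_schedule(n_panels=4, first_survey_year=2011, n_cycles=5):
    print(year, composition)
```

With four panels, the 2003 sample is fully rotated out after four cycles (8 years), and from then on each ACS panel stays in the survey for four cycles before it is replaced.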
There are several advantages to this option. There are obvious cost efficiencies, in that replacing only a portion of the sample each survey cycle would smooth out data collection costs over time and avoid ballooning costs once a decade. The 2-year periodicity can be designed to avoid the decennial moratorium by returning to surveying in odd-numbered years. Rotating panels allow retention and accumulation of rare subgroups of special policy interest. Replacing only a portion of the NSCG sample each survey cycle would allow the NSCG to build its frame from only a subsample (say, 25 percent) of each annual ACS rather than from the entire sample each year. This design would also permit embedding methodological experiments to address quality issues.

The disadvantages include the fact that rotating panels lead to lower response rates than a freshly selected sample each cycle, due to survey attrition, panel fatigue, and conditioning effects. And, as with other options, the Census Bureau would need to commit to allowing NSF to sample from a designated subsample of the ACS every year, although without the greater perturbations of the sample that would occur under the options that require a once-each-decade (or other periodic) sample draw.

Finally, the rotating sample design will limit the ability to do longitudinal analysis. Although the rotating panels maintain the capacity for longitudinal analysis, because each panel will have data collected for a specific number of years, longitudinal data will not be available for the full sample for all time periods.

Conclusion: The rotating sample approach is the most promising of all the NSCG design options, and a biennial survey cycle with four or five rotating panels is the most efficient and cost-effective use of the ACS as a sampling frame.
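The idea of building each incoming panel from a random subsample of roughly 1/k of an ACS year, with k rotating panels, while leaving the remainder untouched for other studies, can be sketched as a simple random split of record identifiers. This is an illustrative assumption: how the Census Bureau would actually designate such subsamples (for example, by systematic or stratified selection) is not specified in the text, and the function name and seed are hypothetical.

```python
import random
from typing import Iterable, List, Optional, Tuple


def split_acs_year(record_ids: Iterable[int], n_panels: int,
                   seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Randomly reserve about 1/n_panels of one ACS year for NSCG frame building.

    Returns (nscg_subsample, remainder); the two lists are disjoint.
    """
    ids = list(record_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    k = round(len(ids) / n_panels)
    return ids[:k], ids[k:]


for n_panels in (4, 5):
    nscg, remainder = split_acs_year(range(10_000), n_panels, seed=2009)
    print(n_panels, len(nscg) / 10_000, len(remainder))
```

Four panels reserve 25 percent of the records and five panels reserve 20 percent, matching the 20 to 25 percent range in the text. Because the two lists are disjoint, other studies could draw from the remainder without overlapping the NSCG frame.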
In addition to the four NSF options (summarized in Table 6-1), the committee considered a hybrid approach suggested by Graham Kalton at the October 2007 workshop. This option would implement a rotating design for rare populations only, while using a cross-sectional strategy for the larger groups of interest. With this hybrid approach, it would be possible to accumulate enough sample cases for relatively rare populations to produce reliable estimates, while drawing on the large number of cross-sectional sample cases to produce current estimates.

The hybrid approach would have drawbacks. Because most sample units would be refreshed each year, there would be limited ability to follow respondents over time, which would limit the development of a longitudinal database. The rare population cases that are followed on a periodic basis would also have problems of sample attrition and might be prone
to problems of panel conditioning over time. Comparisons of population groups might be adversely affected by differential time-in-sample effects across the population groups.

Conclusion: A hybrid approach using a rotating design for rare populations would have the drawback of not keeping time-in-sample constant across subpopulations and thus might lead to differential levels of nonsampling bias across subpopulations.

Recommendation 6.2: If the National Science Foundation wishes to consider continuation of the National Survey of College Graduates with the sample drawn from the American Community Survey, the agency should use a rotating panel design.

TABLE 6-1 Summary of Options for Sample Design

Option: Continue current approach
Description: Use the ACS once each decade for a sample draw.
Advantages: Requires least amount of organizational change; easier to make the transition.
Disadvantages: Fails to take advantage of the yearly accumulation of new cases; limited ability to oversample rare populations.
Conclusion: Replicating the current design is not an efficient way to use the power of the ACS.

Option: Selective updating
Description: Use the ACS in later years of the decade to update the sample for problematic domain coverage.
Advantages: Allows updating each survey cycle and limits losses due to out-of-date frame.
Disadvantages: Retains the serious cost disadvantages of the current approach and requires continuous access to the ACS to draw samples.
Conclusion: Disadvantages outweigh the advantages. However, selective subsamples could be considered to study subgroups of particular interest.

Option: Continuous sample updating
Description: Draw a fresh sample from the ACS for each cycle of the NSCG, or at least more frequently than once a decade.
Advantages: Maintains the currency of the NSCG sample, permits oversampling of special interest groups during the decade, supports trend analysis.
Disadvantages: Requires continuous access to the ACS sample for all years and may not be cost efficient; would reduce or eliminate the longitudinal feature of the NSCG.
Conclusion: Not an efficient sample design and would limit use of the ACS for drawing samples for other surveys.

Option: Rotating sample
Description: Replace the once-per-decade sample replenishment with rotating panels.
Advantages: Incorporates the cost efficiencies and advantages of the prior updating options and permits screening to identify S&E workers over time.
Disadvantages: Rotating panels lead to lower response rates due to sample attrition; will require a decision about whether to bring in a new sample all at once or by multiple panels in the transition.
Conclusion: This is the most promising of the design options. A biennial survey with four or five rotating panels is the most efficient solution.

TRANSITION PHASE

The length and difficulty of the transition from a frame based on the census long form to one based on the ACS will be dictated by the design option that is selected. The transition process is critically important, and planning for a transition phase needs to begin almost immediately. For example, if data collection is to be initiated in 2011, sample cases from the 2009 and 2010 ACS would be required. Indeed, given the relatively small number of rare-population cases in the ACS, it would make sense to retain a portion of the 2008 NSCG panel as a carryover panel to supplement the ACS sample draw. There is precedent for this: the use of a carryover sample was also part of the post-2000 decennial census sample design.

Recommendation 6.3: The National Science Foundation should work with the Census Bureau to develop plans for using the American Community Survey as a sampling frame for a transitional period as well as for the continuing design.

ACS PROCESSING STEPS: SWAPPING AND IMPUTATION

In the course of discussing NSCG design options at the panel's October 2007 workshop, the issues of data swapping and imputation and their effects on the sampling process emerged.
These technical processes are regularly used by the Census Bureau, and they affect the efficiency and accuracy of survey estimates. The Census Bureau uses a technique called data swapping to create public-use datasets (a decision based on its overall disclosure policies). Data swapping is a technique for ensuring data confidentiality in which,
during processing of the survey data, records are exchanged for a subset of cases by selecting a sample of households, matching them on a set of selected key variables with households in neighboring geographic areas that have similar characteristics (such as the same number of adults and the same number of children), and swapping data elements. If the swapped data are used to produce estimates, there is little effect on the data, since the swap usually occurs within a neighboring area so as to have no effect on the marginal totals for the area. But if the swapped data are used to identify households or people to sample for the NSCG, the use of swapped data could greatly reduce stratification efficiency, because the characteristics used to classify a selected household may have come from a different household.

The committee favors use of the edited ACS file, before swapping, for weighting and creation of the NSCG sampling frame. This preference holds even though the use of unswapped data may mean that a customized weight would have to be developed if only the ACS base weight is available at the time the NSCG frame is built. The concern about lost stratification efficiency outweighs this cost. There is precedent for the use of unswapped data: the Census Bureau allowed use of unswapped data from the 2000 decennial census long form for sampling for the 2003 NSCG.

Recommendation 6.4: The Census Bureau should use unswapped American Community Survey data (with sample weights) for drawing a National Survey of College Graduates sampling frame.

Another technical concern is the use of imputed data from the ACS. The panel concludes that imputed educational attainment level data (labeled "allocated data" by the Census Bureau) should not be used for sampling.
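The following minimal sketch shows one way to keep allocated education values out of frame construction while still drawing a small validation subsample from the set-aside records. The record layout, field names, seed, and toy data are hypothetical and do not correspond to actual ACS variables.

```python
import random
from dataclasses import dataclass


@dataclass
class AcsPerson:
    person_id: int
    has_bachelors: bool         # education value on the edited ACS file
    education_allocated: bool   # True if that value was imputed ("allocated")


def build_frame(persons, validation_n, seed=7):
    """Hypothetical sketch: keep allocated education values out of the frame.

    Persons with a reported bachelor's degree enter the NSCG frame. Every
    record with an allocated education level is set aside, whatever value was
    imputed, because an allocated "no degree" may hide a true graduate
    (undercoverage) and an allocated "degree" may be wrong (inefficiency).
    A small simple random sample of the set-aside records is drawn for
    record checks.
    """
    frame = [p for p in persons if p.has_bachelors and not p.education_allocated]
    set_aside = [p for p in persons if p.education_allocated]
    k = min(validation_n, len(set_aside))
    validation = random.Random(seed).sample(set_aside, k)
    return frame, set_aside, validation


people = [
    AcsPerson(1, True, False),
    AcsPerson(2, True, True),
    AcsPerson(3, False, False),
    AcsPerson(4, False, True),
    AcsPerson(5, True, False),
]
frame, set_aside, validation = build_frame(people, validation_n=1)
print("frame:", [p.person_id for p in frame])
print("set aside:", [p.person_id for p in set_aside])
print("validation:", [p.person_id for p in validation])
```

Comparing record-check outcomes in the validation subsample with the imputed values would give a direct estimate of how much undercoverage and misclassification the allocated records carry.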
Imputed data would create an unacceptable amount of undercoverage of those with a bachelor's degree (estimated at 3 to 7 percent according to Finamore, Hall, and Fecso, 2006). They would also cause sampling inefficiency, because in some cases those with an imputed education level of a bachelor's degree could turn out not to have one. To support an informed decision on imputation, the committee urges that records with an imputed education level be set aside prior to sampling. A small sample of these ACS cases should then be selected to collect actual records, i.e., documentation of claimed degrees, in order to measure data quality and undercoverage.

Adding a field-of-degree question to the ACS would create an entirely new issue related to imputation. It is unclear how imputation should be done for missing field-of-degree information. For individuals with an S&E or S&E-related occupation, field-of-degree imputation might perform well. For other occupations, it is not obvious that an acceptable imputation model can be developed. Such cases may need to be treated as missing and reweighted. Depending on the severity of the
problem, special attention could be focused on following up on missing field-of-degree responses in the research program envisioned in Recommendation 5.2, above.

Recommendation 6.5: The National Science Foundation and the Census Bureau should initiate a program of research on imputation and nonresponse treatment for missing field-of-degree and education-level responses.

ACCESS TO THE ACS SAMPLE FRAME

The recommended option of a rotating panel design for the NSCG does not come without risk. Because the entire sample would not be redrawn every cycle, this design would eliminate periodic demands for a very large sample from the ACS. However, it would require several draws during the decade instead of just one at the beginning of the decade. It thus would require assured access on a continuous basis to a subsample of the ACS sampled cases. The Census Bureau has indicated reluctance to guarantee continuous access to the entire ACS (personal communication, Howard Hogan, U.S. Census Bureau). Assured access to a subsample of the ACS may be possible.

The Census Bureau's reluctance is based on the constraints it faces in allocating access to the ACS as a sampling frame for other surveys, which involve a number of unknowns. This constraint will not pose a problem for 2009, as the Census Bureau is not aware of any other potential requests for access to the ACS as a sampling frame in that year. However, there is discussion of using the ACS as a sampling frame for the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) in 2010 and thereafter, although the design and sample requirements for that survey have not yet been specified. Several other survey operations are known to be considering use of the ACS as a frame, including the Survey of Income and Program Participation (SIPP), the American Housing Survey (AHS), and a possible new health survey.
In anticipation of a situation in which multiple surveys are vying for access to the frame, the Census Bureau has developed and promulgated a policy on using the ACS as a frame for reimbursable follow-on surveys (U.S. Census Bureau, 2007, p. 2). The policy includes provisions for informing ACS respondents of the possibility of being included in follow-on surveys and a priority scheme that stresses reduced costs and the difficulty of screening for rare populations. The core problem for the Census Bureau stems from the Bureau's policy that precludes an ACS household that has been selected into the sample of one non-ACS survey from being contacted again for another
non-ACS survey. Under this policy, the selection of an individual for the NSCG will exhaust the eligibility of the whole household for further survey contacts. It therefore becomes more difficult to draw a nationally representative household sample for other surveys after the sample for the NSCG has been drawn. To the extent that the NSCG oversamples rare populations, the supply of remaining sample units in these smaller groups for other surveys is further constrained.

There are some technical fixes to this problem that can be explored. NSF and the Census Bureau are considering the possibility of allocating access to the ACS sample by month. Other procedures to enable the NSCG sample draw while preserving sample for other surveys may well be developed with additional research.

Recommendation 6.6: The National Science Foundation and the Census Bureau should sponsor a research program to explore means of permitting a sample draw from the American Community Survey for a rotation panel for the National Survey of College Graduates while preserving American Community Survey sample units for other surveys.
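One way to picture month-based allocation is a small sketch that pairs it with the rule that a household selected for one non-ACS survey is exhausted for all others. The round-robin assignment, the placeholder survey names "SurveyB" and "SurveyC", and the toy household file are all hypothetical. The text says only that allocation by month is under consideration, and any real split of months would be negotiated.

```python
from collections import defaultdict


def allocate_months(surveys, months=range(1, 13)):
    """Round-robin assignment of ACS sample months to follow-on surveys.

    Hypothetical scheme: the report notes only that allocation by month is
    under consideration; actual shares would be negotiated.
    """
    allocation = defaultdict(list)
    for i, month in enumerate(months):
        allocation[surveys[i % len(surveys)]].append(month)
    return dict(allocation)


def eligible(households, months, used):
    """Households from the given ACS months not already used by another survey.

    Mirrors the policy that a household selected for one non-ACS survey
    cannot be contacted for another.
    """
    return [hh for hh, month in households if month in months and hh not in used]


# Toy ACS file: (household_id, sample_month), five households per month.
households = [(f"H{m:02d}-{k}", m) for m in range(1, 13) for k in range(5)]
alloc = allocate_months(["NSCG", "SurveyB", "SurveyC"])  # SurveyB/C are placeholders

used = set()
nscg_pool = eligible(households, alloc["NSCG"], used)
used.update(nscg_pool[::2])  # pretend the NSCG selects every other eligible household

for survey, months in alloc.items():
    print(survey, months, "eligible:", len(eligible(households, months, used)))
```

Because the month sets are disjoint, NSCG selections shrink only the NSCG's own pool, and the other surveys' pools are untouched. The trade-off is that each survey then draws from fewer months of ACS cases, which matters most for the rare populations the NSCG oversamples.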