4 The ACS and the SESTAT Program

The American Community Survey (ACS) promises to have a profound effect on the Scientists and Engineers Statistical Data System (SESTAT) Program. Most directly, it will take the place of the census long form as the sampling frame for the National Survey of College Graduates (NSCG). But the ACS will more significantly change SESTAT if, as seems likely, a field-of-degree question is included in the ACS on an ongoing basis. This chapter discusses the potential of the ACS for changing SESTAT and proposes several options for the future of SESTAT.

THE ACS AS THE SAMPLING FRAME

The questions on the ACS are generally identical to the questions that were on the decennial long form. The most important difference between the two surveys is that the ACS can provide reasonably detailed information about households and individuals each year rather than once a decade.

Note: Information about the ACS can be found at: http://www.census.gov/acs [accessed February 2008].
Note: The ACS questionnaire can be found at: http://www.census.gov/acs/www/SBasics/SQuest/SQuest1.htm [accessed February 2008].

The ACS is conducted every month. Estimates for the nation and large areas are produced annually by aggregating the monthly samples; for subnational estimates, the data are aggregated over longer time periods. The ACS takes a new sample of about 250,000 addresses each month, or a total of 3 million households annually. Over a decade, the ACS will survey approximately 30 million addresses; for comparison, 17 million housing units were surveyed by the long form at one time in the decennial census.

Note: No address will be included in the ACS sample more than once in a 5-year period.

A key function of the ACS is to produce estimates for various levels of geography (from small areas to the total nation) and other population groupings. The ACS provides estimates annually for areas (and population groups) of 65,000 or more people; these estimates are scheduled to be made available in the summer or early fall for the previous year's sample. To attain reliability similar to that provided for some of the small groups in the 2000 decennial census, the ACS estimates for the smallest areas or population groups must be based on data aggregated over 5 years.

Note: Beginning in 2006, this information will be made available annually in late summer or early fall for the previous year's sample.

In terms of sample size considerations, the reliability of data for the smallest areas (such as counties) presents a statistical problem equivalent to the reliability of estimates for small (rare) subpopulations (small domain estimates), such as scientists and engineers. Both small-area and small-domain estimates can suffer from sample sizes too small to yield sufficient reliability. The National Science Foundation (NSF) faces a reliability problem in using the ACS as the NSCG sample frame not because it wishes to produce small-area estimates, but because it needs the ACS sample size for rare populations.

A recent National Research Council (2007) report, Using the American Community Survey: Benefits and Challenges, points out that there are some important differences between the ACS and the decennial long-form census. One such difference is that ACS data products are 1-year, 3-year, and 5-year period estimates that average 12, 36, and 60 months of data, respectively. In contrast, the 2000 long-form sample of more than 16 million responding households obtained data for one fixed time, Census Day, April 1. In comparison with the long-form sample, the report suggests that the ACS has three major benefits:

1. Timeliness: ACS data products are released 8-10 months, instead of 2 years, after data collection.
2. Frequency: ACS data products are updated every year instead of every 10 years, which will make it possible in many areas to track trends in population characteristics that are important for understanding the science and engineering (S&E) workforce.
3. Quality: Higher quality of the data in terms of completeness of response to the survey items. The higher response rates for the ACS compared with the 2000 long-form sample are achieved by the use of more intensive methods of data collection by better trained interviewers.

The ACS is conducted using an initial mail-out, mail-return, self-response questionnaire. The first follow-up to mail nonresponse is conducted by computer-assisted telephone interview (CATI); it is followed by a computer-assisted personal interview (CAPI) of a subsample of the remaining nonrespondents. The ACS interviewers are experienced and highly trained, in contrast to the lightly trained temporary enumerators who were used for nonresponse follow-up in the 2000 census. The professional, fully trained Census Bureau interviewers have access to built-in computer edits and questionnaire routing software in the CATI and CAPI instruments, and so they obtain more complete data. Having more complete data means that there is less need for imputation of missing responses to questionnaire items.

On the negative side, the National Research Council report points out that a weakness of the ACS is the significantly larger margins of error in its estimates, even when cumulated over 5 years. The primary reason for this outcome is the much smaller sample size of the ACS. Another reason is the greater variation in the ACS sample weights resulting from the smaller number of sample units available after subsampling for field interviewing of households not responding by mail or telephone. Also, the postcensal population and housing estimates used as survey controls in the ACS are less effective than the full census controls used with the long-form sample. These estimates are subject to unmeasured estimation error for which there is little information about magnitude; they are applied at a less detailed level than the census controls; and they are not directly related to the ACS in the way that the census controls are related to the long-form sample.
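The two sources of larger margins of error named above, a smaller sample and more variable weights, can be illustrated with a minimal sketch. It is not the Census Bureau's variance estimation method: it assumes a simple proportion, uses Kish's approximation for the design effect due to unequal weights, and takes a 90 percent confidence level as the default. The function names, the weights, the 5 percent prevalence, and the sample sizes are all hypothetical values chosen for illustration.

```python
import math


def kish_design_effect(weights):
    """Kish's approximation of the design effect due to unequal weights.

    deff = n * sum(w^2) / (sum(w))^2; it equals 1.0 when all weights are equal.
    """
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def margin_of_error(p, n, deff=1.0, z=1.645):
    """Approximate margin of error for an estimated proportion.

    p: estimated proportion; n: number of responding units;
    deff: design effect; z: normal critical value (1.645 for 90 percent).
    """
    return z * math.sqrt(deff * p * (1.0 - p) / n)


# Hypothetical weights: 80 mail or telephone respondents with weight 1 and
# 20 subsampled personal-interview cases that each carry weight 3.
weights = [1.0] * 80 + [3.0] * 20
deff = kish_design_effect(weights)

# Hypothetical rare domain (5 percent) with 2,000 respondents per year,
# pooled over 1, 3, and 5 years of data.
for years in (1, 3, 5):
    n = 2000 * years
    print(years, round(margin_of_error(0.05, n, deff), 4))
```

Under these assumptions the margin of error shrinks with the square root of the pooled sample size, which is why estimates for rare populations are expected to require several years of accumulated data, and any spread in the weights inflates it further.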
The sampling frame for the ACS is the Census Bureau's Master Address File (MAF), which will be updated throughout the decade to keep it current. The monthly samples are distributed throughout the country with no area or other cluster sampling, but there will be higher sampling fractions in small governmental units, such as small counties.

Note: The quality improvements inherent in converting to the ACS are substantial. For example, in comparison with the census long form, a precursor survey to the ACS (the Census 2000 Supplementary Survey) had lower imputation rates for 48 of 54 population items. For one item, weeks worked last year, the need for imputing missing values fell from 19.3 percent for the long form to only 9.6 percent for the ACS precursor survey (National Research Council, 2007, p. 57).

OPTIONS FOR ACHIEVING SESTAT PROGRAM GOALS

The replacement of the decennial census long form with the ACS offers an opportunity for NSF to meet its stated goals and objectives for the SESTAT Program. These goals and objectives were presented to the panel at its workshop in October 2007, and are summarized in Box 4-1.

Significant improvements in timeliness will likely be achieved by the conversion to the ACS. As noted above, by using the ACS, NSF could publish estimates for many key data items less than a year after the reference period. Although data on rare populations, with minimal variance, will be delayed for up to 5 years to accumulate a large enough sample, after the first 5-year delay the data will be available on a flow basis in each following year. Analytical power will be increased with the addition to the ACS of the field-of-degree question on an ongoing basis (see Chapter 5). The cost implications of the conversion are discussed below.

The ability of NSF to maintain consistent cross-sectional data and preserve the trend in a time series when converting from the long form to the ACS will depend on how NSF decides to implement the change. In some of the options being considered (see below), cross-sectional time series can be strengthened. Trend preservation can also be assured by careful implementation and by developing "bridges" from the old data series to the new when it is decided that the new data series is an improvement over the old. For example, one such bridge could be for estimates of the disabled S&E workforce if it is decided to adopt the well-researched and tested ACS (standard) definition of disability rather than the definition of disability now in the NSCG.

The conversion from a decennial long-form-based sampling frame to an ACS-based sampling frame affords an opportunity to reconsider the goals and objectives of this major government data collection program on S&E.
In Chapter 7, the committee suggests that NSF conduct such a review, with the caveat that changing the SESTAT Program should be approached gingerly. Proper consideration should be given to the needs of all interested stakeholders and should include plans for transitional and short-term program changes and long-run program modernization.

BOX 4-1
NSF Goals and Objectives for the SESTAT Program
• Improve timeliness
• Maintain coverage of rare populations with minimal variance
• Gain analytical power
• Maintain cross-sectional time series
• Preserve trend (minimize breaks in time series)
• Manage costs

In the short run, for a transitional period, practical constraints would seem to dictate that the SESTAT Program would remain mostly unchanged. This conservative approach is justified because care should be taken when collecting data to understand time trends and to preserve, to the extent possible, historical continuity. Over time, as the ACS settles into an ongoing mode and responses to the new field-of-degree question become understood, new opportunities to replace some aspects of the current SESTAT Program with more streamlined data collection procedures may emerge.

To prepare for these opportunities, the NSF's Division of Science Resources Statistics (SRS) has appropriately begun to develop a research program that focuses on options for change. An agency report (National Science Foundation, 2007) outlines several options for SRS research efforts, all of which affect the types of data that SRS may wish to gather. The panel has reviewed the NSF staff report, and the remainder of this chapter presents our view of three potential options to guide an SRS research program. The three options are configured here so that Option B is more expansive than Option A, and Option C is more expansive than Option B.

Under Option A, SRS would focus primarily on the congressionally mandated reports. Non-SESTAT sources, including the ACS and the Current Population Survey (CPS), could perhaps be used for the production of those mandated reports.
The SESTAT components currently used to produce data for the mandatory reports, primarily the NSCG and the National Survey of Recent College Graduates (NSRCG), could be reconfigured, conducted less frequently, or eliminated. The ACS and CPS would need to be augmented by the existing doctoral surveys (such as the Survey of Doctorate Recipients, SDR) and, perhaps, occasional NSF-commissioned special surveys that could be funded by using the financial savings that accrue from changing the nature of, or eliminating, the NSCG and the NSRCG. This option would require that the CPS add a field-of-degree question, a change discussed in Chapter 7.

This option obtains part of its justification from the fact that the ACS can provide a large sample of workers with bachelor's degrees in S&E fields in an extremely timely manner. The sample would naturally include individuals with degrees from non-U.S. institutions who are living in the United States. This option would free up significant resources for other, more specialized surveys or for research on the S&E workforce.

However, there would also be several significant opportunity costs in connection with Option A:

• People who have a bachelor's degree in a non-S&E field who subsequently obtain a master's degree in an S&E field would be mistakenly classified as non-S&E respondents because of the focus on a bachelor's degree.
• There will be no way to learn specifically about people with S&E bachelor's degrees who subsequently get a master's degree in a non-S&E field (often a business discipline). This sizable group is of great interest to NSF.
• If the field-of-degree question on the ACS were categorical rather than open ended, there could be a substantial loss in the usefulness of information about actual fields of degree. Currently, the NSCG distinguishes among more than 140 different fields; moving to only seven or eight broad categories would drastically limit the specificity of the data and would probably preclude doing meaningful statistical analysis with only ACS data.
• Information about the highest degree a respondent has would be solely from the ACS.
• The additional items that are now placed on the NSCG questionnaire would be lost to researchers, both those at and outside NSF.

Option B is more closely aligned with the current approach. Under this option, NSF would continue to produce a longitudinal version of the NSCG survey for the S&E population along the lines of the current NSCG and SESTAT operations, using the ACS for the sampling frame. The panel envisions that this data collection effort would be stratified to oversample women, the disabled, and minority S&E respondents relative to nonblack, non-Hispanic males. NSF could supplement the data from NSCG and NSRCG with data from the ACS and CPS to enrich the mandated indicator reports. This option comes with a cost that is relatively easy to estimate.
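The stratified oversampling envisioned under Option B can be sketched as follows. This is a minimal illustration, not the NSCG sample design: the three strata are a simplification, and the frame counts, sampling rates, and the `sampling_plan` helper are hypothetical. It shows only the general principle that a stratum sampled at a higher rate yields more respondents and carries a smaller base weight (the inverse of its sampling rate).

```python
def sampling_plan(frame_counts, sampling_rates):
    """Return the expected sample size and base weight (1 / rate) per stratum."""
    plan = {}
    for stratum, count in frame_counts.items():
        rate = sampling_rates[stratum]
        if not 0.0 < rate <= 1.0:
            raise ValueError(f"sampling rate for {stratum!r} must be in (0, 1]")
        plan[stratum] = {
            "expected_sample": count * rate,
            "base_weight": 1.0 / rate,
        }
    return plan


# Hypothetical frame counts of S&E cases by stratum, for illustration only.
frame_counts = {
    "women": 40_000,
    "minority_or_disabled_men": 10_000,
    "other_men": 50_000,
}

# Hypothetical sampling rates: the first two strata are oversampled
# relative to the reference stratum ("other_men").
sampling_rates = {
    "women": 0.10,
    "minority_or_disabled_men": 0.20,
    "other_men": 0.05,
}

plan = sampling_plan(frame_counts, sampling_rates)
for stratum, row in plan.items():
    # Sample size times base weight recovers the frame count for the stratum.
    estimated_total = row["expected_sample"] * row["base_weight"]
    print(stratum, round(row["expected_sample"]), row["base_weight"], round(estimated_total))
```

The differential weights are what allow the oversampled groups to be analyzed with adequate precision while population totals remain unbiased, at the price of some added variance in overall estimates.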
The option would require the Census Bureau to recontact ACS respondents in order to complete subsequent interviews. The secular decline in response rates and the difficulty that the Census Bureau has had in obtaining cooperation in previous SESTAT surveys contribute to the expense of running such a survey and to the obvious attrition problems in the data.

The continued cost of conducting the NSCG survey is counterbalanced by a number of benefits:

• There would be better coverage of those who obtain non-S&E bachelor's degrees and work in S&E occupations. The panel notes that in the past the NSCG sampled all holders of bachelor's degrees and above and so these respondents could be readily identified. With the ACS as the sampling frame, the issue of how many such people to survey will have to be considered before the survey. This group can be very expensive to sample because of their low frequency and the transitory nature of occupational assignments.
• There would be coverage of the S&E bachelor's degree holders who obtain advanced degrees in a non-S&E field. This may be a group of considerable interest to NSF if they have "left" the S&E workforce. (Many of these are combining their S&E skills with other kinds of expertise.)
• There would be more detailed coverage of foreign degree holders. Currently, the ACS design does not allow researchers to determine definitively whether the respondent obtained a degree in the United States or another country.
• The survey could include questions not on the ACS.
• If the Census Bureau merged information from the ACS and the NSCG, NSF and other researchers would have valuable information about the income of the S&E workforce.

The cost-benefit tradeoffs of Options A and B are a matter for NSF staff to determine. The panel notes, however, that it appears that much of the content of the congressionally mandated reports could be generated from ACS data alone when the ACS has a field-of-degree question, and research outside of NSF using the NSCG data seems to be quite limited.

Under Option C, in addition to the collection of the longitudinal data version of the NSCG for the S&E population, NSF would commission collection of data on the non-S&E population so that its staff and other researchers can compare outcomes of the S&E population to the non-S&E population.
Under this option, S&E respondents would be oversampled relative to non-S&E respondents, and women, disabled, and minority respondents would be oversampled relative to those who are nonblack, non-Hispanic, nondisabled males. It would also be possible to focus the oversample on subsets of the S&E population, based on field of degree (see Chapter 5), immigration status, or age.

This option is, in many ways, the most ambitious. Under this option, the current NSCG would be continued (though perhaps smaller), and additional data would be collected to address issues relevant to the NSF goals and objectives. For example, it would be useful to systematically collect data on the non-S&E workforce for purposes of comparison with the S&E workforce. More generally, Option C places a burden on NSF to determine what the large unresolved issues are in the study of the S&E workforce and to construct data resources that will allow these issues to be addressed.

The move to a sampling frame from the ACS makes the current SESTAT transition period an important decision time for NSF. It is an opportunity to review the SESTAT Program goals, as well as the needs and wishes of the outside research community, to determine the type and frequency of data collection. There are many options, and this chapter has briefly discussed three of them. The central point is that because of the improved quality of information that the ACS with a field-of-degree question would provide, NSF now has a window of opportunity to decide whether the expense of separate surveys such as the NSCG and the NSRCG is justified.

In summary, it is clear that the NSF staff will have an opportunity to rethink the NSCG in light of the added information resources available through the ACS with the field-of-degree question. Given the speed at which the Census Bureau makes the ACS data available, the NSF staff will undoubtedly want to make use of the ACS data in the preparation of the congressionally mandated reports, no matter which option is chosen. Continued use of the NSCG or something like it would involve additional costs, but it would provide for the greatest continuity and provide much more detailed information about the experiences of the S&E workforce. The ACS frame also makes possible alternative approaches, such as a reconstituted NSCG that may have fewer respondents but a richer set of data.