Read "Using the American Community Survey for the National Science Foundation's Science and Engineering Workforce Statistics Programs" at NAP.edu


Suggested Citation:"3 The National Survey of College Graduates." National Research Council. 2008. Using the American Community Survey for the National Science Foundation's Science and Engineering Workforce Statistics Programs. Washington, DC: The National Academies Press. doi: 10.17226/12244.



3 The National Survey of College Graduates

The National Survey of College Graduates (NSCG) is the National Science Foundation (NSF) survey most likely to be directly affected by the new American Community Survey (ACS); consequently, the panel paid special attention to it in its work. The NSCG is expected to undergo substantial changes in the next several years. Change is not new to the survey: since its inception in 1962, its sample design and content have been revised frequently.

HISTORY AND DESIGN

The NSCG survey began in 1962 when NSF and other agencies sponsored a single, cross-sectional survey (the Postcensal Manpower Survey), with a sample derived from the long form of the 1960 decennial census, to collect information on science and engineering personnel resources. A decade later, NSF sponsored the Professional, Technical and Scientific Manpower Survey, again drawing the sample from the decennial census, and the agency introduced smaller follow-up surveys using the same sample through 1978. This pattern continued in the 1980s, when NSF again conducted a postcensal survey with follow-ups through 1989.

The survey that is now known as the NSCG emerged after a major redesign following the 1990 census. The post-1990 design continues an earlier data collection strategy of a large postcensal (baseline) survey, with smaller follow-up surveys during the remainder of the decade. Since then, the baseline decennial NSCG has served two purposes: to provide a once-in-a-decade view of all college graduates in the United States and to act as a screening device (through detailed educational histories collected in the NSCG) for obtaining a sample of scientists and engineers for the integrated Scientists and Engineers Statistical Data System (SESTAT) file. The baseline was necessary because the decennial census long form contained information only on educational attainment, so it was not possible to identify people with science and engineering degrees.

Note: The redesign was largely based on recommendations in a report of the Committee on National Statistics (National Research Council, 1989).

Thus, the NSCG has a long history in which the Census Bureau has created a sampling frame based on responses to the decennial census long form at the beginning of each of the last four decades and has drawn a baseline NSCG sample from that frame. The baseline sample consists of long-form respondents with a bachelor's degree or higher at the time of the census. Because field-of-degree information was not available on the long form, occupations were used to begin the process of identifying respondents for the NSCG. To capture the entire stock of scientists and engineers, long-form respondents from both science and engineering (S&E) occupations and non-S&E occupations with a high likelihood of being held by someone with an S&E degree were given a chance of selection into the NSCG sample. This additional group was included because a high proportion of people with S&E or S&E-related degrees do not work in S&E or S&E-related occupations: they were either working in a non-S&E occupation or not working. As a result of using this occupation-based sample design rather than a field-of-degree-based sample design, the NSCG is the only source of information for the SESTAT integrated database that cross-classifies people with non-S&E degrees by whether they work in S&E or S&E-related occupations. These cross-classifications are shown in Table 3-1.
The postcensal NSCG has used a reasonably complex, two-stage, random sample design. In the first stage, households are sampled from the census long-form sample using a stratified systematic sample, with differing sampling rates for administrative areas of different sizes (sampling rates of between 1 in 12 and 1 in 16). In the second stage, people in the target population are subsampled from within those households.

The census long form yielded the major sampling variables used to create the strata for the frame. In 2003, these variables were educational attainment (bachelor's degree or higher) by highest degree level achieved, occupation, demographic group (which combines citizenship, race and ethnicity, and disability status), and gender. Within each stratum, individuals were selected using probability-proportional-to-size (PPS) systematic sampling. Weighting was facilitated by the fact that the long-form sampling weight was used as the size measure for selection. This approach compensated as much as possible for the differing long-form sampling rates and came close to establishing an overall self-weighting sample within each of the above second-phase strata.

TABLE 3-1 Degree Field and Occupation, 2003 NSCG Respondents

Degree/Occupational Status                         S&E Occupation  S&E-Related Occupation  Non-S&E Occupation  Not Working    Total
At least one S&E degree                                    22,669                   6,676              13,959        7,877   51,181
No S&E degree but at least one S&E-related degree           1,135                   5,637               2,130        1,623   10,525
No S&E or S&E-related degree                                2,897                   1,901              26,020        7,878   38,696
Total                                                      26,701                  14,214              42,109       17,378  100,402

SOURCE: National Science Foundation (2007, p. 7).

Additional precision in determining eligibility for the follow-up NSCG surveys throughout the decade is afforded by data collected in the postcensal NSCG baseline survey. The major item added by the baseline survey is the field of degree. Thus, the sampling variables for the follow-up surveys have included the field of highest S&E degree as well as the original sampling strata.

LIMITATIONS OF THE CENSUS LONG FORM AS THE SAMPLING FRAME

The fact that the NSCG is derived from the decennial census has vexed some users of the survey over the years. Access to the raw data, important both for understanding the quality of the data and for analytical uses, is severely limited because records derived from confidential decennial census records are protected by Title 13 and can be used only under specific Census Bureau supervision.

Another major issue has been the lag in the availability of the NSCG data because it is linked to the decennial census. Because of the time needed to process the decennial census and make the data available for NSCG sampling, the postcensal baseline NSCG has generally been fielded about 3 years after the decennial census. These issues are endemic to the operation of the decennial census and the result of long-standing practices.
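Returning briefly to the within-stratum selection described under History and Design, the following is a minimal sketch of systematic PPS selection and of why using the long-form weight as the size measure yields a nearly self-weighting sample. It is an illustration under stated assumptions, not the Census Bureau's production procedure: the function name, the ten weights, and the sample size of three are all hypothetical, and every size measure is assumed to be smaller than the sampling interval.

```python
import random
from typing import List, Sequence


def pps_systematic_sample(sizes: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Return the indices of n units chosen by systematic PPS selection.

    Assumes every size measure is smaller than the sampling interval,
    so that no unit can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    if max(sizes) >= interval:
        raise ValueError("a unit exceeds the sampling interval; treat it as a certainty selection")
    next_target = rng.random() * interval  # random start in [0, interval)
    selected: List[int] = []
    cumulative = 0.0
    for index, size in enumerate(sizes):
        cumulative += size
        # A unit is selected when a selection point falls inside its
        # segment of the cumulated size measures.
        if len(selected) < n and next_target < cumulative:
            selected.append(index)
            next_target += interval
    return selected


# Hypothetical long-form weights for ten frame members in one stratum.
long_form_weights = [12.0, 16.0, 14.0, 12.0, 16.0, 13.0, 15.0, 12.0, 16.0, 14.0]
n_selected = 3
sample = pps_systematic_sample(long_form_weights, n_selected, random.Random(2008))

total_weight = sum(long_form_weights)
overall_weights = []
for i in sample:
    w = long_form_weights[i]
    second_stage_probability = n_selected * w / total_weight
    # Overall weight = long-form weight / conditional selection probability.
    overall_weights.append(w / second_stage_probability)

print(sample, [round(x, 2) for x in overall_weights])
```

Because the second-stage selection probability is proportional to the long-form weight, the weight cancels and every selected person in the stratum ends up with the same overall weight (the stratum's total weight divided by the number selected), which is the self-weighting property the text refers to.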
A recent study for NSF highlighted several other sample selection and coverage problems related to the content of the decennial census long form (Fecso et al., 2007a), including efficiency, missing groups, declining response rates, and loss of historical continuity.

Efficiency

The census long form has been an inefficient means for identifying those with S&E degrees, mainly because of the lack of information that would allow identification of those with science, engineering, and health degrees. As shown in Table 3-2, this has been a historical problem. In 1993, a selection of about 215,000 individuals for the NSCG sample from the decennial long-form sample frame yielded only about 75,000 cases that met NSF's definition of a scientist or engineer and therefore were eligible for the SESTAT integrated database and the NSCG follow-up surveys.

The efficiency of the process was slightly improved after the 2000 census even though the target population was expanded to include S&E-related degrees and occupations. In 2003, the 171,000 people selected from the 2000 census long-form sample frame yielded 67,000 cases with S&E and S&E-related degrees or occupations. Despite this slight improvement, the process of identifying the target population in the absence of a field-of-degree question can only be described as inefficient.

There was one positive side effect of this sampling inefficiency. Using the postcensal survey as a screening mechanism made possible valuable comparisons of scientists and engineers with non-S&E degree holders. However, this comparison was only possible once in a decade (the year of the postcensal survey) because non-S&E individuals were not part of the follow-up sample frame.

TABLE 3-2 Yield of SESTAT-Eligible Cases from the 1993 and 2003 NSCG

Characteristic                        1993 NSCG  2003 NSCG
Sample size                             214,643    170,800
Respondents                             148,905    100,402
SESTAT-eligible                          74,462     66,504
Ratio of sample size to usable cases     2.88:1     2.56:1
Ratio of respondents to usable cases     2.00:1     1.51:1

NOTE: The definition of SESTAT-eligible was expanded between 1993 and 2003 to include people with S&E-related degrees or occupations.
SOURCE: National Science Foundation (2007, p. 8).
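The yield ratios in Table 3-2 can be recomputed from the counts, and the 2003 SESTAT-eligible count can be cross-checked against the cells of Table 3-1. The sketch below does both. The class and variable names are illustrative, and the eligibility rule in the code (any S&E or S&E-related degree, or otherwise an S&E or S&E-related occupation) is an assumption read from the table note rather than NSF's formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningYield:
    sample_size: int
    respondents: int
    sestat_eligible: int

    def sample_per_usable_case(self) -> float:
        return self.sample_size / self.sestat_eligible

    def respondents_per_usable_case(self) -> float:
        return self.respondents / self.sestat_eligible


# Counts copied from Table 3-2.
nscg_1993 = ScreeningYield(214_643, 148_905, 74_462)
nscg_2003 = ScreeningYield(170_800, 100_402, 66_504)

# 2003 respondent counts from Table 3-1, keyed by degree status; each tuple is
# (S&E occupation, S&E-related occupation, non-S&E occupation, not working).
table_3_1 = {
    "S&E degree": (22_669, 6_676, 13_959, 7_877),
    "S&E-related degree only": (1_135, 5_637, 2_130, 1_623),
    "neither": (2_897, 1_901, 26_020, 7_878),
}

# Assumed eligibility rule: any S&E or S&E-related degree, or else an
# S&E or S&E-related occupation.
eligible_2003 = (
    sum(table_3_1["S&E degree"])
    + sum(table_3_1["S&E-related degree only"])
    + sum(table_3_1["neither"][:2])
)

for label, y in (("1993", nscg_1993), ("2003", nscg_2003)):
    print(label, f"{y.sample_per_usable_case():.3f}", f"{y.respondents_per_usable_case():.3f}")
print(eligible_2003 == nscg_2003.sestat_eligible)
```

Under that assumed rule, the Table 3-1 cells sum to the 66,504 eligible cases reported in Table 3-2. The recomputed ratios agree with the table to within the last digit; the 2003 sample-size ratio comes out at roughly 2.57, which suggests the printed 2.56 was truncated rather than rounded.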

Group Coverage

Using a decennial census to identify the stock of engineers and scientists to be interviewed over the decade, and supplementing it with new graduates of U.S. institutions in S&E fields from the National Survey of Recent College Graduates (NSRCG) and the Survey of Earned Doctorates (SED), inevitably means that some population groups are missed.

One population of great interest is the scientists and engineers whose degrees were all earned abroad. This population is captured in the sample only once a decade, in the baseline survey. Foreign-educated scientists and engineers entering the United States after the decennial census and receiving no further degrees in the United States are not included in any SESTAT survey, so the undercoverage of this group grows throughout a decade.

Another group that is partly covered in the postcensal NSCG but not in later surveys is people with non-S&E degrees who enter S&E or S&E-related jobs after the postcensal NSCG. This is an important omission in the case of computer occupations, which include a significant number of workers not educated in a science, engineering, or related discipline who have moved into computer-related occupations.

These omissions are exacerbated because a substantial number of scientists and engineers are both non-S&E graduates in S&E and S&E-related occupations and foreign educated. In a report, NSF estimates that in 2003 there were over 720,000 people in S&E occupations and nearly 790,000 people in S&E-related occupations with non-S&E degrees (National Science Foundation, 2007). Additionally, there were estimated to be close to 1.5 million people in the SESTAT population who had only foreign degrees. Taking into account the overlap between these two populations, approximately 2.6 million people in 2003 in the SESTAT population worked in an S&E occupation but had no S&E degree or had only a foreign degree. Such people represent approximately 12 percent of the 2003 SESTAT population of 21.6 million people.

Note: The 2.6 million figure excludes those who graduated in non-S&E fields after April 1, 2000, and were working in S&E or S&E-related occupations in 2003, as well as those with only foreign degrees who were not in the United States at the time of the decennial census but were working here in an S&E or S&E-related occupation at the time of the 2003 NSCG.

Response Rates

Another problem in using the census long form as the sampling frame is increasing cumulative nonresponse through the decade. Nonresponse is a major concern with the current NSCG design because the sample is refreshed only once a decade. Although follow-on surveys later in the decade generally have had very good response rates (well above 90 percent), the total attrition in the sample over the decade is substantial.

The decade of the 1990s provides an example. As shown in Table 2-1, the unconditional unweighted response rates for the 1990 decennial sample went from an initial rate of 78 percent to 74, 70, then 63 percent over four survey cycles. The problem of growing nonresponse appears to be worsening in the 2000s. In 2003, the NSCG had a response rate of 63 percent. By 2006, the unconditional response rate had fallen to 55 percent.

The declining unit response rates are particularly troublesome because they vary dramatically across demographic, citizenship, educational attainment level, and age groups. Non-Hispanic white individuals are more likely to respond than individuals in other racial and ethnic groups. U.S. citizens respond at a higher rate than non-U.S. citizens. Higher educational attainment levels are associated with higher rates of response.

Longitudinal Continuity

One final difficulty posed by reliance on the decennial census is that the usual practice of discarding the old sample every 10 years brought about the complete loss of longitudinal continuity and a lack of information about how nonresponse adjustments during the decade might cause a shift in the time series. NSF addressed these issues by embedding an experiment in the design of the 2003 NSCG. In addition to drawing a new sample from the 2000 decennial long-form sample, NSF also included the remaining 1999 NSCG respondent population (which included cases originally sampled in the 1993 NSCG, as well as the 1995-1999 NSRCG surveys) to receive the 2003 survey.
This experiment found some large differences in estimates of the scope of coverage, across various nonresponse adjustment cells, between the newly drawn 2000 postcensal sample and the longitudinal sample retained from the 1999 NSCG (Finamore, Hall, and Fecso, 2006). Some of the difference is believed to stem from increasing nonresponse across key groups that is not ignorable. Further research is required to determine all the factors that may have contributed to the differences.

COMPARING THE LONG FORM AND THE ACS

In view of the above well-known limitations of the census long form as a sample frame, NSF commissioned reviews of potential sampling frames and designs by previous National Research Council panels. Each time, the reports found that the design based on the census long-form sample for the NSCG was the best available strategy (National Research Council, 1989, 2003).

Most recently, in preparation for the NSCG surveys in the 2000s, NSF explored alternative sampling frames for SESTAT. It looked for a frame that could provide a more complete representation of the universe of scientists and engineers than the long-form sample approach (Fecso et al., 2007b). No suitable alternative to the long-form frame for the NSCG was identified, primarily because no other source had sufficient sample size to include a large enough number of scientists and engineers, a relatively rare population, to meet the needs of the NSCG and SESTAT.

The ACS was long ago identified as a potential future alternative to the census long form. Now that the ACS has been successfully implemented, the Census Bureau has agreed to permit use of the ACS as a sample frame for the NSCG in the future. This introduces a host of opportunities as well as some major challenges.

Some aspects of the sample design based on the decennial census would not need to change much in a transition to an ACS-based design. For example, it would be possible for NSF to draw the sample from a list of ACS respondents using criteria similar to those used in past NSCG surveys. That approach will be facilitated by the fact that the ACS now collects information that is essentially identical to that collected on the long form: the highest degree or level of school that the respondent has completed, occupational and employment characteristics, and demographic characteristics.

However, some things will need to change. The ACS surveys a smaller number of households in a given year than were surveyed by the long form. Consequently, it will be necessary to accumulate 2-3 years of ACS households in order to identify a set of households that could serve as a sufficient sample frame for the NSCG.
This change introduces complications that are more fully explored in Chapter 6.

The potential for more substantial change during the shift to the ACS lies in the plan to add a question on the field of a bachelor's degree to the ACS on an ongoing basis, assuming successful completion of a full-scale field test of two alternative question versions. With this question, it will be possible not only to enhance the ability of the Census Bureau to identify respondents with the characteristics of interest for sampling for the NSCG, but also to provide a base of information, both in cross-section and in time series, on the population of college graduates by field of bachelor's degree. The data should benefit many federal agencies, particularly those with responsibility for assessing such issues as educational attainment, immigration, and public welfare, and for projecting occupational supply and demand. A further discussion of this new potential is presented in Chapter 7.
