then, the baseline decennial NSCG has served two purposes: to provide a once-in-a-decade view of all college graduates in the United States and to act as a screening device (through the detailed educational histories collected in the NSCG) for obtaining a sample of scientists and engineers for the integrated Scientists and Engineers Statistical Data System (SESTAT) file. The baseline was necessary because the decennial census long form collected educational attainment but not field of degree, so people with science and engineering degrees could not be identified from the census alone.
Thus, the NSCG has a long history: at the beginning of each of the last four decades, the Census Bureau has created a sampling frame based on responses to the decennial census long form and has drawn a baseline NSCG sample from that frame. The baseline sample consists of long-form respondents with a bachelor’s degree or higher at the time of the census. Because field-of-degree information was not available on the long form, occupations were used to begin the process of identifying respondents for the NSCG. To capture the entire stock of scientists and engineers, long-form respondents from both science and engineering (S&E) occupations and non-S&E occupations with a high likelihood of being held by someone with an S&E degree were given a chance of selection into the NSCG sample. This additional group was included because a high proportion of people with S&E or S&E-related degrees do not work in S&E or S&E-related occupations: they either work in a non-S&E occupation or are not working. As a result of using this occupation-based sample design rather than a field-of-degree-based sample design, the NSCG is the only source of information for the SESTAT integrated database that cross-classifies people with non-S&E degrees by whether they work in S&E or S&E-related occupations. These cross-classifications are shown in Table 3-1.
The postcensal NSCG has used a fairly complex two-stage random sample design. In the first stage, households are sampled from the census long-form sample using stratified systematic sampling, with sampling rates that differ by the size of the administrative area (between 1 in 12 and 1 in 16). In the second stage, people in the target population are subsampled from within those households.
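The first-stage idea of stratified systematic sampling can be illustrated with a minimal sketch. It assumes an ordered list of household identifiers per stratum and a 1-in-k sampling interval for each stratum; the stratum labels, the frame sizes, and the function names are hypothetical and are not taken from the NSCG design documentation. Which interval applies to which kind of area is also an assumption made only for illustration.

```python
import random


def systematic_sample(units, interval, rng):
    """Select every `interval`-th unit after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return units[start::interval]


def stratified_systematic_sample(frame, intervals, seed=0):
    """Apply a 1-in-k systematic sample independently within each stratum.

    frame     -- dict mapping stratum label to an ordered list of unit ids
    intervals -- dict mapping stratum label to the sampling interval k
    """
    rng = random.Random(seed)
    return {
        stratum: systematic_sample(units, intervals[stratum], rng)
        for stratum, units in frame.items()
    }


# Hypothetical frame: two strata of households with different sampling rates.
frame = {
    "area_type_a": list(range(120)),  # sampled at 1 in 12 (illustrative)
    "area_type_b": list(range(160)),  # sampled at 1 in 16 (illustrative)
}
intervals = {"area_type_a": 12, "area_type_b": 16}

sample = stratified_systematic_sample(frame, intervals, seed=1)
```

Because each stratum is sampled with a fixed interval, every household in a stratum has the same selection probability (1/k), and the selected households are spread evenly through the ordered frame.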
The census long form yielded several major sampling variables used to create the strata for the frame. In 2003, these variables were educational attainment (bachelor’s degree or higher) by highest degree level achieved, occupation, demographic group (which combines citizenship, race and ethnicity, and disability status), and gender. Within each stratum, individuals were selected using probability-proportional-to-size (PPS) systematic sampling. Using the long-form sampling weight as the size measure for selection simplified weighting. This approach compensated as much as possible for the differing long-form