2 The Scientists and Engineers Statistical Data System

For the most part, this report focuses on the National Survey of College Graduates (NSCG). The NSCG is the survey that will be immediately and significantly affected by the switch to the American Community Survey (ACS) as a sample frame, and it will benefit from the addition of a field-of-degree question to the ACS. However, the NSCG is nested in a group of three surveys that make up a carefully constructed system of information on the science and engineering workforce, the Scientists and Engineers Statistical Data System (SESTAT). Consequently, the issues associated with the conversion to the ACS as a sample frame must be considered in the larger context of SESTAT. In this chapter we describe the SESTAT data system. We then turn to the mandated requirements and user needs for SESTAT data, and we discuss data elements and series that are of special interest and that should be taken into account when designing a SESTAT data system for the future.

SCOPE OF SESTAT

The SESTAT surveys include the NSCG, the National Survey of Recent College Graduates (NSRCG), and the Survey of Doctorate Recipients (SDR). These three large surveys, with more than 100,000 total respondents drawn from separate sampling frames, cover more than 21 million people. The three surveys have been thoughtfully integrated in that they use nearly identical data collection instruments and data processing
14 USING THE AMERICAN COMMUNITY SURVEY

procedures; they are fielded at the same time, and they use the same reference period. They have been designed to provide coverage of the same target population: noninstitutionalized individuals residing in the United States, under 75 years of age, with a bachelor's or higher degree, and educated or working in science and engineering (S&E) and related fields and occupations. Scientists and engineers are those who hold a bachelor's or higher degree in an S&E or S&E-related field, or who have a bachelor's or higher degree in a non-S&E field but work in an S&E or S&E-related occupation. The surveys give special emphasis to relatively rare populations, such as doctorates, recent graduates, minorities, and people with disabilities.

All cases that qualify as scientists and engineers under the SESTAT target population definition are integrated into a comprehensive database of all college-educated scientists and engineers in the United States, the SESTAT integrated file. Because a person may be eligible for inclusion in more than one of the surveys, the National Science Foundation (NSF) uses a sophisticated method to ensure that each person is counted only once. The integrated file is used to produce national estimates of the number and characteristics of scientists and engineers in the United States.

The SESTAT surveys are unique in the federal statistical system in that they compile detailed occupational, educational, and demographic data in one database. The complete educational histories collected for each person allow for a detailed examination of the relationship between education and career outcomes. The SESTAT surveys are conducted every 2 to 3 years and are designed primarily to provide cross-sectional time-series data. However, an important new analytical dimension was added to the surveys when SESTAT individual data were assembled into longitudinal files covering the period from 1993 to 1999.
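As a minimal sketch, the target-population definition above can be written as a single eligibility check. The Python below is illustrative only: the attribute names are hypothetical stand-ins for whatever survey variables actually record age, residence, institutional status, degree level, and degree or occupation tier. It simply encodes the rules stated in the text: noninstitutionalized U.S. residents under 75 with at least a bachelor's degree who hold either an S&E or S&E-related degree, or a non-S&E degree plus an S&E or S&E-related occupation.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical attributes; these are not actual SESTAT variable names.
    age: int
    resides_in_us: bool
    institutionalized: bool
    has_bachelors_or_higher: bool
    has_se_or_related_degree: bool
    has_se_or_related_occupation: bool


def in_sestat_target_population(p: Person) -> bool:
    """Apply the SESTAT target-population definition as stated in the text."""
    # Population restrictions: noninstitutionalized U.S. residents under age 75.
    if p.institutionalized or not p.resides_in_us or p.age >= 75:
        return False
    # Everyone in scope must hold at least a bachelor's degree.
    if not p.has_bachelors_or_higher:
        return False
    # An S&E or S&E-related degree qualifies on its own; otherwise the person
    # (holding a non-S&E degree) must work in an S&E or S&E-related occupation.
    return p.has_se_or_related_degree or p.has_se_or_related_occupation


# A hypothetical non-S&E graduate working in an S&E occupation qualifies through occupation.
print(in_sestat_target_population(Person(40, True, False, True, False, True)))  # True
```

Treating degree and occupation as two independent routes into scope mirrors the "or" in the definition; a person with neither route is out of scope regardless of degree level.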
The history of the SESTAT program and the interrelationship among the component surveys are shown in Figure 2-1.

Footnote: For further information on SESTAT, see http://sestat.nsf.gov; for the NSCG, see http://www.nsf.gov/statistics/srvygrads; for the NSRCG, see http://www.nsf.gov/statistics/srvyrecentgrads; and for the SDR, see http://www.nsf.gov/statistics/srvydoctoratework [accessed April 2008].

Footnote: The statistical integration process uses a unique linkage rule. Each survey is weighted according to the frame developed for that survey, and a series of overlap variables is calculated to identify cases that are eligible for more than one survey. To remove these multiple selection opportunities, each case in the SESTAT target population is uniquely linked to one and only one component survey, and that individual is included in the SESTAT integrated file only when he or she is selected for that linked survey.
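The footnoted linkage rule can be illustrated with a minimal Python sketch. The precedence order used to pick each person's linked survey (SDR, then NSRCG, then NSCG) is purely an assumption for illustration; the text does not say how NSF assigns the link, and the function names and record layout are hypothetical. What the sketch does take from the text is that each eligible person is linked to exactly one survey and enters the integrated file only if selected by that survey, so no one is counted twice.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# ASSUMPTION: illustrative precedence order only; NSF's actual linkage rule is not described in the text.
LINK_PRIORITY = ("SDR", "NSRCG", "NSCG")


def linked_survey(eligible_for: Set[str]) -> Optional[str]:
    """Return the single survey a case is linked to, or None if it is eligible for none."""
    for survey in LINK_PRIORITY:
        if survey in eligible_for:
            return survey
    return None


def build_integrated_file(
    cases: Iterable[Tuple[str, Set[str], Set[str]]]
) -> Dict[str, str]:
    """Map person_id -> linked survey for cases kept in the integrated file.

    Each case is (person_id, surveys it is eligible for, surveys that selected it).
    A case is kept only when the survey it is linked to actually selected it,
    so a person selected by several surveys still appears at most once.
    """
    integrated: Dict[str, str] = {}
    for person_id, eligible_for, selected_in in cases:
        link = linked_survey(eligible_for)
        if link is not None and link in selected_in:
            integrated[person_id] = link
    return integrated


# Hypothetical cases: "a" is selected by two surveys, "b" only by a survey it is not linked to.
example = [
    ("a", {"NSCG", "SDR"}, {"NSCG", "SDR"}),
    ("b", {"NSCG", "SDR"}, {"NSCG"}),
    ("c", {"NSCG"}, {"NSCG"}),
]
print(build_integrated_file(example))  # {'a': 'SDR', 'c': 'NSCG'}
```

In this sketch, person "b" is dropped even though the NSCG selected them, because their linked survey did not; in a design of this kind, such people are represented through the weights of the linked survey's sample rather than counted a second time.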
[Figure 2-1: SESTAT surveys. For each SESTAT cycle (1993, 1995, 1997, 1999, 2003, and 2006), the figure shows the three component surveys: the RCG (new S&E bachelor's and master's graduates; science, engineering, and health (SEH) graduates for 2003 and 2006), the NSCG (drawn from the 1990 decennial census for the 1990s cycles and from the 2000 decennial census for 2003 and 2006), and the SDR (drawn from the SED). In 2001 the NSCG was not conducted, and there was no SESTAT integrated file or national estimates.]
NOTE: SESTAT = Scientists and Engineers Statistical Data System, RCG = National Survey of Recent College Graduates, S&E = science and engineering, NSCG = National Survey of College Graduates, SDR = Survey of Doctorate Recipients, SED = Survey of Earned Doctorates.
SOURCE: National Science Foundation (2007, p. 6).
SESTAT is complex in that it represents both stocks and flows of scientists and engineers:

• The NSCG, which provides the majority of cases in the SESTAT integrated database, represents the "stock" of scientists and engineers at the beginning of the decade. A new panel has been selected at the beginning of each decade for the NSCG. NSCG respondents who are identified as eligible are included in the NSCG follow-up surveys for the rest of the decade.
• The NSRCG captures the "flow" of new U.S. graduates with bachelor's and master's degrees in science, engineering, and health. It is a two-stage, cross-sectional survey: first, a sample of institutions is drawn; second, a sample of graduates is drawn from those institutions. In addition to providing flow information on new graduates, the NSRCG provides a subsample that is followed in the NSCG (as part of the stock).
• The SDR provides data on the stock of experienced workers with U.S. doctorates, as well as the flow of new U.S. doctorates in science, engineering, and health fields. The target population for the SDR is all people with doctoral degrees in those fields awarded by U.S. institutions. The overall sample size of the SDR is held steady; in each new round, a sample of new doctorates drawn from the SDR's frame, the Survey of Earned Doctorates (SED), is added to the sample.

A summary of information about the three components of the SESTAT program, prepared by NSF, is reproduced here as Table 2-1. All three surveys are collected with a combination of mail and computer-assisted telephone interviewing (CATI), and in some years the NSCG uses computer-assisted personal interviewing (CAPI) follow-up as well. Over the last two rounds, the program has been developing a web-based collection option for the NSRCG and the SDR.

The response rates shown in Table 2-1 deserve some explanation.
The NSCG response rates for 1993 and 2003 are the rates for the initial (full-coverage) sample as selected from the census long-form records and do not include "carryover" sample units from the prior decade. Two response rates are shown for the later years of the NSCG: "conditional" response rates, pertaining to the sample of respondents from previous cycles (including supplemental cases from the NSRCG), and "unconditional" response rates, pertaining to the original decennial sample. The response rates shown for the NSRCG and the SDR are unconditional response rates pertaining to the cross-sectional samples selected for the particular years.
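One way to read the two NSCG rates: the NSCG figures in Table 2-1 are consistent with each follow-up cycle's unconditional rate being the previous unconditional rate multiplied by that cycle's conditional rate (for example, 78% × 95% ≈ 74% for 1995), with the chain restarting when a new decennial panel is drawn in 2003. This is a reconstruction from the published figures, not a formula stated by NSF, and it ignores rounding in the published percentages. A minimal Python sketch of that cumulative product:

```python
from typing import List


def chained_unconditional_rates(baseline: float, conditional_rates: List[float]) -> List[float]:
    """Chain a panel's baseline response rate through later cycles' conditional rates.

    Returns the implied unconditional rate for the baseline cycle followed by
    each follow-up cycle: every follow-up rate is the previous unconditional
    rate times that cycle's conditional rate.
    """
    rates = [baseline]
    for conditional in conditional_rates:
        rates.append(rates[-1] * conditional)
    return rates


# NSCG panel drawn from the 1990 census: 1993 baseline, then 1995, 1997, 1999 conditional rates.
print([round(100 * r) for r in chained_unconditional_rates(0.78, [0.95, 0.94, 0.91])])
# [78, 74, 70, 63], matching the unconditional row of Table 2-1
```

The same calculation for the 2000s panel (63% in 2003, then 88% conditional in 2006) gives about 55%, again matching the table, which is why the unconditional rate falls steadily over a decade even when each cycle's conditional rate stays high.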
TABLE 2-1 SESTAT Survey Characteristics, 1993-2006

National Survey of College Graduates (NSCG)
                              1993     1995     1997     1999     2003     2006
Survey mode                   m/c/p    m/c/p    m/c/p    m/c      m/c/p    m/c
Sample size(a)                214,643  61,897   46,075   35,714   170,800  59,349
Unweighted response rate
  Conditional                 78%      95%      94%      91%      63%      88%
  Unconditional               78%      74%      70%      63%      63%      55%

National Survey of Recent College Graduates (NSRCG)
                              1993     1995     1997     1999     2001     2003       2006
Survey mode                   c/m      c/m      c/m      c/m      c/m      m/c/w      m/c
Sample size                   25,785   21,000   14,057   13,918   13,513   18,000(b)  27,000(c)
Unweighted response rate      86%      86%      82%      79%      80%      66%        68%

Survey of Doctorate Recipients (SDR)
                              1993     1995     1997     1999     2001     2003     2006
Survey mode                   m/c      m/c      m/c      m/c      m/c      m/c/w    m/c/w
Sample size                   49,228   49,829   54,103   40,000   40,000   40,000   45,000(c)
Unweighted response rate      87%      77%      85%      82%      82%      79%      79%

NOTE: m = mail; c = computer-assisted telephone interviewing (CATI); p = computer-assisted personal interviewing (CAPI); w = web-based.
(a) Includes only sample originally from the decennial census; does not include sample updates from the NSRCG.
(b) Sample size increase because health fields were added to the NSRCG.
(c) Sample size increase due to the sampling of three graduating cohorts instead of two.
SOURCE: National Science Foundation, Response to Committee Questions, October 11, 2007.

MANDATED REQUIREMENTS

The legislation that established NSF contains a provision in which Congress mandated the agency "to provide a central clearinghouse for the collection, interpretation, and analysis of data on scientific and engineering resources and to provide a source of information for policy formulation by other agencies of the Federal Government" (NSF Act of 1950, as amended; 42 U.S.C. 1862). A critical component of this mission is information on the science and engineering workforce in the United States.
NSF is also mandated to produce two biennial reports, Science and Engineering Indicators and Women, Minorities, and Persons with Disabilities (WMPD). The mandate for Indicators is broad, requiring NSF to report on the status of science and engineering in the United States. The mandate does not specify which topics should be covered, but the scientific workforce is clearly an important component of the S&E enterprise. The mandate for the WMPD is more specific. The Science and Engineering Equal Opportunities Act of 1980 (Public Law 96-516) directed NSF to ensure that obtaining information on women, minority group members, and people with disabilities in the S&E workforce was an important consideration in data collection and analysis. Both reports require new workforce data every report cycle, that is, every 2 years.

From nearly the beginning of NSF, there have been efforts to provide comprehensive information about the highly skilled technical workforce, starting with a registry of people who qualified for inclusion and later expanding to surveys. The NSCG is particularly important because it has the broadest coverage of the surveys that contribute to SESTAT. It is the only one that captures an increasingly important and growing segment of the S&E workforce: immigrants who received none of their higher education in the United States.

USER NEEDS

Workforce data are used in a variety of ways beyond fulfilling the legislative mandates. In recent years, demand for S&E workforce data has increased as attention has focused on globalization, competitiveness, the role of the S&E workforce in national economic growth, the dynamic nature of workforce flows, and federal interventions to improve the health of U.S. science and engineering.
The need for an adequate knowledge base to assess the effects of interventions and to better understand the complex system that educates and sustains a science and engineering workforce was recognized by a National Science Board (2003) study, which recommended that the federal government lead a national effort to build a base of information on the current status of the S&E workforce. This concern was echoed at a blue-ribbon conference sponsored by the Office of Science and Technology Policy (OSTP) and the Sloan Foundation in late 2003. The conference report (Kelly et al., 2004) identified a number of "grand challenges" that NSF would face in the S&E workforce area, including the need to improve estimates of the stock of scientists and engineers beyond the start of the decade (when the decennial census data are fresh and include a current estimate of immigrant scientists and engineers), fix problems with data on rare populations (such as persons with disabilities and foreign students), and integrate the workforce data with information from other NSF surveys on research and development.
Two recent federal government initiatives, fostered in large part by the report Rising Above the Gathering Storm: Energizing and Employing America for a Brighter Economic Future (National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, 2007), are predicated on the need for better data on the S&E workforce. One is NSF's Science of Science and Innovation Policy program, a new funding channel that will underwrite fundamental research to create new explanatory models and analytic tools designed to inform the nation's public and private sectors about the processes through which investments in science and engineering research are transformed into social and economic outcomes.

The second is the American Competitiveness Initiative, which funds federal investment in research and development (Office of Science and Technology Policy, 2006). It identifies NSF, the Department of Energy's Office of Science, the National Institute of Standards and Technology, and the Department of Defense as key agencies, and it emphasizes workforce education and training by seeking to increase access to college and to recruit and retain students in science, technology, engineering, and mathematics majors at the undergraduate and graduate levels. These and other initiatives will depend critically on a well-grounded information system to support decision making and to measure progress toward national goals.

In reevaluating SESTAT for the 2000 decade, NSF undertook a comprehensive effort in January 2008 to gain input from a wide variety of users. The effort included focus groups, invited papers, and a variety of panel and information meetings to obtain input from federal agencies, academic researchers, policy makers, and other stakeholders who use the SESTAT surveys.
NSF also contacted a variety of nonusers to ask why they were not using SESTAT data for their research or other purposes (personal communication, NSF staff).

Footnote: See http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=501084 [accessed April 2008].

Footnote: Presentation by Nirmala Kannankutty, NSF, panel workshop, October 5, 2007.

In response to the needs of users as expressed in these studies and initiatives, NSF has identified some common research questions that the SESTAT surveys are called on to address:

• How many U.S. scientists and engineers were born abroad or have a degree from a foreign country?
• What are the labor force outcomes by degree field for college graduates?
• How do these vary by gender, race, and ethnicity?
• What are the economic returns associated with additional degrees in S&E and related fields?
• What are the salary and occupational differences between those with and without S&E and related degrees?
• How have labor market conditions changed over the past decade for people with and without S&E degrees, working in S&E occupations and in other occupations?
• How does job satisfaction vary by degree field and occupation?

Answering these important analytical questions is a significant challenge for NSF. It is even more challenging when particular information needs are taken into account, as detailed in the next section.

GROUPS OF INTEREST AND KINDS OF DATA

Immigrants

Science is a global enterprise, and the impact of foreign-born scientists on U.S. competitiveness has been profound. The National Science Board (2006, p. 3-32) estimates that "foreign-born scientists are more than a quarter, and possibly more than a third, of the S&E doctorate degree labor force, and are even more important in many physical science, engineering, and computer fields." The percentage of foreign-born college graduates (with either U.S. or foreign degrees) in S&E jobs has been growing: it increased from 11.2 percent of the workforce in 1980 to 19.3 percent in 2000 (National Science Board, 2006, p. 3-19).

The NSCG plays a critical role in addressing current gaps in the coverage of immigrants in the surveys of NSF's Division of Science Resources Statistics (SRS). It provides information on non-U.S.-educated immigrants once each decade, and it is the only NSF survey able to do so because all the other surveys use U.S. higher education institutions as a sample frame. International immigration patterns are important to understanding the flows of highly trained scientists and engineers. NSF attempted to obtain information on educated immigrants from the U.S. Department of Homeland Security but could not do so because individual records are considered confidential.

Graduates from Non-S&E Fields

The concept of "science and engineering" has evolved at NSF with respect to the SESTAT surveys. In the early history of the surveys, science and engineering was defined on the basis of the fields of science supported by NSF, with a focus on individuals with bachelor's or higher degrees. In advance of the 1990s SESTAT redesign, one of the major recommendations of a National Research Council (1989) study was that NSF should cast a wider net with respect to the fields and occupations covered, so that analysts could obtain a broader picture of the workforce. NSF decided to implement this recommendation in its redesign of the NSCG, which also involved using the census long form as the sampling frame. However, because the census did not collect respondents' field of degree, the NSCG had to include all college degree holders in its target population. The decision gave NSF another valuable context in which to present the S&E workforce data: a comparison of college graduates with and without science and engineering degrees. After the 1993 and 2003 NSCG surveys, NSF conducted such comparisons, which have become a standard part of the workforce chapter of Science and Engineering Indicators.

In implementing this change, NSF reconsidered the definition of the "non-S&E workforce." Science and engineering fields and occupations have generally been defined by five broad categories: computer and mathematical sciences; biological sciences and scientists; physical sciences and scientists; social sciences and scientists; and engineering and engineers. Although everyone who reports holding at least one degree in one of these fields should be counted in the NSF science and engineering workforce data, not all are. In the postcensal NSCG, only the cases that were eligible to be followed in subsequent cycles of the NSCG were considered to meet the definition.

Another definitional issue involves important elements of non-S&E degrees and occupations.
Specifically, there is a set of degrees and occupations that require the attainment of scientific or mathematical skills or the use of these skills on the job, such as health occupations and technical support jobs in several fields. Although NSF recognized that the education or jobs of people with such training or jobs are connected to science and engineering, the SESTAT surveys did not include all of them because of operational limits on coverage.

Footnote: One exception is the SDR, which has always included people with health doctorates in the target population. However, in the 1990s SESTAT files, these were considered S&E cases, while people with bachelor's or master's degrees in health fields were considered non-S&E cases.

Over the past decade or so, the population of those without an S&E degree working in S&E occupations has been expanding, in part as a result of growth in information technology fields. When NSF was evaluating coverage issues for the 2000 surveys, there was a conscious effort to expand coverage to include some of the non-S&E degrees and occupations that had a close relationship to science and engineering. The two-tiered taxonomy was converted to a three-tiered taxonomy: S&E fields, S&E-related fields, and non-S&E fields.

Within the new taxonomy, the S&E definition and coverage did not change. The previous non-S&E group was split into two components: S&E-related and non-S&E. For the NSCG, there was no change in follow-up plans for the non-S&E cases. The S&E-related group was created to allow better coverage of the degrees and occupations in this group. The degrees covered were those in health sciences, science and mathematics teacher education, and technology and technical fields; the occupations covered were health scientists, secondary teachers of science and mathematics, S&E managers, and technicians and technologists in science and engineering. For the NSCG, follow-up after the postcensal year was expanded to include people with S&E-related degrees or occupations. The SDR was not changed, since people with doctorates in health fields had always been included in it. For the NSRCG, sampling was expanded to include people with bachelor's and master's degrees in health fields.

In practice, SESTAT coverage has been expanded only for health degrees and occupations, making those data as comprehensive as the data for S&E fields. To address the partial coverage of other S&E-related fields and occupations, NSF has included a broader set of cases from the NSCG for follow-up.

Although the non-S&E workforce data have important uses, the need for these data is considered less critical for NSF than the need for S&E and S&E-related data. For purely NSF uses, it may not be necessary to continue the past practice of sampling large numbers of non-S&E cases.
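The three-tiered taxonomy described above amounts to a lookup from a degree field to a tier. The Python sketch below uses the broad field groups named in this section as illustrative labels; they are not NSF's actual field codes, and a real classifier would presumably work from detailed degree-field codes. Occupations would need a parallel lookup, which is omitted here.

```python
from enum import Enum


class Tier(Enum):
    SE = "S&E"
    SE_RELATED = "S&E-related"
    NON_SE = "non-S&E"


# Illustrative labels for the broad groups named in the text (not NSF field codes).
SE_FIELDS = {
    "computer and mathematical sciences",
    "biological sciences",
    "physical sciences",
    "social sciences",
    "engineering",
}
SE_RELATED_FIELDS = {
    "health sciences",
    "science and mathematics teacher education",
    "technology and technical fields",
}


def classify_degree_field(field: str) -> Tier:
    """Assign a broad degree field to one tier of the three-tiered taxonomy."""
    key = field.strip().lower()
    if key in SE_FIELDS:
        return Tier.SE
    if key in SE_RELATED_FIELDS:
        return Tier.SE_RELATED
    # Everything else falls into the residual non-S&E tier.
    return Tier.NON_SE


print(classify_degree_field("Health Sciences"))  # Tier.SE_RELATED
```

The sketch reflects the point that S&E coverage did not change in the redesign: the S&E set is just the five original categories, and only the former non-S&E residual is divided into S&E-related and non-S&E.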
In part, this sampling scheme was an artifact of the type of information available on the census long form that was used for sampling. With the addition of a field-of-degree question to the ACS, the NSCG sample could focus more on S&E and S&E-related cases, with less emphasis on non-S&E cases, so fewer cases would likely be sampled from the last group (except for people with health doctorates, as explained above). To the extent that non-S&E data are still necessary or wanted, it might be possible to include a representative (if smaller) sample of non-S&E cases, or to ask questions on the NSCG that mirror questions on the Current Population Survey (CPS) or the ACS, so that information about NSCG S&E cases can be interpreted relative to all college graduates.

Associate Degree Holders

A significant share of the S&E workforce, particularly S&E technicians and technologists, does not hold a bachelor's degree. Users identified two motivations for seeking data on sub-baccalaureate education: understanding the role of community colleges for those earning higher degrees in science and engineering, and understanding the population of technologists and technicians who support science and engineering work in the United States. For the first need, NSF has included questions on the SESTAT surveys and the SED to gather information about community college attendance among those earning a bachelor's degree or higher. For the second need (which has a lower priority for the agency), NSF has investigated whether other data sources could meet the interest in data on associate's degree holders, given the substantial increases in survey operations (and costs) that would be necessary to expand the surveys of NSF's Division of Science Resources Statistics (SRS) to cover this population. If use of the ACS creates cost efficiencies, it may be possible to reconsider including those with associate's degrees in the SESTAT target population.

NSF has determined that some information could be obtained from the CPS. In 2007, NSF published a working paper comparing SESTAT and the CPS that covered the types of analysis that could be done with the CPS to report on the below-the-baccalaureate population (Tsapogas et al., 2007). The National Center for Education Statistics (NCES) also has a series of surveys that could be used to analyze the associate's degree population. The ACS could provide a rich source of data on this population, though unfortunately not by field of degree, since that question will be asked only of those with a bachelor's degree or higher.

To ensure a continuous flow of advice from users, NSF has created a Human Resources Experts Panel (HREP) composed of users of its human resources data.
This panel will advise SRS on relevant data and policy issues related to graduate education and the S&E workforce. The HREP is scheduled to meet at least twice a year; the first meeting was held on October 11, 2007.

Longitudinal Data

All of the SESTAT surveys have been designed to produce cross-sectional estimates for their individual target populations and for use in the SESTAT integrated database. However, some respondents in all three SESTAT surveys are treated as panel cases that are eligible for follow-up in subsequent years. This ability to follow persons over time has been tied to the use of the decennial census long form as the sample frame. Because the frame was tied to the decennial census, a new frame was available only once a decade, so a very large sample had to be drawn to identify persons eligible for inclusion in the postcensal survey. There was no advantage to selecting a new sample later in the decade, because there was no updated frame and the extensive screening to identify eligible cases (and the substantial costs involved) would have to be repeated. In addition, the choice of the decennial census as the frame spawned a longitudinal design that, in turn, provided stability to the estimates over time.

The design enables analysts to track changes in status, such as career paths over time, but to fully exploit the longitudinal character of the survey, analysts need longitudinal weights to generate estimates. Until recently, only cross-sectional weights were available, so an individual case's weight could fluctuate over time. NSF remedied this by deciding to develop longitudinal weights for the 1990s SESTAT integrated files to enhance the analytic capability of the panel data. Developing longitudinal weights was a complex effort: some elements of the individual survey designs, and decisions on which cases were eligible for follow-up, limited the number of cases for which weights could be developed, which in turn affected the weighting methodology. After reviewing a variety of options, NSF developed a set of longitudinal weights for the 1993-1999 integrated SESTAT data that worked around these limitations.

The longitudinal weights were originally intended primarily for internal use by NSF. In recent years, however, NSF has been devoting substantially more resources to supporting data use by external users. For example, NSF has developed a user guide that explains the limitations of the longitudinal weights and how to use them, and it has written a short analytic piece that shows examples of how the longitudinal files could be used. The longitudinal data files are expected to be available for release to licensees once this analytic piece has been fully reviewed and released.
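To make the role of longitudinal weights concrete, the minimal Python sketch below estimates one kind of career-path measure: the weighted share of people in an S&E occupation at the start of a panel period who are still in one at the end. It assumes each panel case carries a single longitudinal weight for the whole period, as the text describes; the record layout, function name, and example values are hypothetical, and the estimator is a generic weighted ratio rather than NSF's methodology.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PanelCase:
    # Hypothetical panel record observed at the start and end of a period.
    longitudinal_weight: float  # one fixed weight for the whole panel period
    in_se_occupation_start: bool
    in_se_occupation_end: bool


def weighted_retention_share(cases: Iterable[PanelCase]) -> float:
    """Weighted share of start-of-period S&E workers still in S&E at the end."""
    cases = list(cases)
    at_start = sum(c.longitudinal_weight for c in cases if c.in_se_occupation_start)
    retained = sum(
        c.longitudinal_weight
        for c in cases
        if c.in_se_occupation_start and c.in_se_occupation_end
    )
    if at_start == 0:
        raise ValueError("no weighted cases in an S&E occupation at the start")
    return retained / at_start


# Toy values for illustration only, not SESTAT data.
toy = [PanelCase(100.0, True, True), PanelCase(300.0, True, False), PanelCase(50.0, False, True)]
print(weighted_retention_share(toy))  # 0.25
```

Because the numerator and denominator use the same weight for each person, the estimate reflects actual status changes rather than shifts in a case's cross-sectional weight from one cycle to the next.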
Because some users have expressed interest in these files, NSF expects them to be used immediately upon release (personal communication, NSF staff). Given this increased attention and the resources devoted to this capability, the number of users of longitudinal data is likely to continue to grow, and demand for the data will likely increase as well.

Footnote: Three sets of longitudinal weights were developed: 1993-1995, 1993-1997, and 1993-1999.

Recent College Graduates

As a related issue, the panel considered the continued need for data on recent college graduates, which now come mainly from the NSRCG. NSF reported to the panel that it is difficult to identify a mandated NSF need for the NSRCG data in and of themselves. Although NSF and some outside users make some analytical use of the data, it is not clear how much the data elements collected in the NSRCG benefit the analytical community. However, there is some indication that NSRCG data are useful to employers and government in understanding and predicting trends in graduate school enrollment, employment opportunities, and salaries for recent graduates in S&E fields.

The limited use of the survey data stems in part from limitations in the design of the NSRCG. It is essentially a repeated cross-sectional survey, so it has limited utility for longitudinal analysis. NSF is not able to follow respondents over time because cases are lost when NSRCG cases are subsampled in later survey rounds and because cases are dropped when an individual earns another eligible degree after the degree for which he or she was sampled for the NSRCG.

To the extent that data on this population are needed, there appear to be other options. For example, NCES has a longitudinal survey of recent graduates, Baccalaureate and Beyond (B&B), which follows a cohort of master's and bachelor's degree recipients for a few years. A new B&B cohort is started about once a decade. B&B surveys recent graduates in all fields, with a particular focus on those who enter and remain in teaching at the K-12 level. The amount of analysis possible with B&B data for detailed S&E fields is currently limited by small sample sizes.

Footnote: The 2000 cohort for the B&B survey numbered only about 10,000 sample cases; for details, see http://nces.ed.gov/programs/quarterly/Vol_5/5_3/5_2.asp#5 [accessed February 2008].