Principal Sources of Human Resource Data
The workshop identified four principal sources of data on professionals involved in the innovation process. First, the National Science Foundation, U.S. Bureau of the Census, and Bureau of Labor Statistics conduct national surveys yielding personal information on scientists and engineers or members of the workforce generally. Second, there are databases—for example, patent files and indexes to scientific publications—with relevant personal information that can be searched by individual. Third, some institutions track the employment histories and work product of individuals affiliated with them—for example, university graduates, professional association members, and government research agency grantees, trainees, and research associates. Finally, of course, scholars construct their own data sets for particular research purposes. The workshop was not intended to produce an inventory of these sources, but participants described the principal national surveys and cited examples of other sources.
NATIONAL SCIENCE FOUNDATION SURVEYS
Mary Golladay of the Science Resources Studies Division described the four NSF surveys and an integrated database that currently provide information on scientists and engineers educated or working in the United States.
Survey of Earned Doctorates (SED)
The SED is an annual census of all individuals earning a doctoral degree in the United States. The survey was begun in 1958 by NSF and cosponsored by the U.S. Department of Education, National Institutes of Health, National Endowment for the Humanities, and the U.S. Department of Agriculture. It includes information on demographic characteristics such as date of birth, marital status, education of parents, and geographic location of high school attended (Sanderson and Dugoni, 1999). There are also questions about field of training, sources of financial support during graduate education, and postgraduate employment plans. NSF issues an annual report that provides the same information for science and engineering doctorates only (Hill, 1999).
Survey of Doctorate Recipients (SDR)
The SDR is a longitudinal demographic survey of science and engineering doctorate holders conducted biennially for the NSF and other federal agencies since 1973. In this survey, a sample of holders of doctorates in science and engineering earned at U.S. institutions is followed throughout their careers from year of degree award until age 76. Every 2 years, a sample of new S&E doctoral degree earners is added to the SDR from the SED. In 1999, for example, the sample frame included U.S.-earned S&E doctorates through the 1998 academic year. Detailed statistical tables published from the survey provide information on the number of scientists and engineers by demographic characteristics such as citizenship, place of birth, and field of degree, and by employment-related characteristics such as occupation, sector of employment, median salary, and various labor force statistics (e.g., unemployment rate).
Stephan, in her background paper, used data from the SDR to illustrate three trends in the deployment of skilled human resources that reflect changes in the structure of innovation. First, in all S&E fields there is a marked increase in the share of PhDs working in industry. Second, with the exception of chemistry, the share of PhDs employed in manufacturing industries has declined over time. Third, an increasing number of PhDs in industry are not engaged in R&D or R&D management.
National Survey of College Graduates (NSCG)
The NSCG was first administered in 1993 and biennially thereafter to a nationally representative sample of all college degree holders who were identified through the 1990 decennial census. The target population for this survey includes individuals in the United States as of April 1990 with a bachelor’s degree or higher in any field, not just in science or engineering. In addition to including people with degrees earned at U.S. institutions, the NSCG also includes college degree holders who earned their degrees outside of the United States and were living in the United States in 1990. In 1993, those with science or engineering degrees and those without such degrees but working in S&E occupations were selected from the NSCG. These two populations are collectively referred to as the “S&E panel” of the NSCG. These same two groups were followed in the 1995, 1997, and 1999 rounds of the survey.
National Survey of Recent College Graduates (NSRCG)
The NSRCG has been administered biennially since 1974 to recent S&E bachelor’s and master’s degree recipients. The 1997 survey, for example, included those who earned bachelor’s and master’s degrees in science and engineering in the 1995 and 1996 academic years. Topics include educational experience before and after obtaining the sampled degree; graduate employment characteristics including occupation, salary, unemployment, underemployment, and post-degree work-related training; relationship between education and employment; and graduate background and demographic characteristics. The data may be used to understand the employment experiences of recent graduates such as the extent to which recent graduates entered the labor force, whether they were able to find employment, and the attributes of that employment. Results of this survey are presented separately for bachelor’s and master’s degree recipients and also separately for graduates of the two graduating class years.
Scientists and Engineers Statistical Data (SESTAT)
Since 1993 the SDR, NSRCG, and S&E panel of the NSCG have been integrated into SESTAT, the most comprehensive and easily accessed source of information about the employment, education, and demographic characteristics of scientists and engineers in the United States (http://srsstats.sbe.nsf.gov/; Kannankutty et al., 1999). The SESTAT target population includes residents of the United States who hold at least a bachelor’s degree and who, as of the reference date of the survey (i.e., April 15, 1993, April 15, 1995, April 15, 1997, etc.), were trained or working as a scientist or engineer, were less than 75 years old, and were not institutionalized. Not included in the sampling frames are individuals who hold associate’s degrees in S&E fields or work in S&E occupations but lack bachelor’s degrees. After 1993, the SESTAT surveys include only individuals whose degrees are from U.S. institutions and thus exclude immigrants with degrees from non-U.S. institutions who entered the United States after 1990. Some individuals have multiple chances of selection because they may have been included in the sampling frames for more than one component survey.
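The target-population criteria above can be read as a simple eligibility rule. The following is a minimal sketch in Python, assuming a hypothetical record layout; the field names are illustrative and are not actual SESTAT variable names, and the real surveys apply the criteria through their sampling frames rather than through a filter like this one.

```python
from dataclasses import dataclass

# Hypothetical record layout, for illustration only (not SESTAT variable names).
@dataclass
class Person:
    age: int                  # age on the survey reference date
    highest_degree: str       # "associate", "bachelor", "master", or "doctorate"
    se_degree: bool           # trained as a scientist or engineer
    se_occupation: bool       # working as a scientist or engineer
    us_resident: bool
    institutionalized: bool

BACHELOR_OR_HIGHER = {"bachelor", "master", "doctorate"}

def in_target_population(p: Person) -> bool:
    """Apply the criteria stated in the text: U.S. resident, at least a
    bachelor's degree, trained or working in S&E, under 75, not institutionalized."""
    return (
        p.us_resident
        and p.highest_degree in BACHELOR_OR_HIGHER
        and (p.se_degree or p.se_occupation)
        and p.age < 75
        and not p.institutionalized
    )

# Made-up examples
phd_chemist = Person(40, "doctorate", True, False, True, False)
technician = Person(40, "associate", False, True, True, False)
```

Under these assumptions the doctorate holder is in scope, while the technician with an associate’s degree is excluded even though the job is an S&E occupation.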
The principal variables in the SESTAT database are listed in Box 2-1. Access to some data is restricted to protect respondents’ confidentiality.
BUREAU OF LABOR STATISTICS AND BUREAU OF THE CENSUS SURVEYS
Michael McElroy and James Spletzer, representing the Bureau of Labor Statistics, explained that the agency collects occupational employment statistics through three surveys: principally the Occupational Employment Statistics Survey, supplemented by the Current Population Survey (conducted jointly with the Census Bureau) and the Current Employment Survey. Data from the three are combined to create the National Industry-Occupation Employment Matrix.
Occupational Employment Statistics Survey (OES)
The OES program conducts a yearly mail survey of nonfarm establishments in order to produce employment and wage estimates for over 700 occupations. Data on self-employed persons are not collected and are not included in the estimates. The OES program produces these occupational estimates by geographic area and by industry. Estimates based on geographic areas are available at the national, state, and metropolitan area levels. The Bureau of Labor Statistics produces occupational employment and wage estimates for over 400 industry classifications at the national level. The industry classifications correspond to the two- and three-digit Standard Industrial Classification (SIC) industrial groups.
The OES program surveys approximately 400,000 establishments per year, taking three years to fully collect the sample of 1.2 million establishments. To reduce respondent burden, the collection is on a 3-year survey cycle that ensures that establishments employing fewer than 250 workers are surveyed at most once every 3 years. The estimates for occupations in nonfarm establishments are based on OES data collected for the reference months of October, November, or December.
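The arithmetic of the 3-year cycle can be illustrated with a short sketch. The rotation rule below (assigning each establishment to one of three annual panels by index) is a hypothetical simplification; the text gives only the annual and full sample sizes and says that smaller establishments are surveyed at most once every 3 years.

```python
ANNUAL_SAMPLE = 400_000   # establishments surveyed per year (from the text)
CYCLE_YEARS = 3           # length of the survey cycle (from the text)

def full_sample_size(annual: int = ANNUAL_SAMPLE, years: int = CYCLE_YEARS) -> int:
    """Total establishments collected over one complete cycle."""
    return annual * years

def assign_panel(establishment_index: int, years: int = CYCLE_YEARS) -> int:
    """Hypothetical rule: rotate establishments through panels 0..years-1."""
    return establishment_index % years

def surveyed_in(establishment_index: int, year: int, years: int = CYCLE_YEARS) -> bool:
    """True if this establishment's panel is the one collected in `year`."""
    return year % years == assign_panel(establishment_index, years)
```

With this rule, three annual panels of 400,000 add up to the 1.2 million establishments in the full sample, and each establishment falls into exactly one panel in any three consecutive years.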
The 1996 survey round was the first in which the OES program collected wage rate data along with the occupational employment data in every state. In addition, the program’s 3-year survey cycle was modified to collect data from all covered industries each year. Prior to 1996 the OES program collected only occupational employment data for selected industries in each year of the 3-year survey cycle.
Information contained in the OES survey is shown in Box 2-2.
Box 2-1. Principal variables in the SESTAT database
For the employed:
Primary job and salary
If previously retired
Type of employer: educational institution (by type); private for-profit; private not-for-profit; government (state/local or federal); self-employed
Supervisory responsibility, including number typically supervised directly and through subordinates
Relationship between work and highest degree, including reasons for employment outside the highest degree field
Typical work activities (in 14 categories), including primary and secondary work activities
Licensing and certification if required, recommended, or held
U.S. government support for research, including supporting agencies or departments
Second job, including occupation, salary, and relationship between work and highest degree field
For the unemployed and those not in the labor force:
Reasons for not working during the reference week
When last worked
Job last worked
Other Work-Related Information
Membership in professional societies and associations, including meeting attendance
Participation in work-related training activities, including types of training and reasons for participation
First bachelor’s and two most recent degrees: level, degree field (major and minor), when awarded
Earlier education: date awarded high school diploma; associate degree(s)
Continuing education: post-degree college courses, reasons and field of study; employer financing
Spouse’s employment status; if working full/part-time, technical expertise required on job
Children living at home (and ages)
Parents’ educational attainment
Citizenship status (by type)
Country of birth
1993: Labor force status in 1988:
Type of employer and job
If different from current job, reasons for changing employer or job
1995 (SDR only): Post-doctoral experience:
Whether ever held a post-doctoral position
Number of post-docs held over career
Type of employer, including types of benefits offered
Whether current job was a post-doctoral position
1995 (NSCG and SDR only): Patent and publication activity:
Number of articles or other publications authored by respondent
Number of patent applications, patents awarded and commercializations attributed to respondent
1997: Alternative or temporary work experience:
Whether relationship to employer was alternative or temporary (consulting, contracting, etc.)
Reasons for such work arrangements
Whether benefits were provided, and if so, types of benefits
Current Population and Employment Surveys (CPS and CES)
The CPS, a monthly survey of a probability sample of 50,000 households conducted by the Bureau of the Census for the Bureau of Labor Statistics, provides information on the employment and unemployment experience of persons living in the United States. It is the primary source of information on the labor force characteristics of the U.S. population. Estimates from the CPS include employment, unemployment, earnings, hours of work, and other indicators. They are available by a variety of demographic characteristics including age, sex, race, marital status, educational attainment, occupation, and industry. CPS data are considered important indicators of the nation’s economic situation and are used for planning and evaluating many government programs.
Box 2-2. Information contained in the OES survey
National Occupational Employment and Wage Estimates
Total employment by occupation
Wages by occupation
Occupational employment distribution by wage range
State Occupational Employment and Wage Estimates
Total employment by occupation
Wages by occupation
Metropolitan Area Occupational Employment and Wage Estimates
Total employment by occupation
Wages by occupation
The CES, also a monthly survey, provides employment, hours, and earnings estimates based on payroll records of business establishments. The CES survey does not collect occupational information. Together, the CPS and CES fill in some of the gaps in coverage by the OES, such as self-, household, and farm employment.
National Industry-Occupation Employment Matrix (NIOEM)
The OES, CPS, and CES are combined to produce the National Industry-Occupation Employment Matrix as part of BLS’s ongoing Occupational Employment Projections Program. The matrix shows occupational staffing patterns (each occupation as a percentage of an industry’s workforce) in 260 detailed industries and 513 detailed occupations. NIOEM includes establishments in all sectors of the economy and members of the science and technology labor force at all levels of educational attainment, including those below the bachelor’s level, in all academic disciplines (http://www.bls.gov/asp/oep/nioem/empiohm.asp).
It does not, however, have demographic or educational attainment information on individuals. Nor does it include those who have science and technology training but are in non-S&T jobs. For example, people with technical backgrounds who are top-level managers in industry or government are not captured in the “engineering, science, and computer systems managers” category, and other S&E-trained individuals are teachers, service personnel, writers, lawyers, etc. NIOEM categories do include groups not included in the SESTAT database—technicians and technologists and people in technical occupations where a bachelor’s degree is not customarily required.
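A staffing pattern of the kind the matrix reports is each occupation's share of employment within an industry. The sketch below computes such shares from a table of counts. It is illustrative only: the function name, the input layout, and the industry and occupation labels are assumptions, the counts are made up, and it does not reproduce how BLS actually combines the OES, CPS, and CES.

```python
from collections import defaultdict

def staffing_patterns(employment):
    """employment: dict mapping (industry, occupation) -> employment count.
    Returns a dict mapping (industry, occupation) -> percent of that
    industry's total employment."""
    industry_totals = defaultdict(float)
    for (industry, _occupation), count in employment.items():
        industry_totals[industry] += count
    return {
        (industry, occupation): 100.0 * count / industry_totals[industry]
        for (industry, occupation), count in employment.items()
        if industry_totals[industry] > 0
    }

# Made-up counts, for illustration only
example = {
    ("chemicals", "chemists"): 50,
    ("chemicals", "technicians"): 150,
    ("software", "programmers"): 300,
    ("software", "managers"): 100,
}
shares = staffing_patterns(example)
# shares[("chemicals", "chemists")] is 25.0: chemists are 50 of 200 workers
```

Because the input is establishment-level employment by occupation, nothing in such a table identifies the training or demographics of the people holding the jobs, which is the limitation noted above.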
SESTAT and NIOEM each provide some information on the science and technology labor force and contribute to understanding the human resources required for science and technology in the United States (Kannankutty, 1999). NIOEM data give a broad view of the demand side of the technical labor market—jobs available as reported by establishments. SESTAT data give a more detailed picture of the supply of scientists and engineers with a bachelor’s degree and above who are employed in the labor force. SESTAT shows that many people with S&E training have moved into the non-science and engineering labor force.
The two surveys are nevertheless not perfectly complementary. Although NIOEM includes employment data on technologists and technicians, complementary SESTAT data cannot be found for a large number of persons holding these jobs because they do not hold bachelor’s degrees. The converse holds with regard to managers of the scientific and engineering enterprise: SESTAT can be used to identify scientists and engineers who are managers, but these people cannot be mapped into one specific category in the NIOEM (Kannankutty, 1999).
LINKING MICRODATA SETS WITH CONFIDENTIAL INFORMATION
J. Bradford Jensen, Director of the Center for Economic Studies (CES) of the Census Bureau, spoke about secure access sites for using Census data, the kinds of data available through the sites, and a project illustrating some of the research opportunities and constraints imposed by data confidentiality requirements. CES conducts empirical research on confidential microdata from the Census Bureau’s regular survey and census programs.
Microdata are data at the level of the individual respondent, which might be a household, an individual within a household, an establishment, or a firm. Microdata can be used to create longitudinal or panel data that track respondents over time, which is impossible with aggregate data. Microdata also provide information about location and distance, which allows an assessment of spillover effects. Finally, microdata can be used to match individual data from other databases, for example, permitting the linkage of demographic (household and individual) data and economic (business establishment and firm) data.
Under Title 13 of the U.S. Code, the Bureau of the Census must keep the identity of respondents and the information they provide confidential. Data from demographic surveys are available in public-use files. It has proved impossible to create similar public-use files of data from economic surveys without either violating confidentiality requirements or editing the data to the point of rendering them unusable.
To enable researchers from outside the Census Bureau to use confidential microdata, CES has created a network of regional data centers, some with support from the National Science Foundation. The data centers, which are located in the Census Bureau’s regional office in Boston and at Carnegie Mellon University in Pittsburgh, the University of California at Los Angeles, the University of California at Berkeley, and Duke University, provide secure sites for researchers who obtain “Special Sworn Status” from the Census Bureau. Outside researchers may also have access at the CES in Suitland, Maryland.
To date, most Research Associates (as outside data users are called) have used the Longitudinal Research Database (LRD), which has longitudinal plant-level data from 1963 to the present, sometimes linked with related databases (e.g., Survey of Manufacturing Technology, Pollution Abatement Costs and Expenditures Database, Manufacturing Energy Consumption Survey, Industrial R&D). CES is broadening the LRD into a Longitudinal Business Database that includes economy-wide data, including the service sector, wholesale and retail trade, and finance, insurance, and real estate.
In the mid-1990s, CES created the Worker-Establishment Characteristic Database (WECD), in which worker characteristics and firm characteristics can be examined together. WECD combines demographic information on workers from the long form of the 1990 Decennial Census with information on the manufacturing plants where the workers were employed. A New Worker-Establishment Characteristic Database that includes all industries is being assembled.
Researchers also may now access confidential demographic microdata, which avoids the restricted geography and the “topcoding” of income and other continuous variables found in the public-use files. This could permit linking with the National Survey of College Graduates conducted by the Census Bureau for the NSF, as well as other surveys (e.g., Current Population Survey, Survey of Income and Program Participation, and American Housing Survey).
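Topcoding replaces every value above a threshold with the threshold itself, which protects high-value respondents but distorts statistics computed from the public file. The sketch below is a minimal illustration with made-up incomes and an arbitrary cap; the actual thresholds and procedures used for Census public-use files are not given in the text.

```python
from statistics import mean

def topcode(values, cap):
    """Replace every value above `cap` with `cap`."""
    return [min(v, cap) for v in values]

# Made-up incomes and an arbitrary cap, for illustration only
incomes = [30_000, 85_000, 250_000, 1_200_000]
public_use = topcode(incomes, cap=150_000)

confidential_mean = mean(incomes)     # computed from the untouched values
public_mean = mean(public_use)        # understated because the top is capped
```

In this example the two highest incomes become indistinguishable in the public file, and the mean computed from it falls well below the mean of the confidential values.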
Finally, it should be possible to link with outside data, because the identity of individual respondents is known. The Census databases include consistent data over a long period of time but cannot cover every question of interest. Combining them with data in commercial databases or collected by researchers would increase the power of both data sets. For example, CES has teamed up with researchers at Carnegie Mellon to look at the impact of managed care on innovation in health care, in a project that will link bibliometric and patent information with economic data on firms and hospitals. CES is also exploring with the American Medical Association (AMA) the possibility of linking AMA data on education and specialization of physicians with economic census information to study doctors’ offices.
Jensen discussed one limitation on the Census economic data. Census does not survey very small establishments with fewer than 20 employees; it relies on administrative data from the Internal Revenue Service to capture some employment and payroll information on them. This reflects the primary focus of the Census Bureau on developing an accurate picture of aggregate economic activity as an input to the national product accounts of the Bureau of Economic Analysis. Very small establishments account for little economic activity. Nevertheless, this makes it harder to study business start-ups and to understand under what circumstances start-ups become large enough to join Census’ sample frame. There are also problems in tracking mergers and acquisitions among small firms, because the amount of financial assets that changes hands may be trivial compared with the exchange of human capital, which is not measured. This is an issue that might be addressed by using human resources data to track the movement of innovative activity.
Julia Lane, Professor of Economics at American University, described the Longitudinal Employer Household Dynamics Project at the Census Bureau, a collaboration with John Abowd, Cornell University, and John Haltiwanger, University of Maryland. They are using administrative data from the Social Security Administration as the link record between information about individual persons, including earnings and employment histories, and economic data collected about their employers. With respect to scientists and engineers or highly educated individuals generally, because there are repeated observations on individuals, a relatively small initial sample frame becomes a much larger one. One could examine both cohort and temporal effects and career mobility.
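The idea of a link record can be sketched as a join: an administrative file of job spells carries both a person identifier and an employer identifier, so it can connect a person file to an employer file. The code below is a minimal illustration with made-up records. All field names and identifiers are hypothetical, and the actual project's matching procedures and confidentiality protections are not described in the text.

```python
def link_records(persons, employers, job_spells):
    """persons: dict person_id -> dict of person attributes
    employers: dict employer_id -> dict of employer attributes
    job_spells: list of dicts with person_id, employer_id, year, earnings
    Returns one combined record per job spell that matches on both sides."""
    linked = []
    for spell in job_spells:
        person = persons.get(spell["person_id"])
        employer = employers.get(spell["employer_id"])
        if person is None or employer is None:
            continue  # unmatched spell; a real project would track these
        linked.append({**spell, **person, **employer})
    return linked

# Made-up records, for illustration only
persons = {"p1": {"degree": "PhD", "field": "chemistry"}}
employers = {"e1": {"industry": "pharmaceuticals"}, "e2": {"industry": "software"}}
job_spells = [
    {"person_id": "p1", "employer_id": "e1", "year": 1995, "earnings": 60_000},
    {"person_id": "p1", "employer_id": "e2", "year": 1997, "earnings": 72_000},
    {"person_id": "p3", "employer_id": "e1", "year": 1997, "earnings": 40_000},
]
linked = link_records(persons, employers, job_spells)
```

Here the single person in the person file yields two linked observations, one per job spell, which illustrates how repeated observations enlarge a small initial sample and make career mobility across employers visible.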