Sources of Data
National Research Council
Survey of Earned Doctorates
The Survey of Earned Doctorates (SED) is conducted annually by the National Research Council and is a census of the research doctorates awarded at US universities during the academic year, from July 1 of one year to June 30 of the following year. The self-report response rate from the PhD recipients is about 95%, and information on the remaining 5% of the doctorates is obtained from commencement programs and institutional sources. The survey gathers information on all fields that award research and applied-research doctorates, except professional degrees such as the MD, DDS, OD, DVM, and JD. It gathers data on a field-specific basis, and includes information on ethnic background, sex, postsecondary education, time to PhD degree from the baccalaureate degree, financial support during graduate studies, and postdoctoral plans. The data from the survey become part of the Doctorate Records File (DRF), a virtually complete database on doctorate recipients from 1920 to the present. The data in this file can be manipulated in different ways to obtain the characteristics of graduates by nearly 20 broad fields or several hundred fine fields with regard to their institution, their graduate program, and their plans. The data in the DRF are kept on an individual basis and are linked to other files, such as the file for the Survey of Doctorate Recipients (see below) and the National Institutes of Health grants files.
In the life-science fields included in this report, 7,696 doctorates were added to the DRF in 1996. The field specialties in the life sciences include the agricultural and biomedical sciences and a portion of the health sciences as broad fields, and these are divided into 67 fine-field specialties.
The information in the DRF is complete and reliable for most data points. However, students are not always aware of the sources or types of their support during graduate school, so those data are less certain. Likewise, for postgraduate plans, the survey questionnaire might be completed before a definite commitment has been made, or it might reflect only a hope of obtaining a particular type of postdoctoral position.
Survey of Doctorate Recipients
The Survey of Doctorate Recipients (SDR) is a biennial longitudinal survey, dating to 1973, of research doctorate-holders working in the United States. The sample for each survey period is adjusted by the addition of persons from the most recent 2-year cohort in the DRF and the dropping of persons who have retired or have reached the age limit of the survey. Before 1991, the population of the survey included a broader range of people, such as holders of US-earned doctorates in humanities, education, and professional fields who were working in science and engineering (S&E), holders of foreign-earned doctorates who were working in S&E in the United States, and a 42-year period of PhD cohorts. The SDR was restructured in 1991 to include only persons under the age of 76 years who hold doctorates in S&E from US universities, and the sample was reduced by 55% to provide resources to increase the response rate.
The survey questionnaire is sent in the spring to each person in the sample. In 1995, the sample numbered 49,829. The people in the sample are asked a series of demographic and employment questions. The response rate for the survey in 1995 was about 85% after second-wave mailings and telephone interviews; this was about a 30% increase in the response rate over 1989. Although the reduction of the sample reduced the overall number of responses from 1989 to 1995, it is believed that the increased response rate improves the quality of the data. However, the change in the survey produced a potential disjunction between data collected before 1991 and those collected since.
The sample is stratified across three variables: field of degree, sex, and a combination variable that includes degree field, sex, handicap status, ethnic group, and nationality of birth. The results of the survey are statistically analyzed to translate the data into weighted numbers for the entire population. From the weighted results, the doctorate workforce in S&E can be analyzed across different dimensions by looking at different demographic and employment characteristics and by taking different cohorts. That provides for both longitudinal and time-series analyses. However, in the analysis, one must take into consideration the change in sampling frame, the increased response rate in 1991, and the fact that some cells in an analysis could contain very few actual responses, in that the sample is only about 8% of the S&E workforce.
Data available from the SDR up to 1991 are field of doctorate and employment, sector of employment, geographic location, primary work activity, federal support, tenure status, salary data, and ethnic data. However, the 1991 SDR was administered in the fall, not the spring; some data points are not directly comparable with those from other survey years. The 1993 questionnaire incorporated substantial changes from earlier ones. In particular, the questionnaire before 1993 asked for data only as of a specific time, but the 1993 questionnaire asked for some retrospective employment information. There was also a change in the field employment questions, with much broader definitions of job categories, such as "biological scientist", as opposed to, for example, "ecologist" in the earlier surveys. As a result, the number of people in postdoctoral positions might have been slightly overestimated. In 1995, additional questions concerning detailed retrospective descriptions of the time spent in postdoctoral training were added.
The SDR is a sample survey of about 8% of PhD awards, and the number of responses might be low in some cases. A weighting formula is used to adjust the sample to the complete population. For example, a weighted estimate of 39 unemployed life scientists from the 26 high-quality institutions in 1995 is based on five responses; the estimate of 20 people working outside S&E in the same population is based on three responses. In the experience of staff of the National Research Council's Office of Scientific and Engineering Personnel, who have worked with these data for many years, 10 or more responses provide a good estimate for a category. Although the sample is small and the analyses must be used with care, the sampling and weighting methods have been carefully developed to provide the most statistically valid results possible.
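The relation between weighted counts and raw responses in the two examples above can be checked with a short sketch. The function names and the threshold constant are illustrative assumptions; only the figures (39 and 5, 20 and 3) and the rule of thumb of 10 responses come from the text. The actual SDR weighting formula is not given here and is not reproduced.

```python
# Illustrative sketch only: the average weight per response implied by the
# figures quoted in the text. This is not the SDR weighting formula.

MIN_RESPONSES = 10  # rule of thumb quoted in the text


def implied_average_weight(weighted_count, responses):
    """Average number of people that each raw response stands for in a cell."""
    if responses <= 0:
        raise ValueError("a cell needs at least one response")
    return weighted_count / responses


def is_reliable(responses, minimum=MIN_RESPONSES):
    """True when a cell has at least the rule-of-thumb number of responses."""
    return responses >= minimum


# (weighted estimate, raw responses) for the two cells cited in the text
cells = {
    "unemployed": (39, 5),
    "working outside S&E": (20, 3),
}

for label, (weighted, n) in cells.items():
    avg = implied_average_weight(weighted, n)
    print(f"{label}: {avg:.1f} people per response, reliable={is_reliable(n)}")
```

Both cells fall below the 10-response threshold, which is why such estimates should be read with care.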
National Science Foundation Survey of Graduate Students and Postdoctorates in Science and Engineering
The National Science Foundation (NSF) conducts various surveys and data-collecting procedures as part of its responsibility in monitoring the state of science and engineering development in the United States. The survey that pertains most closely to graduate and postdoctoral training is the annual Survey of Graduate Students and Postdoctorates in Science and Engineering.
This survey is designed to provide a comprehensive picture of training of future scientists and engineers in US graduate schools and is used to assess future supply and demand. Graduate students counted in the survey are enrolled for credit in science and engineering master's-degree and PhD programs in the fall term of the survey year, and MD, DO, DVM, and DDS candidates are reported only if they will also receive a PhD. The survey also includes information on postdoctoral appointees and other nonfaculty researchers in academic departments and programs.
The survey is distributed to departments through an institutional coordinator, and information is provided on students who are associated with departments. Nearly 10,400 graduate departments at 730 institutions are surveyed. Students in interdisciplinary or interinstitutional programs are reported only by their primary department. Therefore, information about individual programs could be distributed across departments, and data would be aggregated for departments with multiple degree programs.
The following types of information are requested:
- Number of full-time graduate students separated by type of financial support, source of support, and sex, and number of first-year students (no distinction is made between MS and PhD students).
- Number of part-time students and their sex.
- Ethnicity of full-time and part-time students who are US citizens.
- Number of full-time and part-time foreign students.
- Number of postdoctoral and nonfaculty research positions in the department, with type of support for the positions, whether US citizen or foreign, and the sex of the person in each position.
The NSF requests that the survey form be returned by January 31 for data on the previous fall enrollments. The data are published in a series of reports, many of which are available online, on the different aspects of education by institution and field within the institution. However, data tapes will provide more detailed information on separate departments.
Data in table E.3 and figures 2.3 and 2.6 are taken from this NSF survey and are not directly comparable with other data, from the SED and SDR, used throughout the report. The NSF survey counts only persons at academic institutions, whereas the SDR counts PhDs in all work environments. Furthermore, NSF definitions of fields differ somewhat from those used in this report (Appendix D). Those differences are not important when addressing questions about graduate students, because students are at academic institutions where NSF performs its survey. However, large differences in the count of postdoctoral fellows can exist between the NSF survey and the SDR. We have used the NSF count of postdoctoral fellows at academic institutions as a starting point because NSF counts both US citizens and foreign nationals, whereas the SDR excludes foreign nationals who have not received their PhD in this country. We have then estimated the number of postdoctoral fellows who might be in government, industry, and other nonacademic laboratories to obtain an estimate of the overall number of postdoctoral fellows in the United States.
The quality of the survey data depends on the knowledge of the persons at the department level who complete the survey.
- Population. In 1995, the NSF survey universe consisted of 722 responding units at 602 institutions. This is a complete survey universe and has been such since the fall of 1988. From 1984 to 1987, master's-degree-granting institutions were surveyed on a sample basis. During the fall 1988 survey cycle, the criteria for including departments in the survey universe were tightened, and all departments surveyed were reviewed. Departments not primarily oriented toward granting research degrees were no longer considered to meet the definition of S&E. As a result of the review, it was determined that a number of departments, primarily in the field of "Social Sciences, not elsewhere classified", were engaged in training primarily teachers, practitioners, administrators, or managers rather than researchers; these departments were deleted from the database. That process was continued during the fall 1989–1995 survey cycles and expanded to ensure trend consistency for the entire 1975–1995 period. As a result, total enrollments and social-science enrollments for all years were reduced. Any time-series problem between 1987 and 1988 should be small. The definition of "medical schools" was revised during the fall 1992 survey cycle to include only institutional components that are members of the Association of American Medical Colleges. That could affect data generated after the fall 1992 survey in that the association excludes schools of nursing, public health, dentistry, veterinary medicine, and other health-related disciplines; this change is not considered to have a major effect on the data.
- Response Rate. In 1995, 712 of 722 reporting units, or 98.6%, were able to provide at least partial data. Of the 11,598 departments surveyed, 11,244, or 96.9%, responded. That is, 354 departments, or 3.1%, required complete imputation. Item nonresponse affected 1,730, or 15.4%, of the responding departments; these had one or more data cells imputed. Imputation for missing data elements was based on the prior year's data where available; otherwise, it was based on data from peer institutions.
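The percentages quoted above follow directly from the counts, and the imputation rule is a simple two-step fallback. The sketch below reproduces the arithmetic and illustrates the fallback order; the function names and the use of None for a missing value are assumptions made for illustration, and the NSF's actual imputation procedure is not described in more detail here.

```python
# Illustrative sketch: response-rate arithmetic from the counts in the text,
# plus a hypothetical helper showing the stated imputation fallback order.


def pct(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100 * part / whole, 1)


units_total, units_responding = 722, 712
depts_total, depts_responding = 11_598, 11_244
depts_item_nonresponse = 1_730

# Departments that did not respond and so required complete imputation
depts_fully_imputed = depts_total - depts_responding

unit_rate = pct(units_responding, units_total)
dept_rate = pct(depts_responding, depts_total)
full_imputation_rate = pct(depts_fully_imputed, depts_total)
item_nonresponse_rate = pct(depts_item_nonresponse, depts_responding)


def impute(prior_year_value, peer_value):
    """Use the prior year's value if available; otherwise fall back to a
    value derived from peer institutions (hypothetical helper)."""
    return prior_year_value if prior_year_value is not None else peer_value


print(unit_rate, dept_rate, depts_fully_imputed,
      full_imputation_rate, item_nonresponse_rate)
```

Note that the item-nonresponse percentage is computed against the 11,244 responding departments, not against all departments surveyed.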
Association of American Medical Colleges Medical Faculty Roster System
The Association of American Medical Colleges (AAMC) maintains several databases that contain information on US medical personnel. One particularly relevant personnel system is AAMC's Medical Faculty Roster.
The Medical Faculty Roster is a comprehensive data directory of medical-school faculty, including education and employment history, nature of current activities, degrees, rank, and ethnicity. The data for this system are collected continuously from medical schools, as changes occur, through questionnaires that are completed by the faculty members. The accuracy of the data is considered to be very high, as was demonstrated by pilot samples for different studies conducted by AAMC. Data from this system can be linked to other data sources through Social Security numbers.