population of 594,300. Across strata, however, the rates ranged from 4 to 67 percent. The range in sampling rates serves to increase the variance of the survey estimates.
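
One common way to see why unequal sampling rates inflate variance is Kish's approximate design effect for unequal weighting, n * sum(w^2) / (sum w)^2. The sketch below is illustrative only: the report does not say that this formula was used, and the two strata with 100 sampled cases each are hypothetical. Only the 4 and 67 percent rates come from the text.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect due to unequal weights.

    Equals 1.0 when all weights are equal and grows as weights diverge.
    """
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


# Hypothetical sample: 100 cases drawn at each of the two extreme rates
# quoted in the text. The case counts are an assumption for illustration.
sampling_rates = [0.04] * 100 + [0.67] * 100

# Base weight is taken as the inverse of the sampling rate.
base_weights = [1.0 / rate for rate in sampling_rates]

deff_unequal = kish_design_effect(base_weights)
deff_equal = kish_design_effect([12.5] * 200)  # same rate everywhere

print(f"design effect, unequal rates: {deff_unequal:.2f}")
print(f"design effect, equal rates:   {deff_equal:.2f}")
```

A design effect above 1 means the estimates are less precise than they would be under a sample of the same size drawn at a single rate.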

Data Collection

In 1995, there were two phases of data collection: a mail survey and a telephone follow-up interview for nonrespondents to the mail survey. Phase 1 consisted of two mailings of the survey questionnaire with a reminder postcard between the mailings. The first mailing was in May 1995 and the second (using Priority Mail) in July 1995. To encourage participation, all survey materials were personalized with the respondent’s name and address. The mail survey achieved a response rate of about 62 percent.

Phase 2 consisted of computer-assisted telephone interviewing (CATI) of a 60-percent sample of nonrespondents to the mail survey (the CATI subsample). Telephone numbers were located for about 90 percent of the subsample and interviews were completed with 63 percent. Telephone interviewing was conducted between November 1995 and February 1996.
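
The two-phase figures above can be combined into rough overall response rates. The sketch below rests on several assumptions: that the 63 percent completion rate applies to the whole CATI subsample (the text does not say so explicitly), that all cases carry equal base weights, and that no cases were ineligible. The function names are hypothetical, and the results are not rates reported in the source.

```python
MAIL_RESPONSE_RATE = 0.62    # from the text
CATI_SAMPLING_RATE = 0.60    # from the text
CATI_COMPLETION_RATE = 0.63  # from the text; assumed to be a share of the subsample


def unweighted_response_rate(mail_rr, cati_sampling_rate, cati_rr):
    """Share of the original sample that actually completed the survey."""
    mail_nonrespondents = 1.0 - mail_rr
    return mail_rr + mail_nonrespondents * cati_sampling_rate * cati_rr


def weighted_response_rate(mail_rr, cati_rr):
    """Response rate when CATI respondents are weighted up to represent
    all mail nonrespondents, not only the subsampled ones."""
    return mail_rr + (1.0 - mail_rr) * cati_rr


unweighted = unweighted_response_rate(
    MAIL_RESPONSE_RATE, CATI_SAMPLING_RATE, CATI_COMPLETION_RATE
)
weighted = weighted_response_rate(MAIL_RESPONSE_RATE, CATI_COMPLETION_RATE)

print(f"unweighted: {unweighted:.1%}")
print(f"weighted:   {weighted:.1%}")
```

The gap between the two figures reflects the subsampling: only 60 percent of mail nonrespondents were ever followed up, so the unweighted rate understates how well the respondents represent the full sample.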

Data Preparation

As completed mail questionnaires were received, they were logged into a receipt control system that kept track of the status of all cases. Coding staff then carried out a variety of checks and prepared the questionnaires for data entry. Specifically, they resolved incomplete or contradictory answers, reviewed “other specify” responses for possible backcoding to a listed response, and assigned numeric codes to open-ended questions (e.g., employer name). A coding supervisor validated the coders’ work.

Once cases were coded, they were sent to data entry. The data entry program contained a full complement of range and consistency checks for entry errors and inconsistent answers. The same range and consistency checks were applied to the CATI data via batch processing. Further computer checks were then run to test for inconsistent values; any that were found were corrected, and the process was repeated until no inconsistencies remained.
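
As a minimal sketch of what batch range and consistency checking can look like, the example below flags cases that fail either kind of rule. The field names, permitted ranges, and the two consistency rules are all hypothetical; the report does not list the checks that were actually used.

```python
# Hypothetical range rules: field name -> (lowest allowed, highest allowed).
RANGE_RULES = {
    "birth_year": (1900, 1975),
    "salary": (0, 999_999),
}


def range_errors(record):
    """Return messages for fields whose values fall outside their range."""
    errors = []
    for field, (low, high) in RANGE_RULES.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            errors.append(f"{field} out of range: {value}")
    return errors


def consistency_errors(record):
    """Return messages for answers that contradict one another."""
    errors = []
    # Hypothetical rule: a case coded as not employed should report no salary.
    if record.get("employed") is False and record.get("salary"):
        errors.append("salary reported for a case coded as not employed")
    # Hypothetical rule: a degree cannot be earned before birth.
    birth, degree = record.get("birth_year"), record.get("degree_year")
    if birth is not None and degree is not None and degree <= birth:
        errors.append("degree_year is not later than birth_year")
    return errors


def check_batch(records):
    """Map case_id -> list of problems, for cases that fail any check."""
    failures = {}
    for record in records:
        problems = range_errors(record) + consistency_errors(record)
        if problems:
            failures[record["case_id"]] = problems
    return failures
```

In a correct-and-repeat cycle like the one described, `check_batch` would be rerun after each round of corrections until it returns an empty dictionary.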

At this point, the survey data file was ready for imputation of missing data. As a first step, basic frequency distributions were produced to show the nonresponse rate for each question; these rates were generally less than 3 percent, with the exception of salary, for which the rate was 6 percent. Two methods of imputation were adopted. The first, cold decking, was used mainly for demographic variables that are static, i.e., not subject to change. Under this method, historical data provided by respondents in previous years were used to fill a missing response. In cases where no historical data were available, and for nondemographic variables (such as employment status, primary work activity, and salary), hot decking was used. Hot decking involved creating cells of cases with common characteristics (through the cross-classification of auxiliary variables) and then selecting a donor at random from the same cell for the case with the missing value. As a general rule, no data value was imputed from a donor in one cell to a recipient in another cell.
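
The two imputation methods can be sketched as follows. This is an illustration under stated assumptions, not the survey's actual procedure: the record layout, the `history` lookup of prior-year answers, the `imputed` flag, and the choice of cell variables are all hypothetical. Donors here are drawn with replacement from reported values only, and a recipient with no donor in its cell is left missing, in keeping with the rule that values do not cross cells.

```python
import random
from collections import defaultdict


def cold_deck(records, history, field):
    """Fill a missing `field` from the same case's prior-year answer, if any.

    `history` maps case_id -> dict of answers given in earlier survey years.
    """
    for record in records:
        if record.get(field) is None:
            prior = history.get(record["case_id"], {}).get(field)
            if prior is not None:
                record[field] = prior
                record.setdefault("imputed", {})[field] = "cold"


def hot_deck(records, field, cell_vars, rng=None):
    """Fill a missing `field` with the value of a random donor in the same cell.

    A cell is the cross-classification of the variables named in `cell_vars`.
    """
    rng = rng or random.Random()

    # Build donor pools from reported values only, before any imputation,
    # so that imputed values are never reused as donors.
    donors = defaultdict(list)
    for record in records:
        if record.get(field) is not None:
            cell = tuple(record[var] for var in cell_vars)
            donors[cell].append(record[field])

    for record in records:
        if record.get(field) is None:
            cell = tuple(record[var] for var in cell_vars)
            pool = donors.get(cell)
            if pool:  # never borrow from a different cell
                record[field] = rng.choice(pool)
                record.setdefault("imputed", {})[field] = "hot"
```

A typical call order under these assumptions would be `cold_deck(records, history, "sex")` for a static item, followed by `hot_deck(records, "salary", ["degree_field", "sector"])` for an item that changes over time. Flagging imputed values lets analysts separate reported from imputed data later.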



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.