Appendix A
1995 Survey Methodology
Sample Design
The sampling frame for the Survey of Doctorate Recipients (SDR) (including the Survey of Humanities Doctorates) is compiled from the Doctorate Records File (DRF), an ongoing census of all research doctorates earned in the United States since 1920. For the 1995 survey the sampling frame comprised individuals who
had earned a doctoral degree from a U.S. college or university in a humanities field;
were U.S. citizens or, if non-U.S. citizens, indicated they had plans to remain in the United States after degree award; and
were under 76 years of age.
To develop the frame, graduates who had earned their degrees since the 1993 survey and met the conditions listed above were added to the frame; those who were carried over from 1993 but had attained the age of 76 (or died) were deleted. A sample of the incoming graduates was drawn and added to the panel sample that is conveyed from year to year. The total sample size was 8,829.
The basic sample design for the 1995 SDR was a stratified random sample with the goal of proportional sampling across strata. The variables used for stratification were field of degree (11 groups), gender (two groups), and year of degree (two groups, distinguishing recent graduates from all others). This resulted in 44 sampling cells.
In determining sampling rates the goal was to achieve as much homogeneity as possible while allowing for oversampling of certain small populations (e.g., minority women). In practice, however, the goal of proportional sampling was not consistently achieved. A number of sample size adjustments over the years, in combination with changes to the stratification, led to highly variable sampling rates, sometimes within the same sampling cell. The overall sampling rate was about 7.7 percent, applied to a population of 115,043. Across strata, however, the rates ranged from 5.3 to 26.5 percent. The range in sampling rates serves to increase the variance of the survey estimates.
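The relationship between frame counts, sample counts, and sampling rates can be sketched in Python. The counts are the field-level figures from Table A1; the 44 actual sampling cells are not listed in this appendix, so the field-level rates only summarize them. The design-effect line uses Kish's approximation for unequal weighting, which is an assumption added for illustration and not a method described in this appendix.

```python
# Sampling frame and survey sample counts by field of doctorate (Table A1).
FRAME_AND_SAMPLE = {
    "Art History": (3826, 397),
    "American History": (7536, 569),
    "Other History": (17043, 1070),
    "Music": (11234, 818),
    "Speech/Theater": (6070, 581),
    "Philosophy": (8979, 763),
    "English/American Lang/Lit": (29624, 2042),
    "French/Spanish Lang/Lit": (9295, 773),
    "Other Modern Lang/Lit": (8492, 748),
    "Classics": (2371, 315),
    "Other Humanities": (10573, 753),
}

total_frame = sum(frame for frame, _ in FRAME_AND_SAMPLE.values())
total_sample = sum(sample for _, sample in FRAME_AND_SAMPLE.values())
overall_rate = total_sample / total_frame

# Field-level sampling rates (sample size divided by frame size).
field_rates = {
    field: sample / frame for field, (frame, sample) in FRAME_AND_SAMPLE.items()
}

# Kish's approximate design effect from unequal weighting (illustrative
# assumption): deff = n * sum(w^2) / (sum(w))^2, where every sampled case in a
# field carries the weight w = frame / sample.
sum_w = sum(frame for frame, _ in FRAME_AND_SAMPLE.values())
sum_w2 = sum(frame * frame / sample for frame, sample in FRAME_AND_SAMPLE.values())
kish_deff = total_sample * sum_w2 / (sum_w ** 2)

print(f"Overall sampling rate: {overall_rate:.1%}")
for field, rate in sorted(field_rates.items(), key=lambda item: item[1]):
    print(f"{field:28s} {rate:.1%}")
print(f"Approximate design effect from unequal weights: {kish_deff:.3f}")
```

A design effect above 1 indicates how much the unequal rates inflate variance relative to a proportional design of the same size; because this sketch works at the field level, it likely understates the effect of variation within the 44 cells.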
Data Collection
Data collection was conducted through a self-administered mail survey. This consisted of two mailings of the survey questionnaire with a reminder postcard between the mailings. The first mailing was in May 1995 and the second (using Priority Mail) in July 1995. To encourage participation, all survey materials were personalized with the respondent's name and address. The mail survey achieved a response rate of about 65 percent. Because of budget constraints, [...] the 1995 survey. As a result, the response rate for the 1995 survey was lower than the rates for the two previous surveys.
Data Preparation
As completed mail questionnaires were received, they were logged into a receipt control system that kept track of the status of all cases. Coding staff then carried out a variety of checks and prepared the questionnaires for data entry. Specifically, they resolved incomplete or contradictory answers, reviewed "other, specify" responses for possible back-coding to a listed response, and assigned numeric codes to open-ended questions (e.g., employer name). A coding supervisor validated the coders' work.
Once cases were coded, they were sent to data entry. The data entry program ensured that only values within allowable ranges were entered and that built-in consistency checks were not violated. For example, a case in which a respondent reported unemployment but later gave a salary was flagged for review.
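A minimal sketch of such range and consistency checks follows. The field names (employment_status, year_of_degree, salary) and the allowable ranges are hypothetical, chosen only to illustrate the kind of rule described; they are not taken from the actual data entry program.

```python
def check_case(case, allowed_ranges):
    """Return a list of problems found in one keyed questionnaire record."""
    problems = []
    # Range checks: every listed field must fall inside its allowable range.
    for field, (low, high) in allowed_ranges.items():
        value = case.get(field)
        if value is not None and not (low <= value <= high):
            problems.append(f"{field}={value} outside {low}-{high}")
    # Consistency check from the text: unemployment reported, yet a salary given.
    if case.get("employment_status") == "unemployed" and case.get("salary"):
        problems.append("salary reported by unemployed respondent")
    return problems


# Hypothetical allowable ranges, for illustration only.
ALLOWED_RANGES = {"year_of_degree": (1920, 1994), "salary": (0, 999999)}

clean = {"employment_status": "employed", "year_of_degree": 1988, "salary": 42000}
flagged = {"employment_status": "unemployed", "year_of_degree": 1988, "salary": 42000}

for record in (clean, flagged):
    print(check_case(record, ALLOWED_RANGES))
```

Returning a list of problems rather than raising an error mirrors the description above, in which a case is flagged for review instead of being rejected outright.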
Finally, to correct for item nonresponse, data not reported by the respondent were imputed. Two imputation methods were used: "cold decking," which used historical data provided by the sample member in past surveys to fill in the missing response, and "hot decking," which used a donor with similar characteristics to provide a proxy response for the missing value.
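The two imputation approaches can be sketched as follows. This is a simplified illustration, not the survey's procedure: the record layout, the matching rule (exact agreement on a list of characteristics), and the random choice among eligible donors are all assumptions.

```python
import random


def cold_deck(record, prior_record, field):
    """Fill a missing item from the same person's earlier survey response."""
    if record.get(field) is None and prior_record is not None:
        return prior_record.get(field)
    return record.get(field)


def hot_deck(record, donors, field, match_keys, rng=random):
    """Fill a missing item from a randomly chosen donor with matching traits."""
    if record.get(field) is not None:
        return record[field]
    candidates = [
        donor for donor in donors
        if donor.get(field) is not None
        and all(donor.get(key) == record.get(key) for key in match_keys)
    ]
    # No eligible donor: leave the item missing rather than guess.
    return rng.choice(candidates)[field] if candidates else None


# Hypothetical records for illustration.
respondent = {"field": "Music", "gender": "F", "sector": None}
earlier_wave = {"field": "Music", "gender": "F", "sector": "academic"}
donor_pool = [
    {"field": "Music", "gender": "F", "sector": "government"},
    {"field": "Classics", "gender": "F", "sector": "academic"},
]

print(cold_deck(respondent, earlier_wave, "sector"))
print(hot_deck(respondent, donor_pool, "sector", ["field", "gender"]))
```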
Weighting and Estimation
The general purpose of weighting survey data is to compensate for unequal probabilities of selection to the sample and to adjust for the effects of nonresponse (see the next section for a discussion of nonresponse). Weights are often calculated in two stages. In the first stage, unadjusted weights are calculated as the inverse of the probability of selection, taking into account all stages of the sampling selection process. In the second stage, these weights are adjusted to compensate for nonresponse; such nonresponse adjustments are typically carried out separately within multiple weighting cells.
The first step in constructing an unadjusted weight for the 1995 SDR sample cases was to develop a basic weight that reflected the selection probabilities for each case. This basic weight was calculated as the inverse of the sampling rate for each case. The next step was to adjust the basic weight for nonresponse. Nonresponse adjustment cells were created using poststratification. Within each nonresponse adjustment cell, a weighted response rate was calculated. The nonresponse adjustment factor was the inverse of this weighted response rate.1
Let f be the final adjustment factor for a given cell and BSCWGT denote the basic weight for the respondents. The final weight (FINWGT) for the respondents is given by
$$\mathrm{FINWGT} = \mathrm{BSCWGT} \times f$$
1 The initial set of nonresponse adjustment factors was examined, and under certain conditions some of the cells were collapsed.
Estimates in this report were developed by summing the final weights of the respondents selected for each analysis.
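The weighting steps described in this section can be sketched as below. The case records and their keys (id, sampling_rate, cell, responded) are hypothetical, and the sketch omits the collapsing of cells mentioned in the footnote.

```python
from collections import defaultdict


def final_weights(cases):
    """Return {case_id: FINWGT} for responding cases.

    Each case is a dict with hypothetical keys: "id", "sampling_rate",
    "cell" (nonresponse adjustment cell), and "responded" (bool).
    """
    sampled = defaultdict(float)    # sum of basic weights, all sampled cases
    responded = defaultdict(float)  # sum of basic weights, respondents only
    for case in cases:
        bscwgt = 1.0 / case["sampling_rate"]  # basic weight
        sampled[case["cell"]] += bscwgt
        if case["responded"]:
            responded[case["cell"]] += bscwgt

    weights = {}
    for case in cases:
        if case["responded"]:
            cell = case["cell"]
            # f is the inverse of the cell's weighted response rate.
            f = sampled[cell] / responded[cell]
            weights[case["id"]] = (1.0 / case["sampling_rate"]) * f
    return weights


# Made-up cases for illustration.
cases = [
    {"id": 1, "sampling_rate": 0.05, "cell": "A", "responded": True},
    {"id": 2, "sampling_rate": 0.05, "cell": "A", "responded": False},
    {"id": 3, "sampling_rate": 0.10, "cell": "A", "responded": True},
    {"id": 4, "sampling_rate": 0.25, "cell": "B", "responded": True},
]
weights = final_weights(cases)

# An estimated total is the sum of the final weights of the selected respondents.
estimated_total = sum(weights.values())
print(weights, estimated_total)
```

Within each cell the adjusted respondent weights sum to the basic-weight total of all sampled cases in that cell, which is the property the nonresponse adjustment is meant to preserve.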
Reliability of the 1995 Survey Estimates
Because the estimates shown in this report are based on a sample, they may vary from those that would have been obtained if all members of the target population had been surveyed (using the same questionnaire and data collection methods). Two types of error are possible when population estimates are derived from measures of a sample: nonsampling error and sampling error. By looking at these errors, it is possible to estimate the accuracy and precision of the survey results. Potential sources of nonsampling error in the 1995 SDR are discussed below, followed by a discussion of sampling error—how it is estimated and how it can be used in interpreting the survey results.
Nonsampling Error
Nonsampling errors in surveys can arise at many points in the survey process, and they take different forms:
Coverage errors can occur when some members of the target population are not identified and therefore do not have a chance to be selected for the sample.
Response errors can occur either when the wrong individual completes the survey or when the correct individual cannot accurately recall the events being questioned. Response errors can also arise from deliberate misreporting or poor question wording that leaves room for inconsistent interpretation by respondents.
Processing errors can occur at the point of data editing, coding, or key entry.
Nonresponse errors can occur when some or all of the survey data are not collected in a survey year.
In the 1995 survey, coverage errors are likely to be minimal because the DRF (the sampling frame for the SDR) is considered a complete census.2 Every effort was made to ensure that the wrong person did not complete the form and that questions were clear and unambiguous, which keeps response errors to a minimum. Furthermore, careful cross-checking and editing reduced processing errors.
However, this leaves the largest potential source of nonsampling error—nonresponse. Nonresponse bias is defined as "the bias or systematic distortion in survey estimates occurring because of the inability to obtain a usable response from some members of the sample."3
2 Henderson, P. H., J. E. Clarke, and M. A. Reynolds, 1996, Summary Report 1995: Doctorate Recipients from United States Universities, National Academy Press, Washington, D.C.
3 Lessler, Judith T. and William D. Kalsbeek, 1992, Nonsampling Error in Surveys, Wiley, New York, p. 118.
Nonresponse bias is concerned with the "representativeness" of the respondents, that is, with how respondents' characteristics compare with those of the population from which they were chosen. If the respondents do not accurately represent the population, this would result in inaccurate population estimates.
Table A1 shows the overall weighted response rate and weighted response rates by subgroups. The overall weighted response rate4 was 65.1 percent. By field of degree, weighted response rates ranged from 60.8 percent (doctorates in philosophy) to 69.8 percent (doctorates in American history). Subgroups defined by cohort and sex had response rates ranging from 64.0 to 67.1 percent. While the direction and magnitude of bias in the estimates derived from the survey are not known, the response rates obtained suggest that nonresponse bias may exist.
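As a rough check on Table A1, unweighted return rates can be computed from the sample and total-return counts and set beside the published weighted rates. The sketch below does this for the cohort, gender, and total rows; the weighted_rate helper shows the weighted definition on made-up (basic weight, returned) pairs, since case-level weights are not given in this appendix.

```python
# (survey sample, total returns, published weighted response rate) from Table A1.
TABLE_A1 = {
    "1985-1994 Doctorates": (2698, 1788, 66.1),
    "Pre-1985 Doctorates": (6131, 3926, 64.7),
    "Male/Unknown": (5521, 3495, 64.0),
    "Female": (3308, 2219, 67.1),
    "Total": (8829, 5714, 65.1),
}


def unweighted_rate(sample, returns):
    """Total returns as a percentage of the survey sample."""
    return 100.0 * returns / sample


def weighted_rate(cases):
    """Weighted response rate for cases given as (basic_weight, returned) pairs."""
    total = sum(weight for weight, _ in cases)
    returned = sum(weight for weight, was_returned in cases if was_returned)
    return 100.0 * returned / total


for group, (sample, returns, published) in TABLE_A1.items():
    print(f"{group:22s} unweighted {unweighted_rate(sample, returns):5.1f}  "
          f"weighted (published) {published:5.1f}")
```

The unweighted and weighted rates differ slightly because cases sampled at lower rates carry larger basic weights and therefore count for more in the weighted rate.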
Sampling Error
Sampling error is the variation that occurs by chance because a sample, rather than the entire population, is surveyed. The particular sample that was used to estimate the 1995 population of humanities doctorates in the United States was one of a large number of samples that could have been selected using the same sample design and size. Estimates based on each of these samples would have differed.
Standard errors indicate the magnitude of the sampling error that occurs by chance because a sample rather than the entire population was surveyed. Standard errors are used in conjunction with a survey estimate to construct confidence intervals—bounds set around the survey estimate in which, with some prescribed probability, the average estimate from all possible samples would lie. For example, approximately 95 percent of the intervals from 1.96 standard errors below the estimate to 1.96 standard errors above the estimate would include the average result of all possible samples.5 With a single survey estimate, the 95 percent confidence limit implies that if the same sample design were used over and over again, with confidence intervals determined each time from each sample, 95 percent of the time the confidence interval would enclose the true population value.
The number of survey estimates in the SDR for which standard errors might have been estimated was extremely large because of the number of variables measured, the number of subpopulations, and the values—totals, percentages, and medians—that were estimated. Direct calculation of standard error estimates from the raw data for each estimate was not possible because of time and cost limitations. Instead, a method was used for generalizing standard error values from a subset of survey estimates that characterize the population, allowing application to a wide variety of survey estimates.
4 The weighted response rate is defined as the total returns (in-scope and out-of-scope) multiplied by their basic weights divided by those in the survey sample multiplied by their basic weights. Weighted response rates take into account the unequal probabilities of selection to the sample and indicate the potential for nonresponse bias in the survey estimates.
5 Approximately 90 percent of the intervals from 1.64 standard errors above and below the estimate would include the average result of all possible samples; or, if more precision is required, approximately 99 percent of the intervals from 2.58 standard errors above and below the estimate would include the average result of all possible samples.
This method computes the variances associated with selected variables and uses these estimates to develop values of a and b parameters (regression coefficients) for use in generalized variance functions that estimate the standard errors associated with a broader range of totals and percentages. Base a and b parameters are shown in Table A2. These parameters were used to generate tables of approximate standard errors shown as Tables A3 through A6. The use of these tables is described below, together with an alternative method for approximating the standard errors more directly.
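One common way to obtain such a and b parameters is to regress the relative variance of each selected estimate on the reciprocal of the estimate, that is, to fit Var(x) / x^2 = a + b / x by least squares. The appendix does not spell out the model or the fitting procedure, so the sketch below is an assumption about the general technique. The input data are synthetic values generated from known parameters, not survey results.

```python
def fit_gvf(totals, variances):
    """Fit relvariance = a + b / x by ordinary least squares; return (a, b)."""
    u = [1.0 / x for x in totals]                         # regressor: 1 / x
    r = [v / (x * x) for x, v in zip(totals, variances)]  # relative variance
    n = len(u)
    mean_u = sum(u) / n
    mean_r = sum(r) / n
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    sxy = sum((ui - mean_u) * (ri - mean_r) for ui, ri in zip(u, r))
    b = sxy / sxx               # slope
    a = mean_r - b * mean_u     # intercept
    return a, b


# Synthetic check: generate variances from known parameters and recover them.
true_a, true_b = -0.0002, 22.0334
totals = [500, 1000, 5000, 10000, 50000]
variances = [true_a * x * x + true_b * x for x in totals]

a, b = fit_gvf(totals, variances)
print(a, b)
```

Multiplying the fitted relative variance by x^2 and taking the square root gives a standard error of the form sqrt(a x^2 + b x), which is why two parameters are enough to cover a wide range of totals.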
Standard Errors of Estimated Totals
Tables A3 and A4 show approximate standard errors for the humanities doctoral population overall, for field groupings used in the report (e.g., history and philosophy), and for females by field. The standard errors shown in the tables were calculated using the appropriate values of a and b, along with the following formula for standard errors of totals:
$$SE(x) = \sqrt{a x^{2} + b x} \qquad (1)$$
where x is the total and a, which takes a negative value, and b are the parameters for the relevant group. Resulting values were rounded to the nearest multiple of 10. The illustration below shows how to use the tables to determine the standard errors of estimates shown in the report.
Illustration. The number of humanities Ph.D.s employed in the private for-profit sector is reported at 5,800. To determine the approximate standard error, one can use the values shown in Table A3 for the estimated numbers of 5,000 and 10,000 in the "All Fields" column, or 320 and 450, respectively. Then, through linear interpolation, one can calculate 341 as the approximate standard error of the estimate of 5,800 as follows:
$$320 + \frac{5{,}800 - 5{,}000}{10{,}000 - 5{,}000} \times (450 - 320) = 340.8 \approx 341$$
On the other hand, using the values of a and b for all humanities Ph.D.s from Table A2 and Formula 1, one can also calculate the approximate standard error more directly:
$$SE(5{,}800) = \sqrt{(-0.0002)(5{,}800)^{2} + (22.0334)(5{,}800)} \approx 348$$
To develop a 95 percent confidence interval around this estimate of 5,800, one would add and subtract from the estimate the standard error multiplied by 1.96. This means that the average estimate from all possible samples would be expected 95 times out of 100 to fall within the range of
$$5{,}800 \pm (1.96 \times 348) = 5{,}118 \text{ to } 6{,}482$$
This range of 5,118 to 6,482 represents the 95 percent confidence interval for the estimated number of 5,800.
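The illustration can be reproduced with a few lines of Python. The sketch assumes the generalized variance function has the form SE(x) = sqrt(a x^2 + b x) and that a enters as a negative value (-0.0002 for all humanities doctorates); both assumptions reproduce the Table A3 entries and the interval quoted above. The function names are illustrative.

```python
import math

# Parameters for all humanities doctorates (Table A2, "Total" column).
# Assumption: a is negative; this sign reproduces the Table A3 entries.
A_ALL = -0.0002
B_ALL = 22.0334


def se_total(x, a, b):
    """Approximate standard error of an estimated total x."""
    return math.sqrt(a * x * x + b * x)


def interpolate_se(x, x_low, se_low, x_high, se_high):
    """Linear interpolation between two tabulated standard errors."""
    return se_low + (x - x_low) / (x_high - x_low) * (se_high - se_low)


def confidence_interval(estimate, se, z=1.96):
    """Interval of z standard errors on either side of the estimate."""
    return estimate - z * se, estimate + z * se


estimate = 5800
se_from_table = interpolate_se(estimate, 5000, 320, 10000, 450)
se_direct = se_total(estimate, A_ALL, B_ALL)
low, high = confidence_interval(estimate, round(se_direct))

print(round(se_from_table), round(se_direct), round(low), round(high))
```

This prints 341 348 5118 6482. Passing z=1.64 or z=2.58 to confidence_interval gives the 90 and 99 percent intervals mentioned in the footnote on confidence levels.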
Standard Errors of Estimated Percentages
Percentages are another type of estimate given throughout the report. The standard error of a percentage may be approximated using the following formula:
$$SE(p) = p \sqrt{b \left( \frac{1}{x} - \frac{1}{y} \right)} \qquad (2)$$
where x is the numerator of the percentage, y is the denominator of the percentage, p is the percentage (0 < p < 100), and b is from Table A2. Tables of standard errors of estimated percentages were derived using this formula and are shown in Tables A5 and A6. Formula 2 may be used to calculate the standard errors of percentages not shown in the tables.
Illustration. Using the same example mentioned earlier but stated as a proportion, approximately 5.8 percent of all humanities doctorates were employed in the private for-profit sector. That is, of the 99,100 individuals who are employed, 5,800 were working in the private for-profit sector, or about 5.8 percent. Table A5 shows the approximate standard error of a 5 percent characteristic on a base of 100,000 (the closest values) to be 0.3.
Alternatively, using the appropriate value of b from Table A2 and Formula 2, the standard error of p may be determined as follows:
$$SE(p) = 5.8 \sqrt{22.0334 \left( \frac{1}{5{,}800} - \frac{1}{99{,}100} \right)} \approx 0.35$$
To develop a 95 percent confidence interval around this estimate of 5.8 percent, one would add and subtract from the estimate the standard error multiplied by 1.96. That is, the average estimate from all possible samples would be expected 95 times out of 100 to fall within the range of
$$5.8 \pm (1.96 \times 0.35) = 5.11 \text{ to } 6.49$$
The range of 5.11 to 6.49 represents the 95 percent confidence interval for the estimated percent of 5.8.
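The percentage illustration can be checked in the same way. The algebraic form used here, SE(p) = p * sqrt(b * (1/x - 1/y)), is an assumption: it uses exactly the quantities the text defines (x, y, p, and b) and reproduces both the 0.35 in the illustration and the entries of Tables A5 and A6, but the formula itself is not legible in the extracted text.

```python
import math

B_ALL = 22.0334     # Table A2, all humanities doctorates
B_FEMALE = 18.7954  # Table A2, female humanities doctorates


def se_percentage(p, x, y, b):
    """Approximate standard error of a percentage p with numerator x, base y."""
    return p * math.sqrt(b * (1.0 / x - 1.0 / y))


def se_percentage_from_base(p, y, b):
    """Same quantity from the percentage and base alone.

    Algebraically identical to se_percentage when p equals 100 * x / y exactly.
    """
    return math.sqrt(b / y * p * (100.0 - p))


se = se_percentage(5.8, 5800, 99100, B_ALL)
low = 5.8 - 1.96 * round(se, 2)
high = 5.8 + 1.96 * round(se, 2)
print(round(se, 2), round(low, 2), round(high, 2))

# Table lookups: a 5 percent characteristic on a base of 100,000 (Table A5)
# and a 50 percent characteristic on a base of 50 for women (Table A6).
print(round(se_percentage_from_base(5, 100000, B_ALL), 1))
print(round(se_percentage_from_base(50, 50, B_FEMALE), 1))
```

The first line of output is 0.35 5.11 6.49, matching the interval in the text; the two table lookups print 0.3 and 30.7.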
Limitations of the Standard Error Estimates
As mentioned, the standard error estimates provided in this report were derived from generalized functions on the basis of a limited set of characteristics (or survey estimates). Although this method provides a good approximation of the standard errors associated with most survey results, it may overstate the error associated with estimates drawn from strata with high sampling fractions. However, the only way to avoid this overstatement is to calculate the standard errors directly from the raw data, forgoing the practical, and more widely applicable, generalized method.
TABLE A1 Response Rates by Summary Strata (Field, Cohort, and Gender), 1995

                            Sampling    Survey  In-Scope  Out-of-Scope    Total  Weighted Response
                               Frame    Sample   Returns       Returns  Returns           Rate (%)
Field of Doctorate
  Art History                  3,826       397       246            19      265               67.1
  American History             7,536       569       391             6      397               69.8
  Other History               17,043     1,070       639            55      694               65.2
  Music                       11,234       818       533            24      557               68.6
  Speech/Theater               6,070       581       373            16      389               65.6
  Philosophy                   8,979       763       429            35      464               60.5
  English/American Lang/Lit   29,624     2,042     1,263            61    1,324               65.6
  French/Spanish Lang/Lit      9,295       773       443            24      467               60.9
  Other Modern Lang/Lit        8,492       748       435            36      471               63.3
  Classics                     2,371       315       191            15      206               65.3
  Other Humanities            10,573       753       450            30      480               64.2
Cohort
  1985-1994 Doctorates        32,804     2,698     1,715            73    1,788               66.1
  Pre-1985 Doctorates         82,239     6,131     3,678           248    3,926               64.7
Gender
  Male/Unknown                73,364     5,521     3,292           203    3,495               64.0
  Female                      39,679     3,308     2,101           118    2,219               67.1
Total                        115,043     8,829     5,393           321    5,714               65.1

NOTE: Out-of-scope sample cases are those learned to be deceased, living outside the United States, or over the age of 75. The weighted response rate is the total returns (in-scope and out-of-scope) multiplied by their basic weights divided by the survey sample multiplied by their basic weights.
SOURCE: National Research Council, Survey of Humanities Doctorates.
TABLE A2 Listing of a and b Parameters (Select Groups in Humanities Fields), 1995

                                                        Gender                   Years Since Doctorate
Field of Doctorate  Parameters       Total      Male    Female 5 or Less      6-15     16-25   Over 25
Total, Humanities   a              -0.0002   -0.0003   -0.0005    -0.001   -0.0007   -0.0005   -0.0016
                    b              22.0334    24.542   18.7954   19.5561   22.6625   20.9583   38.9508
History             a              -0.0011   -0.0016   -0.0024   -0.0031   -0.0035   -0.0017   -0.0137
                    b              27.5428   31.4577   12.5688   11.4072   20.1781   23.3711   86.5962
Art History         a              -0.0072   -0.0206   -0.0084   -0.0112   -0.0149   -0.0066   -0.0297
                    b              25.4867   30.2316   17.3578     7.792   19.7987    6.4701   18.3836
Music               a              -0.0013   -0.0011   -0.0092   -0.0041   -0.0025   -0.0018    -0.002
                    b              14.5382    8.6691   21.4679   11.9481    9.7411    8.1653    2.5097
Philosophy          a              -0.0016   -0.0028   -0.0067   -0.0037   -0.0062   -0.0005   -0.0014
                    b               14.448   19.0081   10.6307    4.0905   14.4724    1.5788    4.1062
Engl/Am Lang/Lit    a              -0.0007   -0.0011   -0.0008    -0.005   -0.0037   -0.0005   -0.0004
                    b              18.9447   18.1487   10.1054   20.8339   25.2734     4.602    2.9169
Classics            a              -0.0037   -0.0069   -0.0079   -0.0064    -0.021   -0.0837   -0.0065
                    b               8.3628   10.4816    5.4447    1.9178   12.3877   23.8383    3.6465
Modern Lang/Lit     a              -0.0007    -0.001   -0.0026   -0.0048   -0.0046   -0.0014   -0.0029
                    b              14.7339   11.9273   17.7044   16.2059   25.0407   12.9656   13.7691
Other Humanities    a              -0.0008   -0.0014   -0.0017   -0.0042   -0.0026   -0.0026   -0.0026
                    b              17.5967   19.0471   12.9988   18.4607   18.5882   17.3912    12.496

SOURCE: National Research Council, Survey of Humanities Doctorates.
TABLE A3 Approximate Standard Error of Estimated Number of Humanities Doctorates, by Field, 1995

Estimated         All                     Art                             Engl/Am                  Modern       Other
   Number      Fields     History     History       Music  Philosophy    Lang/Lit    Classics    Lang/Lit  Humanities
       50          30          40          40          30          30          30          20          30          30
      100          50          50          50          40          40          40          30          40          40
      200          70          70          70          50          50          60          40          50          60
      500         100         120         100          80          80         100          60          80          90
      700         120         140         120         100         100         110          60         100         110
    1,000         150         160         140         120         110         140          70         120         130
    2,500         230         250         140         170         160         210           -         180         200
    5,000         320         330           -         200         180         280           -         240         260
   10,000         450         410           -           -           -         350           -         280         310
   25,000         650           -           -           -           -         190           -           -           -
   50,000         780           -           -           -           -           -           -           -           -
   75,000         730           -           -           -           -           -           -           -           -
  100,000         450           -           -           -           -           -           -           -           -
TABLE A4 Approximate Standard Error of Estimated Number of Female Humanities Doctorates, by Field, 1995

Estimated         All                     Art                             Engl/Am                  Modern       Other
   Number      Fields     History     History       Music  Philosophy    Lang/Lit    Classics    Lang/Lit  Humanities
       50          30          20          30          30          20          20          20          30          30
      100          40          40          40          50          30          30          20          40          40
      200          60          50          60          60          40          40          30          60          50
      500         100          80          80          90          60          70          30          90          80
      700         110          90          90         100          60          80           -         110          90
    1,000         140         100          90         110          60         100           -         120         110
    2,500         210         130           -           -           -         140           -         170         150
    5,000         290           -           -           -           -         170           -         150         150
   10,000         370           -           -           -           -         150           -           -           -
   25,000         400           -           -           -           -           -           -           -           -

SOURCE: National Research Council, Survey of Humanities Doctorates.
TABLE A5 Approximate Standard Errors of Estimated Percentages of Humanities Doctorates, 1995

Base Number   Estimated Percentages
 of Percent   1 or 99   2 or 98   5 or 95  10 or 90  15 or 85  25 or 75        50
         50       6.6       9.3      14.5      19.9      23.7      28.7      33.2
        100       4.7       6.6      10.2      14.1      16.8      20.3      23.5
        200       3.3       4.6       7.2      10.0      11.9      14.4      16.6
        500       2.1       2.9       4.6       6.3       7.5       9.1      10.5
        700       1.8       2.5       3.9       5.3       6.3       7.7       8.9
      1,000       1.5       2.1       3.2       4.5       5.3       6.4       7.4
      2,500       0.9       1.3       2.0       2.8       3.4       4.1       4.7
      5,000       0.7       0.9       1.4       2.0       2.4       2.9       3.3
     10,000       0.5       0.7       1.0       1.4       1.7       2.0       2.3
     25,000       0.3       0.4       0.6       0.9       1.1       1.3       1.5
     50,000       0.2       0.3       0.5       0.6       0.7       0.9       1.0
     75,000       0.2       0.2       0.4       0.5       0.6       0.7       0.9
    100,000       0.1       0.2       0.3       0.4       0.5       0.6       0.7

SOURCE: National Research Council, Survey of Humanities Doctorates.
TABLE A6 Approximate Standard Errors of Estimated Percentages of Female Humanities Doctorates, 1995

Base Number   Estimated Percentages
 of Percent   1 or 99   2 or 98   5 or 95  10 or 90  15 or 85  25 or 75        50
         50       6.1       8.6      13.4      18.4      21.9      26.5      30.7
        100       4.3       6.1       9.4      13.0      15.5      18.8      21.7
        200       3.1       4.3       6.7       9.2      10.9      13.3      15.3
        500       1.9       2.7       4.2       5.8       6.9       8.4       9.7
        700       1.6       2.3       3.6       4.9       5.9       7.1       8.2
      1,000       1.4       1.9       3.0       4.1       4.9       5.9       6.9
      2,500       0.9       1.2       1.9       2.6       3.1       3.8       4.3
      5,000       0.6       0.9       1.3       1.8       2.2       2.7       3.1
     10,000       0.4       0.6       0.9       1.3       1.5       1.9       2.2
     25,000       0.3       0.4       0.6       0.8       1.0       1.2       1.4

SOURCE: National Research Council, Survey of Humanities Doctorates.