
Appendix 3-7
Marginal Mean and Variance of Transformed Response Variables
Data collected in the departmental and faculty surveys were used to answer various research questions in this report. Statistical analyses consisted essentially of fitting various types of regression models, including multiple linear regression, logistic regression, and Poisson regression models, depending on the distributional assumptions that were appropriate for each response variable of interest. In some cases, the response variable was transformed so that the assumption of normality for the response in the transformed scale was plausible. Marginal or least-squares means were calculated (sometimes in the transformed scale) for effects of interest in the models.
TRANSFORMATIONS
We let y denote a response variable, such as the proportion of women in the applicant pool, annual salary, or the number of manuscripts published in a year, and use x to denote a vector of covariates that might include type of institution, discipline, proportion of women on the search committee, etc. If y can be assumed to be normally distributed with some mean μ and some variance σ², then we typically fit a linear regression model to y that establishes that μ = xβ, where β is a vector of unknown regression coefficients.

When the response y is not normally distributed (for example, because y can only take on the values 0 and 1), we can define η = xβ and then choose a transformation g of μ such that

g(μ) = η = xβ.
For example, if the response variable is a proportion, the logit transformation

g(μ) = log( μ / (1 − μ) )

is appropriate. When y is a count variable (as in the number of manuscripts published in a year), the usual transformation is the log transformation.
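As a small illustration (ours, not part of the original report), these two link functions and their inverses can be written directly; the numeric values below are arbitrary:

```python
import math

def logit(mu):
    """Logit link: maps a proportion in (0, 1) to the whole real line."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse logit: maps eta = x'beta back to a proportion."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def log_link(mu):
    """Log link, used for count responses."""
    return math.log(mu)

def inv_log(eta):
    return math.exp(eta)

# Each link composes with its inverse to the identity:
print(inv_logit(logit(0.3)))   # ≈ 0.3
print(inv_log(log_link(5.0)))  # ≈ 5.0
```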
One approach to obtaining estimates of β is the method of maximum likelihood. Let β̂ denote the maximum likelihood estimate (MLE) of β. A nice property of MLEs is invariance; in general, the MLE of a function h(β) is equal to the function evaluated at the MLE of β, that is, the MLE of h(β) is h(β̂).
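The invariance property can be illustrated numerically (our own sketch): for a Poisson sample, the MLE of the mean μ is the sample average, so the MLE of log(μ) is simply the log of that average. The counts below are hypothetical:

```python
import math

# Hypothetical counts, e.g. manuscripts published in a year
y = [2, 0, 3, 1, 4, 2]

mu_hat = sum(y) / len(y)       # MLE of the Poisson mean
log_mu_hat = math.log(mu_hat)  # by invariance, the MLE of log(mu)

print(mu_hat)      # 2.0
print(log_mu_hat)  # ≈ 0.6931
```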


GENDER DIFFERENCES IN FACULTY CAREERS
In particular, if η̂ = xβ̂, then

μ̂ = g⁻¹(η̂).

The difficulty arises when we wish to also estimate the variance of μ̂, for example to then obtain a confidence interval around the point estimate μ̂. To do so, we typically need to resort to linearization techniques that allow us to compute an approximation to the variance of a non-linear function of the parameters. A method that can be used for this purpose is called the Delta method and is described below.
LEAST-SQUARES MEANS
Least-squares means of the response, also known as adjusted means or marginal means, can be computed for each classification or qualitative effect in the model. Examples of qualitative effects in our models include type of institution (two levels: public or private), discipline (with six categories in our study), gender of the chair of the search committee, and others. Least-squares means are predicted population margins, or within-effect-level means adjusted for the other effects in the model. If the design is balanced, the least-squares means (LSM) equal the observed marginal means. Our study design is highly unbalanced, and thus the LSM of the response variable for any effect level will not coincide with the simple within-effect-level mean response.
Each least-squares mean is computed as L′β̂ for a given vector L. For example, in a model with two factors A and B, where A has three levels and B has two levels, the least-squares mean response for the first level of factor A is given by:

LSM(A₁) = L′β̂ = (1, 1, 0, 0, ½, ½) β̂,

where the first coefficient 1 in L corresponds to the intercept, the next three coefficients correspond to the three levels of factor A, and the last two coefficients correspond to the two levels of factor B. If the model also includes an interaction between A and B, then L and β̂ have an additional 3 × 2 elements. The corresponding values of the additional six elements in L would be ½ for the two interaction levels involving the first level of factor A (A₁B₁, A₁B₂) and 0 for the four interaction levels that do not involve the first level of factor A (A₂B₁, A₂B₂, A₃B₁, A₃B₂). The coefficient vector L is constructed in a similar way to compute the LSM of y (or a transformation of y) for the remaining two levels of A, the two levels of B, and even for the six levels of the interaction between A and B if it is present in the model.
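As an illustrative sketch (the data and layout below are invented, not from the study), the L vector for the first level of A can be applied to a least-squares fit; because the layout is unbalanced, the LSM differs from the raw within-level mean:

```python
import numpy as np

# Hypothetical unbalanced two-factor layout: A has 3 levels, B has 2
A = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
B = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1])
y = np.array([3.0, 4.0, 5.0, 6.0, 7.0, 2.0, 3.0, 8.0, 9.0])

# Overparameterized design matrix: intercept, 3 dummies for A, 2 for B
X = np.column_stack([np.ones(len(y)),
                     A == 0, A == 1, A == 2,
                     B == 0, B == 1]).astype(float)

# Minimum-norm least-squares solution; estimable functions L'beta-hat
# are the same for any generalized-inverse solution
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# L for LSM(A1): intercept 1, A dummies (1, 0, 0), B dummies (1/2, 1/2)
L = np.array([1.0, 1.0, 0.0, 0.0, 0.5, 0.5])
lsm_A1 = L @ beta_hat

raw_mean_A1 = y[A == 0].mean()
print(lsm_A1)       # adjusted for the unequal spread over B
print(raw_mean_A1)  # 4.0: the simple within-level mean
```

Averaging the B dummies with weight ½ is exactly what makes the LSM a "predicted population margin" rather than a mean over the observed, unbalanced cell counts.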
When the response variable has been transformed prior to fitting the model, the LSM is computed in the transformed scale and must then be transformed back into the original scale. If we have MLEs of the regression coefficients, we can easily compute the LSMs in the original scale simply by applying the inverse transformation to L′β̂. For example, if g(μ) = log(μ) = xβ and L′β̂ is the least-squares mean in the transformed scale, we can compute the LSM in the original scale as

LSM_original = g⁻¹(LSM_transformed) = g⁻¹(L′β̂) = exp(L′β̂).

If the transformation was the logit transformation, the LSM in the original scale is computed as

LSM_original = g⁻¹(LSM_transformed) = g⁻¹(L′β̂) = exp(L′β̂) / (1 + exp(L′β̂)).
VARIANCE OF A NONLINEAR FUNCTION OF PARAMETERS
Suppose that we fit a model to a response variable that has been transformed using some function g as above, and obtain an estimate of a mean, L′β̂. Programs including SAS will also output an estimate of the variance of L′β̂. We can compute the estimate of the mean in the original scale by applying the inverse transformation g⁻¹ to L′β̂ as described above. In order to obtain an estimate of the variance of g⁻¹(L′β̂), however, we need to make use of, for example, the Delta method, which we now explain.

Given any non-linear function H of some scalar-valued random variable θ, and given σ², the variance of θ, we can obtain an approximation to the variance of H(θ) as follows:

Var(H(θ)) ≈ (∂H(θ)/∂θ)² σ².
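The Delta method can be checked numerically (an illustration of ours, with arbitrary values): take H = exp and compare the approximation against a Monte Carlo estimate of the variance of H(θ̂):

```python
import math
import random

def delta_var(h_prime, theta, sigma2):
    """Delta-method variance: (H'(theta))^2 * sigma^2."""
    return h_prime(theta) ** 2 * sigma2

theta, sigma2 = 1.0, 0.01  # hypothetical estimate and its variance

# For H(theta) = exp(theta), H'(theta) = exp(theta)
approx = delta_var(math.exp, theta, sigma2)

# Monte Carlo check: simulate theta-hat ~ N(theta, sigma2)
random.seed(0)
draws = [math.exp(random.gauss(theta, math.sqrt(sigma2)))
         for _ in range(200_000)]
mean = sum(draws) / len(draws)
mc_var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)

print(approx)  # exp(1)^2 * 0.01 ≈ 0.0739
print(mc_var)  # close to the Delta-method value when sigma2 is small
```

The agreement is good here because σ² is small; the linearization degrades as the variance of θ grows.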
For example, suppose that we used a log transformation on a response variable and obtained an LSM in the transformed scale that we denote L′β̂, with estimated variance σ̂²_{L′β̂}. The estimate of the mean in the original scale is obtained by applying the inverse transformation to the LSM:

m̂ = LSM_original = exp(L′β̂).

The variance of m̂ is then approximated by:

σ̂²_m̂ = (∂exp(L′β̂)/∂L′β̂)² σ̂²_{L′β̂} = [exp(L′β̂)]² σ̂²_{L′β̂}.
Suppose now that the response variable was binary and that we used a logit transformation, so that

g(μ) = log( μ / (1 − μ) ).

Given an MLE β̂ and an estimate L′β̂ of the least-squares mean in the transformed scale, we compute m̂ and σ̂²_m̂ as follows:

m̂ = exp(L′β̂) / (1 + exp(L′β̂)),

σ̂²_m̂ = [ exp(L′β̂) / (1 + exp(L′β̂))² ]² σ̂²_{L′β̂}.
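A helper of our own sketching this logit back-transformation, using the fact that the derivative of the inverse logit at η is μ(1 − μ); the inputs below are made up:

```python
import math

def logit_backtransform(eta_hat, var_eta_hat):
    """Back-transform an LSM on the logit scale and return it with
    its Delta-method variance on the proportion scale."""
    p = math.exp(eta_hat) / (1.0 + math.exp(eta_hat))
    # derivative of the inverse logit: exp(eta)/(1 + exp(eta))^2 = p*(1-p)
    deriv = p * (1.0 - p)
    return p, deriv ** 2 * var_eta_hat

# hypothetical LSM on the logit scale and its estimated variance
m_hat, var_m_hat = logit_backtransform(-0.5, 0.04)
print(m_hat)      # ≈ 0.3775
print(var_m_hat)  # ≈ 0.00221
```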
Given a point estimate of the least-squares mean in the original scale and an approximation to its variance, we can compute an approximate 100(1 − α)% confidence interval for the true mean in the original scale in the usual manner:

m̂ ± t_{df, α/2} σ̂_m̂,

where df is the appropriate degrees of freedom. In our case, and due to the relatively large sample sizes everywhere, the t critical value can be replaced by the corresponding upper α/2 critical value of the standard normal distribution.
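Putting the pieces together (our own sketch, with made-up numbers): back-transform a log-scale LSM, attach its Delta-method variance, and form the normal-approximation interval:

```python
import math
from statistics import NormalDist

# Hypothetical LSM on the log scale and its estimated variance
eta_hat, var_eta = 1.2, 0.09

# Back-transform and Delta-method variance for the log link
m_hat = math.exp(eta_hat)
var_m = math.exp(eta_hat) ** 2 * var_eta

# Approximate 95% CI using the normal critical value, as in the text
z = NormalDist().inv_cdf(0.975)  # ≈ 1.96
half = z * math.sqrt(var_m)
lo, hi = m_hat - half, m_hat + half
print(round(m_hat, 3), (round(lo, 3), round(hi, 3)))
```

An alternative with the same asymptotics is to form the interval on the transformed scale and back-transform its endpoints, which keeps the limits inside the valid range of the original scale.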