Appendix A
Joint Distribution of Topic Flags
The first step in the multiple imputation process is to estimate the joint probability distribution of the full set of topic flags, conditioned on available survey and administrative data. Accounting for the uncertainty in the estimation of model parameters for the distribution is typically accomplished by alternating draws from a posterior distribution of the parameters θ, given the fully observed data X, observed data Yobs, and the most recent draw of the missing data Yimp(t):
θ(t) ~ p(θ∣Yobs,Yimp(t),X)
and a draw of the missing data, given the most recent parameter draw θ(t), is
Yimp(t+1) ~ p(Ymis∣Yobs,X,θ(t))
(see Rubin, 1987).
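The alternation between the two draws can be sketched in a few lines of Python. This is a toy illustration only, not the SIPP procedure: it assumes a single binary flag, no covariates X, independent Bernoulli(θ) outcomes, and a Beta(1, 1) prior, so that the posterior of θ is a Beta distribution. The function name data_augmentation and its arguments are hypothetical.

```python
import random

def data_augmentation(y, n_iter=200, seed=0):
    """Alternate parameter draws and missing-data draws for one binary flag.

    y: list of 0, 1, or None (None marks a missing value).
    Illustrative assumptions: i.i.d. Bernoulli(theta) flags, Beta(1, 1)
    prior on theta, and no covariates X.
    """
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(y) if v is None]
    completed = list(y)
    # Starting values for the missing entries (the role of Yimp(0)).
    for i in missing:
        completed[i] = rng.randint(0, 1)
    theta_draws = []
    for _ in range(n_iter):
        # Parameter step: theta(t) ~ p(theta | Yobs, Yimp(t)),
        # which is Beta(1 + #ones, 1 + #zeros) under the assumptions above.
        ones = sum(completed)
        theta = rng.betavariate(1 + ones, 1 + len(completed) - ones)
        # Imputation step: Yimp(t+1) ~ p(Ymis | Yobs, theta(t)).
        for i in missing:
            completed[i] = 1 if rng.random() < theta else 0
        theta_draws.append(theta)
    return completed, theta_draws
```

Calling data_augmentation([1, 1, 0, None, 1, None, 0, 1]) returns one completed copy of the data together with the sequence of θ draws; the observed entries are never altered.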
In practice, difficulties can arise when Ymis is multidimensional and
p(Ymis∣Yobs,X,θ) = p(Y1mis,…,Ykmis ∣Yobs,X,θ)
is high-dimensional (large k) and/or not in a known form (e.g., if Y1mis,…,Ykmis are a mix of different distributions without known joint form). If missingness is monotonic (i.e., if Y2 is missing only when Y1 is missing, Y3 is missing only when Y1 and Y2 are missing, and so forth, through Yk), the joint distribution can be decomposed as:
p(Y1mis,…,Ykmis ∣Yobs,X,θ) =
p(Y1mis∣Yobs,X,θ) p(Y2mis∣Yobs,Y1mis,X,θ) …
p(Ykmis∣Yobs,Y1mis,…,Yk–1mis,X,θ)
and draws are obtained sequentially, for example:
Y1imp(t+1) ~ p(Y1mis∣Yobs,X,θ(t)), Y2imp(t+1) ~ p(Y2mis∣Yobs,Y1imp(t+1),X,θ(t)), etc.
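As a worked instance, the decomposition for k = 3 can be written out in full. The typeset subscripts and superscripts below are one reading of the flattened notation used in this appendix (Y1mis as the missing part of the first variable, Yobs as the observed data), and should be treated as an assumption about the intended symbols.

```latex
\begin{align*}
p(Y_1^{mis}, Y_2^{mis}, Y_3^{mis} \mid Y^{obs}, X, \theta)
  &= p(Y_1^{mis} \mid Y^{obs}, X, \theta) \\
  &\quad \times\, p(Y_2^{mis} \mid Y^{obs}, Y_1^{mis}, X, \theta) \\
  &\quad \times\, p(Y_3^{mis} \mid Y^{obs}, Y_1^{mis}, Y_2^{mis}, X, \theta)
\end{align*}
```

Each factor is a univariate conditional distribution, which is why the draws can be made one variable at a time, each conditioning on the imputations already drawn.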
However, the missingness in SIPP is not monotonic because the 25 topic flags display many different patterns of missing data. Sequential regression multiple imputation (SRMI; see Raghunathan et al., 2001) provides an alternative in this setting, replacing the direct draw of Yimp(t+1) ~ p(Ymis∣Yobs,X,θ(t)) with a series of conditional imputations:
Y1imp(t+1) ~ p(Y1mis∣Yobs,Y2imp(t),…,Ykimp(t),X,θ(t)) through
Ykimp(t+1) ~ p(Ykmis∣Yobs,Y1imp(t+1),…,Yk–1imp(t+1),X,θ(t)).
The SIPP processing system implements SRMI using T = 5 iterations. Imputation for the topic flags is conducted using logistic regression models, stratified on demographic factors, where subject matter experts designed the details of the model for each content flag. An important point, discussed in Chapter 5, is that although SIPP documentation refers to SRMI (sequential regression multiple imputation), only a single imputation is provided.
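The cycle of conditional imputations can be sketched as follows. This is a minimal illustration under stated assumptions, not the SIPP production code: each flag is modeled by a logistic regression on the other flags only (no covariates X and no demographic stratification), the regression is fit by plain gradient ascent, and a bootstrap resample of the observed cases stands in for the posterior draw of θ(t). All function names are hypothetical.

```python
import math
import random

def predict(w, x):
    """Logistic probability for predictors x; w[0] is the intercept."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, targets, steps=200, lr=0.5):
    """Fit logistic regression weights by gradient ascent on the log-likelihood."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in zip(rows, targets):
            err = t - predict(w, x)
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

def srmi(data, n_iter=5, seed=0):
    """Return one completed copy of data (rows of 0/1/None flags).

    Assumes every flag is observed for at least one row.
    """
    rng = random.Random(seed)
    k = len(data[0])
    completed = [list(r) for r in data]
    # Starting values: draw each missing flag from its observed marginal rate.
    for j in range(k):
        obs = [r[j] for r in data if r[j] is not None]
        rate = sum(obs) / len(obs)
        for i, r in enumerate(data):
            if r[j] is None:
                completed[i][j] = 1 if rng.random() < rate else 0
    for _ in range(n_iter):
        for j in range(k):
            obs_idx = [i for i, r in enumerate(data) if r[j] is not None]
            # Bootstrap resample of observed cases: a stand-in for drawing theta(t).
            boot = [rng.choice(obs_idx) for _ in obs_idx]
            xs = [completed[i][:j] + completed[i][j + 1:] for i in boot]
            ts = [data[i][j] for i in boot]
            w = fit_logistic(xs, ts)
            # Re-impute flag j given the current values of all other flags:
            # flags before j come from this iteration, flags after j from the last.
            for i, r in enumerate(data):
                if r[j] is None:
                    x = completed[i][:j] + completed[i][j + 1:]
                    completed[i][j] = 1 if rng.random() < predict(w, x) else 0
    return completed
```

The default n_iter=5 mirrors the T = 5 iterations mentioned above. A single call yields a single completed data set; producing multiple imputations would require repeating the whole procedure with independent random seeds.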