If the probability in equation (13) or (14) exceeds 50 percent, then it is more likely than not that the household falls above $\phi^*$ on the $\phi$-scale, and it often makes sense to classify the household accordingly. More complicated rules can be devised that take account of possibly different costs for the two kinds of misclassification error.
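As a rough illustration of such a rule, the sketch below approximates the posterior probability that a household lies above a cut point and then applies either the 50 percent rule or a cost-weighted threshold. Everything specific here is an assumption made for the example rather than something taken from the text: the 2PL response function, the standard normal latent density, the grid approximation, the item parameters in `ITEMS`, and the function names.

```python
import math

def affirm_prob(phi, a, b):
    """2PL probability of affirming an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (phi - b)))

def prob_above_cut(responses, items, cut, lo=-6.0, hi=6.0, n=1201):
    """Grid approximation to P(phi > cut | responses).

    Assumes a standard normal latent density and conditionally independent items.
    """
    step = (hi - lo) / (n - 1)
    above = total = 0.0
    for k in range(n):
        phi = lo + k * step
        weight = math.exp(-0.5 * phi * phi)  # unnormalized N(0, 1) density
        for y, (a, b) in zip(responses, items):
            p = affirm_prob(phi, a, b)
            weight *= p if y else 1.0 - p  # likelihood of the observed response
        total += weight
        if phi > cut:
            above += weight
    return above / total

def classify_above(prob, cost_false_pos=1.0, cost_false_neg=1.0):
    """Classify above the cut when that choice has the lower expected cost.

    With equal costs the threshold is 0.5, the "more likely than not" rule.
    """
    return prob > cost_false_pos / (cost_false_pos + cost_false_neg)

# Hypothetical (discrimination, difficulty) pairs, not HFSSM estimates.
ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]
p_all = prob_above_cut([1, 1, 1, 1], ITEMS, cut=0.5)
p_none = prob_above_cut([0, 0, 0, 0], ITEMS, cut=0.5)
```

The cost-weighted threshold follows from comparing expected costs: classifying above the cut costs `(1 - prob) * cost_false_pos` on average, classifying below costs `prob * cost_false_neg`, and the first is smaller exactly when `prob` exceeds `cost_false_pos / (cost_false_pos + cost_false_neg)`.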
The approach outlined above is a basic way to form a classification system using an IRT model. If the Rasch IRT model is assumed, then it may be shown that the latent posterior distribution in equation (11) depends only on the number of affirming responses of the household rather than on which questions are affirmed. This simplifies the classification rule to make it more like the one used by USDA, but it requires that the Rasch model accurately represents the distribution of the observed responses to the dichotomized HFSSM questions. Johnson (2004) indicates that the 2PL model provides a better fit to the HFSSM data that he examined.
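The dependence on the count alone can be checked numerically. The sketch below compares posterior means (used here only as a convenient summary of the posterior) for two response patterns that affirm the same number of items. The item parameters, the standard normal latent density, and the grid approximation are all assumptions made for illustration.

```python
import math

def posterior_mean(responses, items, lo=-6.0, hi=6.0, n=1201):
    """Grid approximation to E[phi | responses] under a standard normal
    latent density (an illustrative assumption)."""
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for k in range(n):
        phi = lo + k * step
        weight = math.exp(-0.5 * phi * phi)  # unnormalized N(0, 1) density
        for y, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (phi - b)))
            weight *= p if y else 1.0 - p
        num += phi * weight
        den += weight
    return num / den

# Hypothetical (discrimination, difficulty) pairs.
RASCH_ITEMS = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]   # equal discriminations
TWO_PL_ITEMS = [(0.5, -1.0), (1.0, 0.0), (1.5, 1.0), (2.0, 2.0)]  # unequal discriminations

pattern_a = [1, 1, 0, 0]  # two affirmations, on the first two items
pattern_b = [0, 0, 1, 1]  # two affirmations, on the last two items

rasch_gap = abs(posterior_mean(pattern_a, RASCH_ITEMS)
                - posterior_mean(pattern_b, RASCH_ITEMS))
two_pl_gap = abs(posterior_mean(pattern_a, TWO_PL_ITEMS)
                 - posterior_mean(pattern_b, TWO_PL_ITEMS))
```

Under the Rasch parameters the two patterns give the same posterior mean up to rounding, because the two likelihoods differ only by a factor that does not depend on the latent variable. Under the 2PL parameters the posterior also depends on which items were affirmed, so the two means differ.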
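When a cut point, $\phi^*$, has been established along the $\phi$-scale, the prevalence rate of the condition corresponding to values above $\phi^*$, in the population described by the latent distribution $f(\phi)$, is naturally defined as

$$P(\phi > \phi^*) = \int_{\phi^*}^{\infty} f(\phi)\, d\phi. \tag{15}$$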
In addition to this overall prevalence rate, the prevalence in a subgroup of households indicated by $G = g$ is given by

$$P(\phi > \phi^* \mid G = g) = \int_{\phi^*}^{\infty} f(\phi \mid G = g)\, d\phi. \tag{16}$$
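To make the two prevalence definitions concrete, the sketch below evaluates them in closed form under the assumption that the latent variable is normally distributed within each subgroup. The normal form, the subgroup shares and means, and the cut point are hypothetical choices for illustration, not values from the text.

```python
import math

def prevalence(cut, mean=0.0, sd=1.0):
    """P(phi > cut) when phi ~ Normal(mean, sd**2), i.e. the upper-tail area."""
    return 0.5 * math.erfc((cut - mean) / (sd * math.sqrt(2.0)))

# Hypothetical subgroups: label -> (population share, latent mean).
SUBGROUPS = {"g1": (0.7, -0.3), "g2": (0.3, 0.7)}
CUT = 1.0

# Subgroup prevalence: tail area of each subgroup's latent density.
by_group = {g: prevalence(CUT, mean=m) for g, (share, m) in SUBGROUPS.items()}

# Overall prevalence: the latent density is then a mixture of the subgroup
# densities, so the overall rate is the share-weighted average.
overall = sum(share * by_group[g] for g, (share, m) in SUBGROUPS.items())
```

Because the overall latent distribution is a mixture of the subgroup distributions, the overall prevalence always lies between the smallest and largest subgroup prevalence.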
However, the practice of USDA is to set the cut points on the scale of the manifest variables, that is, on the number of affirmations of the HFSSM questions. What is the connection between the proportions of households that affirm some number of the HFSSM questions and the prevalence rate defined in equation (15) or (16)? To answer this, let $A$ denote the number of the dichotomized HFSSM questions affirmed by a household. Then the proportion of households that affirm $x$ or more of the questions is

$$P(A \ge x) = \int_{-\infty}^{\infty} P(A \ge x \mid \phi)\, f(\phi)\, d\phi.$$
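One way to see how the proportion affirming at least a given number of questions relates to the latent distribution is to compute it by averaging the conditional distribution of the count over the latent variable. The sketch below does this on a grid. It assumes conditionally independent 2PL items and a standard normal latent density, and the item parameters and function names are hypothetical.

```python
import math

def score_distribution(phi, items):
    """P(A = r | phi) for r = 0..len(items), built up one item at a time
    under the assumption of conditionally independent responses."""
    dist = [1.0]
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (phi - b)))
        new = [0.0] * (len(dist) + 1)
        for r, mass in enumerate(dist):
            new[r] += mass * (1.0 - p)  # item denied: count stays at r
            new[r + 1] += mass * p      # item affirmed: count moves to r + 1
        dist = new
    return dist

def prob_at_least(x, items, lo=-6.0, hi=6.0, n=1201):
    """Grid approximation to P(A >= x), averaging P(A >= x | phi) over an
    assumed standard normal latent density."""
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for k in range(n):
        phi = lo + k * step
        weight = math.exp(-0.5 * phi * phi)  # unnormalized N(0, 1) density
        num += weight * sum(score_distribution(phi, items)[x:])
        den += weight
    return num / den

# Hypothetical (discrimination, difficulty) pairs, not HFSSM estimates.
ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]
tail_probs = [prob_at_least(x, ITEMS) for x in range(len(ITEMS) + 1)]
```

Each entry of `tail_probs` blends contributions from households on both sides of any cut point on the latent scale, which is why a proportion defined on the count of affirmations need not coincide with a prevalence rate defined on the latent scale.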
Examples of subgroup prevalence rates are found in Table 2 of Nord et al. (2004). If the interest is on the percentage of households in a subgroup