eral principles of design-based and model-based inference, explain the main advantages and disadvantages of both, and provide some guidance on how such a guide could be written. The panel also did not address the software issue in detail, because this depends critically on the specific analyses being undertaken.
We begin with a short description of data analysis issues. We then discuss the conceptual and theoretical underpinnings for the different frameworks under which statistical modeling and analysis can be performed, as well as their implications for analytical inference. In the final section, we offer some guidance on how a Guide for Researchers for ARMS could be written.
The complex design of ARMS includes stratification, clustering, dual frames, and unequal probability sampling. Each year, NASS provides survey weights that account for these design features as well as for additional information available at the population level and various nonresponse adjustments (see Chapter 6). NASS has also developed and makes available sets of replication weights to facilitate computation of variance estimates, with the current method based on delete-a-group jackknife replication. The survey weights and the replication weights are provided with the ARMS datasets.
Recommendation 7.1: NASS should continue to provide survey weights with the ARMS data set, combined with replication weights for variance estimation.
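As a rough illustration of how survey weights and replication weights work together, the following Python sketch computes a weighted mean and a delete-a-group jackknife variance for it. It is a minimal sketch under stated assumptions, not a description of NASS procedures. The data are synthetic, the function names are hypothetical, and the replicate weights are built by a simplified unstratified rule (drop one random group, rescale the rest). The scaling factor (R - 1)/R is the form commonly used for the delete-a-group jackknife. In practice, analysts would use the replication weights supplied with the ARMS datasets rather than construct their own.

```python
import numpy as np

def weighted_mean(y, w):
    """Weighted mean: a ratio of two weighted sums."""
    return float(np.sum(w * y) / np.sum(w))

def make_replicate_weights(full_weights, groups, n_groups):
    """Simplified (assumed) replicate weights: replicate r sets the weights of
    group r to zero and scales the remaining weights by R / (R - 1)."""
    reps = np.empty((full_weights.size, n_groups))
    for r in range(n_groups):
        dropped = groups == r
        reps[:, r] = np.where(dropped, 0.0,
                              full_weights * n_groups / (n_groups - 1))
    return reps

def jackknife_variance(y, full_weights, replicate_weights,
                       estimator=weighted_mean):
    """Return the full-sample estimate and its delete-a-group jackknife
    variance, (R - 1)/R times the sum of squared deviations of the replicate
    estimates from the full-sample estimate."""
    n_reps = replicate_weights.shape[1]
    theta_full = estimator(y, full_weights)
    theta_reps = np.array([estimator(y, replicate_weights[:, r])
                           for r in range(n_reps)])
    variance = (n_reps - 1) / n_reps * np.sum((theta_reps - theta_full) ** 2)
    return theta_full, float(variance)

# Synthetic illustration only: 300 sampled "farms" in 15 random groups.
rng = np.random.default_rng(0)
n, R = 300, 15
expense = rng.gamma(shape=2.0, scale=5000.0, size=n)   # e.g., fertilizer expense
sales = rng.gamma(shape=2.0, scale=40000.0, size=n)    # annual sales
weights = rng.uniform(5.0, 50.0, size=n)               # survey weights
groups = rng.permutation(np.arange(n) % R)             # random group labels
rep_weights = make_replicate_weights(weights, groups, R)

# Population-level estimate and its standard error.
pop_mean, pop_var = jackknife_variance(expense, weights, rep_weights)

# Domain estimate: only farms in the domain contribute to the weighted sums.
in_domain = sales > 50000.0
def domain_mean(y, w):
    return weighted_mean(y[in_domain], w[in_domain])
dom_mean, dom_var = jackknife_variance(expense, weights, rep_weights,
                                       estimator=domain_mean)

print(f"population mean: {pop_mean:.1f} (SE {np.sqrt(pop_var):.1f})")
print(f"domain mean:     {dom_mean:.1f} (SE {np.sqrt(dom_var):.1f})")
```

The same replicate weights serve both the population and the domain estimate. Only the estimator changes, which is the practical convenience of distributing replication weights with the data.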
An important use of data from surveys such as ARMS is descriptive inference, in which population-level and domain-level quantities of interest are estimated from the survey data. An example of a population quantity of interest for ARMS is the average amount spent on fertilizer by all farms, while an example of a domain quantity of interest is the average amount spent on fertilizer by all farms with annual sales over $50,000. Estimates of population and domain quantities (a domain being a subset of the population) are computed as weighted sums over the sample, or as ratios of such sums in the case of averages, using the survey weights. The variance is estimated by recomputing the estimate with each set of replication weights to obtain jackknife replicates and then combining the squared deviations of these replicates from the full-sample estimate (see Box 7-1). When targeting unknown simple or narrowly conditioned quantities of interest in a finite population and in medium-to-large domains within the population, this randomization-based type of estimation, and the associated inference in terms of standard deviation and confidence intervals,