The Agricultural Resource Management Survey (ARMS) program of the U.S. Department of Agriculture (USDA) is a study in complexity. Data are collected on three different occasions and at three different levels: the field, the farm, and the household associated with the farm. Other aspects, including the technical design (a multiphase, multiframe, stratified, probability-weighted sample), ever-changing content, and multiple modes of data collection, increase the complexity.
Complexity is both a source of strength in the ARMS program and a challenge to those responsible for conducting the survey and analyzing the results. Today, the survey is more than an assemblage of several data collections within the National Agricultural Statistics Service. In the mid-1990s, ARMS was created by merging the objectives of two USDA surveys: the Farm Costs and Returns Survey (FCRS) and the Cropping Practices Survey (CPS). The FCRS objectives were to collect whole-farm production, organization, and financial information; household demographic and financial information; and enterprise-level costs and returns information for selected commodities. The CPS objectives were to collect field-level chemical use, tillage practices, and other field practices for selected commodities. The ARMS program provides indispensable linkages of data for fields, farms, and households in a manner that permits analysis of management practices, profitability, and farm family composition and well-being, among other topics. No other source affords such a comprehensive and complete view of the American farm. Few other sources pose such complicated challenges to methodologists, data collectors, data processors, and analysts.
ARMS is USDA’s primary source of information on the financial condition, production practices, and resource use of farms, as well as the economic well-being of America’s farm households. Its data are essential to USDA and other federal administrative, congressional, and private-sector decision makers when they must weigh alternative policies and programs or business strategies that touch the farm sector or affect farm families (Box 1-1).
The basic USDA definition of a farm is any place from which $1,000 or more of agricultural products (in nominal dollars) were produced and sold, or normally would or could have been sold, during the census year. This definition is common to both the Census of Agriculture and ARMS, and is reflected in the terms “census farms” and “census-defined farms.” The definition has been stable for many years and encompasses many small, hard-to-measure businesses, which are difficult to identify and survey.
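The dollar threshold in this definition can be read as a simple screening predicate. The sketch below is illustrative only: the function name and the second argument are our own, and in practice USDA estimates "normal" sales for places with no actual sales rather than taking a reported figure at face value.

```python
def is_census_farm(actual_sales: float, estimated_normal_sales: float) -> bool:
    """Illustrative census-farm screen: a place qualifies if it sold, or
    normally would or could have sold, at least $1,000 of agricultural
    products during the census year (threshold in nominal dollars)."""
    THRESHOLD = 1_000.0
    return actual_sales >= THRESHOLD or estimated_normal_sales >= THRESHOLD

# A place with no actual sales but $1,500 in expected production still counts;
# a place with $500 in sales and no further potential does not.
assert is_census_farm(0.0, 1_500.0)
assert not is_census_farm(500.0, 0.0)
```

The "normally would have sold" branch is what pulls many small, intermittently active operations into the farm population, which is one reason the definition is hard to apply in the field.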
ARMS comprehensively provides observations of field-level farm practices, the economics of the farm businesses operating the fields (or dairy herds, greenhouses, nurseries, poultry houses, etc.), and the characteristics of the American farm household (age, education, occupation, farm and off-farm work, types of employment, family living expenses, etc.), collected through interrelated, representative samples.

BOX 1-1
A Thumbnail Sketch of the ARMS Program

The Agricultural Resource Management Survey is the primary source of information for the U.S. Department of Agriculture and the public on a broad range of issues about U.S. agricultural resource use, costs, and farm-sector financial conditions. It is the only source of information available for objective evaluation of many critical issues related to agriculture and the rural economy. The survey is conducted by the National Agricultural Statistics Service (NASS) in collaboration with the Economic Research Service (ERS).

The survey sample is designed to provide coverage of all farms in the 48 contiguous states plus state-level data for 15 major farm states. The population of farms, as defined by the Census of Agriculture, includes all establishments that produced and sold (or normally would or could have sold) at least $1,000 of agricultural products during the previous year. A sample from the NASS list frame is supplemented by a geographic sample of area tracts to ensure complete coverage.

The survey collects data in three phases. Farm operators are selected to ensure adequate coverage by state and region and to minimize reporting burden; strata are based on state, the value of agricultural sales, and type of farm. Phase I screening is performed by mail and telephone, and operators who are in business or have the commodity of interest (which varies by year) are eligible to be selected for Phase II or Phase III. Phase II data are collected by means of personal interviews, while Phase III surveys are conducted using several modes of data collection (face-to-face, telephone, mail-out/mail-back with face-to-face follow-up for mail respondents, and, on an experimental basis, the Internet).

ARMS collects whole-farm data as well as commodity-specific production practice and cost data, on a rotating basis, for selected commodities in Phase II of the survey and in commodity-specific versions of Phase III. The commodities surveyed are rotated every 5-6 years to focus on resource use and production costs for specific commodities.

SOURCE: National Agricultural Statistics Service.

The survey has increased in complexity as it has matured over the years. The current pattern of rotating commodities between survey cycles is one example of a survey design decision that, although made for practical reasons, has tended to increase the complexity of the survey operation. Today, the pieces and parts of the survey can be described as

a cooperative management and financing venture between two agencies that have independent objectives;

a multiphase operation with three distinct survey operations, which, though integrated and building on each other, have different purposes and different constituencies;

an elaborate sampling frame consisting of both a traditionally constructed list of farms (the list frame) and a special list obtained by an intensive geographic area screening (the area frame);

a complex collection scheme implemented under a cooperative agreement by a nonfederal organization;

an ambitious, cognitively challenging survey questionnaire that, despite several efforts at simplification, is perceived by USDA to be so burdensome to respondents that pains are currently taken to minimize revisits to them, which limits the ability to follow these reporting units longitudinally; and

a complex estimation and variance computation procedure, which, although appropriate for its purpose, can place limitations on the ability of data analysts to perform multivariate analysis using standard statistical packages and to determine whether an analytical result is statistically valid and reliable.
CHARGE TO THE PANEL
The responsible agencies, the National Agricultural Statistics Service (NASS) and the Economic Research Service (ERS) of the U.S. Department of Agriculture, are well aware of the challenges posed by the ARMS program and have sought to improve many aspects of survey operations and analysis as time and resources have permitted. As part of a program of continuous improvement, the two agencies joined in requesting this review of ARMS. To conduct the review, the National Research Council, through the Committee on National Statistics, appointed the Panel to Review USDA’s Agricultural Resource Management Survey, whose members have expertise in household and business survey methods, the economics of farming and farm households, and complex sample designs.
The charge to the panel was to address two related tasks: (1) review the characteristics of the survey itself, including concepts, sample design, questionnaire design, data collection, and data processing and estimation, considering for each whether USDA is using best practices and how its practices might be improved, and (2) study the uses of the data for econometric policy-relevant analyses. Of particular concern is whether ARMS uses state-of-the-art methods to fit statistical models to ARMS data—that is, whether it uses appropriate methods for estimating the variance of estimates, and what the complex sample design implies for univariate and multivariate estimation. Drawing on its members’ experience as major users of ARMS data, the panel also reviewed the processes of the survey and various means of expanding access to the microdata for econometric and other analysis.
ISSUES IN SURVEY OPERATIONS
The complex nature of ARMS raises issues in nearly every aspect of survey operations, from conceptualization, organization, sampling, and questionnaire design, to data collection, data processing, analysis, and dissemination.
NASS and ERS have raised a number of questions about:
the adequacy and utility of the area frame sample, which supplements a list frame sample from the Census of Agriculture and other sources;
the appropriateness of the process used to determine questionnaire content, which includes operating characteristics and business financial information for each farm, the farm operator’s household off-farm income and other characteristics, and operating characteristics and farming practices for specific field crops, surveyed in various years;
whether best practices are being used to elicit high-quality responses to the economic and demographic questions, given that the ARMS questions pose major challenges to high-quality responses: they are sensitive from a privacy perspective, are very detailed in their inquiries about resource allocations and economic outcomes for the farm business and the farm household, and require extended memory recall and family records;
whether best practices are being used to elicit economic measures of farm and household performance for the prior year or, in some cases, the previous year;
how trade-offs between respondent burden and the need for imputation should be evaluated, and whether the best methods for imputing data due to nonresponse or due to unasked questions for particular subsamples are being employed;
other possible approaches for further reducing individual item nonresponse for both the mail and the in-person versions of the survey; and
whether respondents’ comprehension of, and responses to, ARMS questions are consistent with the concepts that USDA intended to measure and, if not, what additional information could be informative.
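One family of methods at issue in the imputation question above is hot-deck imputation, in which a missing value is filled with a reported value borrowed from a similar respondent. The sketch below is a minimal within-stratum hot deck with hypothetical field names; it is not NASS's production procedure, which is considerably more elaborate.

```python
import random

def hot_deck_impute(records, stratum_key, field, rng=None):
    """Fill missing values of `field` (represented as None) by borrowing a
    reported value from a randomly chosen donor in the same stratum (e.g.,
    state by sales class).  Records are dicts; strata with no donors are
    left unfilled.  Purely illustrative."""
    rng = rng or random.Random(0)
    donors = {}
    for r in records:
        if r[field] is not None:
            donors.setdefault(stratum_key(r), []).append(r[field])
    for r in records:
        if r[field] is None:
            pool = donors.get(stratum_key(r))
            if pool:
                r[field] = rng.choice(pool)
    return records

# Example with invented records: the Iowa gap gets a donor value,
# the Texas gap stays empty because its stratum has no donors.
recs = [{"state": "IA", "income": 50.0},
        {"state": "IA", "income": None},
        {"state": "TX", "income": None}]
hot_deck_impute(recs, lambda r: r["state"], "income")
assert recs[1]["income"] == 50.0
assert recs[2]["income"] is None
```

The respondent-burden trade-off in the list above is visible even in this toy: every question dropped from the instrument shrinks the donor pools and pushes more of the published estimates onto borrowed values.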
The panel’s assessment of the current methodology and practices has been conducted in light of an understanding of the state of survey methodology and best practices. These issues are addressed in Chapters 4, 5, and 6.
In addition to publication of summary data in cross-sectional and time series form, use of the data is expanding for hypothesis testing, econometric modeling, and other methods contributing to policy analysis. This growing
role is the result of an ongoing, successful ERS program to promote awareness of, and access to, ARMS data. Today, a substantial and expanding number of government and academic researchers are using the survey data to conduct research on a wide range of topics, including analysis of farm business and household responses to government programs.
As the uses of ARMS data for increasingly sophisticated purposes have expanded, the experts in data analysis at ERS have increasingly been called on to provide advice and guidance to internal and external data users on appropriate techniques and methodologies for data analysis. To assist ERS, the panel has reviewed statistical hypothesis-testing procedures using ARMS data. We consider USDA’s choice of statistical procedures for estimating standard errors to test hypotheses with simple estimates and with complex econometric models and make recommendations on best practices for variance estimation and other statistical issues in the use of ARMS data by policy analysts. For univariate statistics, the specific questions considered include

the appropriate methods for calculating standard errors for use in hypothesis testing;

the adequacy of the delete-a-group jackknife variance estimator for calculating standard errors, in general and in small samples;

possible improvements to the delete-a-group jackknife estimator; and

the effects of ignoring the survey design in hypothesis testing.

For advanced multivariate methods, the issues are more complex:

the consequences of ignoring the survey design in econometric analyses;

whether the survey design must be accounted for in estimating coefficients and standard errors and, if so, what approaches to take;

the impact of the delete-a-group jackknife variance estimator on hypothesis testing for policy inferences and for professional presentation;

any weaknesses of the jackknife estimator in this context; and

whether alternative complex-sample estimators, particularly those used in standard analytic packages such as Stata or SAS, are acceptable.

These issues are considered in Chapter 7.
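The delete-a-group jackknife discussed above can be summarized in a few lines. This is a generic textbook form for a survey-weighted mean, not NASS's production estimator (which also recalibrates each replicate); the default of 15 groups mirrors the grouping commonly described for ARMS, but the function and variable names are our own.

```python
def dagjk_variance(y, w, groups, G=15):
    """Delete-a-group jackknife variance for a survey-weighted mean.
    y: data values; w: survey weights; groups: each unit's random group
    label in {0, ..., G-1}.  Replicate g drops the units in group g and
    scales the remaining weights by G / (G - 1).  (For a mean the scaling
    cancels in the ratio, but it matters when estimating totals.)"""
    def wmean(pairs):
        num = sum(wi * yi for yi, wi in pairs)
        den = sum(wi for _, wi in pairs)
        return num / den

    data = list(zip(y, w))
    full = wmean(data)                      # full-sample estimate
    reps = []
    for g in range(G):
        kept = [(yi, wi * G / (G - 1))
                for (yi, wi), lab in zip(data, groups) if lab != g]
        reps.append(wmean(kept))            # replicate estimate
    # DAGJK variance: scaled sum of squared replicate deviations
    return (G - 1) / G * sum((r - full) ** 2 for r in reps)

# Sanity checks: a constant variable has zero estimated variance,
# and a varying one has a strictly positive estimate.
grp = list(range(15)) * 2
assert abs(dagjk_variance([5.0] * 30, [1.0] * 30, grp)) < 1e-12
assert dagjk_variance([float(i) for i in range(30)], [1.0] * 30, grp) > 0
```

The small-sample concern raised above is visible in the structure of the estimator: with G fixed at 15, the variance estimate rests on only 14 degrees of freedom no matter how large the sample, and a replicate that empties a rare subpopulation can behave erratically.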
THE PANEL’S APPROACH
Appointed in January 2006, the panel held its first meeting on February 2-3, 2006. Over the course of its inquiry, the panel has conducted five open meetings that involved interaction with ERS and NASS staff, as well as key data users, policy makers, and additional technical experts: a workshop on statistical methodology, June 8-9, 2006; a session for data users at the annual meeting of the American Agricultural Economics Association, July 24, 2006; a workshop on concepts and measurement, September 28-29, 2006; a workshop on inference, December 7-8, 2006; and an open discussion of cost-of-production issues, January 18, 2007. The agendas of these meetings appear in Appendix A.
The panel’s recommendations respond to USDA’s concerns about ARMS and its uses for policy analysis, identify specific needed improvements, and suggest a program of research, testing, and development to keep ARMS current with data needs and state-of-the-art methods in the future. With the time and resources available, it was not possible to formulate a particular solution to each of the issues, some of which require considerable research, development, and testing. However, we did identify several issues for priority review in the research and development program we recommend. These research recommendations appear throughout the report.
The focus in this report is on quality, broadly defined as “fitness for use.” The definition of quality throughout the international statistical system consists of several constituent elements or dimensions. One commonly used set of six elements, to which this report adheres, includes relevance, accuracy, timeliness, accessibility, interpretability, and coherence (Organisation for Economic Co-operation and Development, 2003; Statistics Canada, 2003).
The relevance of statistical information reflects the degree to which it improves the decisions of clients. It is concerned with whether the available information improves the value of a decision and sheds light on the issues that are important to users. Assessing relevance is subjective and depends on the varying needs of users. The statistical agency’s challenge is to weigh and balance the conflicting needs of current and potential users to produce a program that goes as far as possible in satisfying the most important needs within given resource constraints. Relevance is considered in Chapter 2 in the context of the need to make decisions on contemporary issues in American agriculture.
The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure. It is usually characterized in terms of error in statistical estimates and is traditionally decomposed into bias (systematic error) and variance (random error) components. It may also be described in terms of the major sources of error that potentially cause inaccuracy (e.g., coverage, sampling, nonresponse, response). The major potential sources of error are discussed in Chapters 4, 5, and 6.
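The decomposition of error into bias and variance mentioned above is the standard identity MSE = bias² + variance, and it can be checked numerically. The sketch below simulates a deliberately biased estimator of a mean; all numbers and names are illustrative, not drawn from ARMS.

```python
import random

def mse_decomposition(n_reps=10_000, n=25, mu=10.0, sigma=2.0, shrink=0.9):
    """Simulate a biased estimator of mu (a sample mean shrunk toward zero)
    over many replications and return (mse, squared_bias + variance).
    By the standard decomposition the two agree up to rounding."""
    rng = random.Random(42)
    ests = [shrink * sum(rng.gauss(mu, sigma) for _ in range(n)) / n
            for _ in range(n_reps)]
    mean_est = sum(ests) / n_reps
    variance = sum((e - mean_est) ** 2 for e in ests) / n_reps  # random error
    bias_sq = (mean_est - mu) ** 2                              # systematic error
    mse = sum((e - mu) ** 2 for e in ests) / n_reps             # total error
    return mse, bias_sq + variance

mse, decomp = mse_decomposition()
assert abs(mse - decomp) < 1e-6 * max(mse, 1.0)
```

The practical point for a survey like ARMS is that error sources such as coverage and nonresponse contribute mostly bias, which replication-based variance estimators cannot see, while sampling error contributes the variance term they do measure.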
The timeliness of statistical information refers to the delay between the reference point to which the information pertains and the date on which the information becomes available for use. There is typically a trade-off between timeliness and accuracy. Timeliness, in terms of its influence on relevance, is addressed in Chapter 2, while Chapter 8 discusses timeliness as a factor in overall quality.
The accessibility of statistical information refers to the ease with which it can be obtained from the statistical agency. This involves issues of dissemination, which are covered in Chapter 8, including the ease with which the
existence of information can be ascertained, the suitability of the form in which the information can be accessed, and the availability of user support services. The cost of using the information, including both direct costs for data products and the cost of travel to centralized repositories of research data, may also be an aspect of accessibility for some users.
The interpretability of statistical information reflects the availability of supplementary information about the data, often in the form of metadata (i.e., data about data items) and paradata (i.e., data about the data collection process), which are necessary to interpret and use it appropriately. This information usually includes the underlying concepts, variables and classifications, the methodology of data collection and processing, and indications or measures of the accuracy of the statistical information. Chapter 5 discusses these components of quality.
Finally, the coherence of statistical information reflects the degree to which it can be successfully brought together with other statistical information within a broad analytic framework and over time. The use of standard concepts, classifications, and target populations promotes coherence, as does the use of common methodology across surveys. ARMS presents special challenges for coherence, in that it must be internally coherent across the three phases of the survey as well as coherent with other USDA data. In Chapter 6 we examine one method for achieving coherence, the calibration process used to set production estimates and align the various sources of data regarding production; methods for achieving coherence in analysis of the ARMS data themselves are examined in Chapter 7.
At the operational level, these elements of quality are reflected in guidelines that have been promulgated by the U.S. Office of Management and Budget (OMB) (U.S. Office of Management and Budget, 2001). In addition to quality, the OMB guidelines address utility, objectivity, integrity, transparency, and reproducibility of information disseminated by federal agencies.
These guidelines have been appropriately embraced by the leadership of both NASS and ERS. This is important, since quality assurance is mainly a management function. Along with leaders of several of the largest federal statistical agencies, the administrators of NASS and ERS signed a statement in 2002 delineating federal statistical organizations’ guidelines for ensuring and maximizing the quality, utility, objectivity, and integrity of disseminated information. The role of the statistical agency in ensuring quality is summarized in this statement and bears repeating as the underlying theme of this report (U.S. Office of Management and Budget, 2002, p. 38468):
A statistical organization’s commitment to quality and professional standards of practice further includes: the use of modern statistical theory and practice in all technical work; the development of strong staff expertise
in the disciplines relevant to its mission; the implementation of ongoing quality assurance programs to improve data validity and reliability and to improve the processes of compiling, editing, and analyzing data; and the development of a strong and continuing relationship with appropriate professional organizations in the fields of statistics and relevant subject-matter areas.
To carry out its mission, a Federal statistical organization assumes responsibility for determining sources of data, measurement methods, methods of data collection and processing while minimizing respondent burden; employing appropriate methods of analysis; and ensuring the public availability of the data and documentation of the methods used to obtain the data. Within the constraints of resource availability, a statistical organization continually works to improve its data systems to provide information necessary for the formulation of public policy.
Beyond this, the OMB has directed each federal agency to issue its own information quality guidelines, and further guidelines have been issued by USDA, NASS, and ERS (U.S. Department of Agriculture, 2006; National Agricultural Statistics Service, 2007a; Economic Research Service, 2003).
More recently, OMB has issued detailed Standards and Guidelines for Statistical Surveys (U.S. Office of Management and Budget, 2006b), a comprehensive guide to developing and managing surveys in such a way as to obtain OMB approval for their conduct. In these guidelines, quality standards for the various stages of survey operations have been spelled out in some detail. The topics covered range from satisfactory survey response rates to the development of sampling frames to drawing of inferences from the data. The panel refers to these standards and guidelines in the report when discussing issues of compliance and noncompliance.
GUIDE TO THE REPORT
Following this introduction, we lay out the contemporary issues facing American agriculture and the relevant uses of ARMS data to address them in Chapter 2. These uses include those driven by congressional mandates and by agency and research community needs. Chapter 3 outlines the organizational structure behind ARMS and the collaborative management of NASS and ERS. Issues of sample and survey design, data collection, nonresponse, imputation, and estimation are addressed in the next three chapters. Chapter 7 provides a framework for analysis of complex surveys and issues related to inclusion of survey design in estimation. Data user concerns, including dissemination of data and opportunities for user feedback and training, are addressed in Chapter 8. Chapter 9 summarizes the panel’s conclusions and recommendations.