National Academies Press: OpenBook

Improving Data Collection and Measurement of Complex Farms (2019)

Chapter: 6 A Broader Data Infrastructure: Administrative and Other Nonsurvey Data Sources

Suggested Citation:"6 A Broader Data Infrastructure: Administrative and Other Nonsurvey Data Sources." National Academies of Sciences, Engineering, and Medicine. 2019. Improving Data Collection and Measurement of Complex Farms. Washington, DC: The National Academies Press. doi: 10.17226/25260.

6

A Broader Data Infrastructure: Administrative and Other Nonsurvey Data Sources

In the previous chapter, a set of recommendations was forwarded to provide a data management framework that could improve the measurement of complex farm operations by addressing the challenges they create for reporting on the farm economy. Two themes run throughout these recommendations. First, the U.S. Department of Agriculture (USDA) must strive to consistently apply an organizational hierarchy through well-defined structures that allow for linkages both within farm businesses and from farm businesses to land and households. This capacity could be enabled through a Farm Register. Second, respondent burden must be minimized by surveying fewer individuals, asking them to answer fewer questions, and ensuring that questions are more carefully defined.

These themes reappear in this chapter, where we discuss how well-designed registers that generate reliable sample frames for survey products can be paired with other sources of data to improve data quality and utility while reducing respondent burden. There is significant untapped potential for the use of nonsurvey data—from administrative, commercial, and nonstructured data sources—in the production of agricultural statistics for the United States. This is an approach that statistical agencies across the globe are increasingly undertaking and one that is already being used for some purposes within USDA.

While this chapter emphasizes administrative data, it is important to acknowledge that the recommendations in Chapters 4 and 5 also increase the possibility of incorporating commercial and other types of data into USDA’s data infrastructure. This is a rapidly developing area, with new data products being introduced continually; indeed, the absence of a detailed discussion reflects the panel’s concern that specific recommendations may be particularly vulnerable to obsolescence. Nonetheless, satellite data and commercial data, such as Monsanto’s Fieldscript, and Web-based platforms, such as Farmobile,1 have already become valuable sources of information on land, land use, yields, and production methods. Such data may also offer unique opportunities for collaboration among USDA, university researchers, and industry. While our discussion does not address commercial issues explicitly, we believe that the framework described here for better incorporating existing administrative data can also inform how USDA develops a collection infrastructure for commercially sourced data.

While methods for exploring the use of alternative “big data”2 sources are being pursued by statistical agencies, the statistical validity of analyses based on them is not yet widely established. In contrast, survey methodology provides a known inferential framework for dealing with questions of data accuracy, representativeness, and confidentiality. For this reason, the primary method for compiling information about the nation’s farm businesses and households will, for the foreseeable future, continue to be survey-based, albeit with a growing need for linkages to administrative data. The emergence of information that can be captured from digital sources without intrusion on individuals’ time and resources does point to the likelihood that less obtrusive methods of collecting data will continue to grow.

According to a survey of 93 national statistical offices conducted by the United Nations Statistical Commission, respondents were most interested in using big data for “faster, more timely statistics” (88 percent), “reducing response burden” (75 percent), and creating “new products and services” (72 percent). Other reasons included “modernization of the statistical production process” (69 percent) and cost reduction (63 percent) (United Nations Economic and Social Council, 2016). A key finding was that, although many countries are pursuing options for exploiting large digital (public and commercial) data sets, “very few have yet been able to actually produce official statistics based on these sources” (National Academies of Sciences, Engineering, and Medicine, 2017a).

Nevertheless, there are examples of success in the use of nonsurvey, digital data in producing policy-relevant statistics both within and outside of government. Among these are Massachusetts Institute of Technology’s Billion Prices Project, which produces price indexes or measures of inflation using online posted prices for goods and services;3 a U.S. Bureau of Justice Statistics project to use Web-scraping to improve statistics on arrest-related deaths; and a Statistics Netherlands project that uses data from road sensors to generate transportation and traffic statistics (Puts et al., 2016). Future improvements to federal statistics will largely depend on the capacity of the agencies to leverage multiple data sources. Private-sector data that are continuously generated have emerged as an information source capable of improving the timeliness and detail of national statistics. In the agriculture sector, demands for small-area estimates of economic activities, for which sample sizes are often inadequate to provide precise estimates directly, will continue to increase, as will the need for model-based estimates and for incorporating massive digital datasets for that purpose.

___________________

1 See https://www.farmobile.com.

2 The term “big data” is becoming outdated. As noted by Oremus (2017), we are now taking for granted that “data sets can contain billions or even trillions of observations and that sophisticated software can detect trends in them.” See http://www.slate.com/articles/technology/technology/2017/10/what_happened_to_big_data.html.

Because modern farms maintain data about their businesses on their own computer systems, some farmers may prefer to share that data in digital form instead of filling out paper forms. In the years to come, agriculture will continue to transition to digital infrastructure, replacing paper invoices and accounts. The National Agricultural Statistics Service (NASS) and Economic Research Service (ERS), together with the land-grant universities, could take an active role in promoting such a digital infrastructure.

One option, as a small step in this direction, might be to work with agricultural accounting software companies and provide them with the algorithms to reformat their data (and add an extra screen for data entry for some nonaccounting data) and make it digitally available to NASS and ERS.4 Doing so, even if only on a small scale at first, could help the agencies learn about potential measurement errors that result from mismatches between software systems and survey questions. Farmers might also find that burden can be reduced with this kind of data collection approach. Even if farmers were unwilling to share their data electronically, the computerized forms could ease the completion of surveys such as the Agricultural Resource Management Survey (ARMS). Further, if these algorithms generate cost and price calculations for products, or average income and balance sheet estimates (as published by ERS), the approach could even be of value to farms not selected for ARMS, as it could allow them to benchmark their own results against sector averages. Such a benchmarking service could be at the heart of a digital research infrastructure, in which the request or obligation to farmers to fill out surveys develops into a more collaborative relationship.

___________________

3 It should be noted that the Billion Prices index relies on official Bureau of Labor Statistics (BLS) price indexes for benchmarking purposes.

4 The panel chose to present its ideas here somewhat informally since technology recommendations often become obsolete almost immediately after issue. The use of electronic records by farms is currently undergoing rapid change, so it would be difficult to pinpoint the future use of particular forms of electronic accounting data at this time. It is likely, however, that NASS would benefit from monitoring developments and experimenting with different, evolving forms of accounting and tax data that could potentially provide information that is similar to what is currently being collected using survey questionnaires.

Recognizing that the farmer is the owner of all of his or her data, he or she could be asked to sign a consent authorization to transfer a copy of his or her digital data to ERS. This could also include contributing to data sets that are already available at universities or to commercial benchmark software, such as Farm Business Network or Farmobile. Cooperating farmers could be rewarded for such cooperation by receiving additional benchmark reports, if desired, or by being invited to engage in calculations of the effect of potential future policies (“citizen science”). This approach could be initiated with farms that are selected for ARMS; once the marginal cost of taking in the digital data of an additional farm declines considerably, more farms could be welcomed into the process, since that would improve the reliability of the estimates and support small-area estimates.

Such a digital research infrastructure could also advance standardization in data exchange. AgGateway5 is an American industry organization that promotes such standardization. Another initiative that could possibly share in such a collaboration might be the Sustainability Consortium,6 which runs a project on Data Landscape Mapping in Agricultural Supply Chains. USDA’s Natural Resources Conservation Service is also a potential partner; it maintains a considerable amount of environmental and geographical data at the farm level, including subsidy information that could be forwarded to ERS with the consent of the farmer. In Europe, several Farm Accountancy Data Networks (equivalent to ARMS) have experience with this type of data collection, which involves collaboration with accounting offices, agricultural accounting software, extension services, banks, and supply chain partners. Countries where such collaborations are operative today include Denmark, France, Germany, the Netherlands, and Norway. Their experiences might be inspirational for setting up a project in the United States that helps complex holdings share their data digitally with NASS and ERS instead of having to cope with piles of paper.
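The reformatting and benchmarking idea discussed above can be made concrete with a small sketch. It is illustrative only: the ledger account names, the survey-style item names, and the sector averages are all hypothetical and are not drawn from ARMS, from ERS publications, or from any accounting product. The sketch aggregates ledger rows into survey-style items, reports accounts that have no mapping (the kind of mismatch between software systems and survey questions that a pilot could surface), and compares a farm's totals with sector averages.

```python
from collections import defaultdict

# Hypothetical mapping from ledger account names to survey-style items.
# None of these names come from ARMS or from a real accounting package.
ACCOUNT_TO_ITEM = {
    "Seed purchases": "seed_expense",
    "Fertilizer": "fertilizer_expense",
    "Lime": "fertilizer_expense",
    "Custom harvesting": "custom_work_expense",
    "Grain sales": "crop_sales",
}


def reformat_ledger(ledger_rows):
    """Aggregate (account, amount) rows into survey-style items.

    Returns (items, unmapped): totals per item, plus the accounts that
    have no mapping and would need manual entry or review.
    """
    items = defaultdict(float)
    unmapped = []
    for account, amount in ledger_rows:
        item = ACCOUNT_TO_ITEM.get(account)
        if item is None:
            unmapped.append(account)
        else:
            items[item] += amount
    return dict(items), unmapped


def benchmark(items, sector_averages):
    """Return farm value divided by sector average for each item.

    Items with a missing or zero sector average are skipped.
    """
    return {
        name: value / sector_averages[name]
        for name, value in items.items()
        if sector_averages.get(name)
    }
```

In a design of this kind, the list of unmapped accounts is arguably as useful to the agencies as the totals, because it shows where the software's categories and the questionnaire's concepts diverge.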

6.1. MOTIVATIONS FOR PURSUING ALTERNATIVE DATA SOURCES

Federal statistical agencies face increasing demands to improve the accuracy, granularity, and timeliness of their statistical products while simultaneously reducing programmatic expenditures. Accomplishing these goals requires optimizing the use of data already collected across the federal statistical system, as well as incorporating new data products as they become available from nongovernment sources.

___________________

5 See http://www.aggateway.org.

6 See https://www.sustainabilityconsortium.org.

Administrative Data

International and national statistical agencies are increasingly using administrative data to support and supplement existing survey programs. Although definitions of administrative data vary across organizational bodies, all reference the same basic attributes. The Evidence-Based Policymaking Commission Act of 2016 defines administrative data as data that are “(1) held by an agency or contractor or grantee of an agency (including a State or unit of local government); and (2) collected for other than statistical purposes” (Commission on Evidence-Based Policymaking, 2017, p. 9). Unlike survey data collected specifically for statistical purposes, administrative data are typically collected in support of an agency’s or other organization’s routine program operations.7 Examples of administrative data include federal tax information, vital records, criminal justice records, and information on participants in a wide range of programs, such as unemployment insurance, Medicaid, Medicare, the Children’s Health Insurance Program, the Supplemental Nutrition Assistance Program, and federal student aid.

___________________

7 This definition of administrative data is similar in spirit to that provided by the United Nations: “information collected by public sector offices to meet demands of government regulations” (United Nations Economic Commission for Europe, 2011).

Administrative data are central to the ability of many government departments to fulfill their statutory responsibilities of program operation, management, evaluation, and oversight. In addition, statistical agencies often draw from administrative data to more efficiently meet their statutory obligations. For example, the Economic Census conducted by the U.S. Census Bureau has used a mail-out/mail-back design since 1905. It does not print and mail the entire survey instrument to every address in the United States, as that would be an incredibly wasteful exercise. First, most addresses are not associated with business operations, and thus far too many surveys would be produced. Second, industry-specific questions would apply to only a small share of the businesses that received the survey. Therefore, instead of doing a universal mail-out, the Census Bureau (Department of Commerce) uses information collected by the Internal Revenue Service (Department of the Treasury) through the latter’s administration of the income and payroll tax system to identify businesses, their addresses, and their primary industry.8 With this information, a sample frame can be constructed that permits individual businesses to receive an Economic Census survey form that is specific to their own economic activity. This process reduces both the administrative cost of conducting the Economic Census and the businesses’ respondent burden.
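The frame-construction logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of the Census Bureau's actual processing: the record layout, the identifiers, the form names, and the mapping from industry-code prefix to questionnaire version are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdminRecord:
    business_id: str    # hypothetical identifier from the administrative source
    address: str
    industry_code: str  # a NAICS-style code; prefixes below are illustrative


# Hypothetical mapping from industry-code prefix to questionnaire version.
FORM_BY_PREFIX = {"311": "manufacturing_form", "445": "retail_form"}
DEFAULT_FORM = "general_form"


def build_frame(records):
    """Build a frame keyed by business, with a mailing address and a form.

    Records without an address are skipped, and only the first record
    seen for each business_id is kept.
    """
    frame = {}
    for rec in records:
        if not rec.address or rec.business_id in frame:
            continue
        form = FORM_BY_PREFIX.get(rec.industry_code[:3], DEFAULT_FORM)
        frame[rec.business_id] = (rec.address, form)
    return frame
```

The point of the sketch is the two savings named in the text: only units known to be businesses enter the frame, and each receives only the questionnaire version that matches its recorded industry.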

In addition to supporting the development of sampling frames prior to survey data collection, administrative data can also support survey programs during data collection, for example to flag suspicious or missing records for follow-up investigation. They can also be used after data release for evaluating survey performance.9 Increasingly, however, statistical agencies are looking to expand the role that administrative data play in their survey programs, moving from supporting to supplementing, or in some cases replacing, their own survey-sourced data. This expansion is being driven in part by the urging of the Office of Management and Budget (OMB), with broad support from the scientific community.10 Reflecting these trends, the next section of this chapter (section 6.2) addresses the specific benefits and challenges of using administrative data to improve the accuracy and efficiency of agricultural statistics in the United States.

Other Nonsurvey Data Sources

Other nonsurvey sources of data, such as commercially produced data, are also becoming increasingly relevant in generating statistics on the economy. For example, experimentation is underway to tap into information collected by credit reporting agencies during the credit card application process concerning individuals’ debt and repayments. “Unstructured” data, such as those continuously generated by social media or as a byproduct of Internet-based commercial activities, have been used to track prices,11 employment,12 and subjective well-being,13 and for a wide range of other research purposes.

___________________

8 The Census of Agriculture adopted a mail-out/mail-back design beginning in 1959 and, prior to its transfer from the Census Bureau to NASS, the construction of an initial sample frame from IRS administrative records, including IRS Schedule F and IRS Form 943, was a key component of the planning process. Today, federal tax data from the IRS are still used to assist in the construction of the NASS list frame, but they are not directly incorporated.

9 A review of administrative data, and the scope, purpose, and principles and guidelines for their use, has been developed by Statistics Canada. See http://www.statcan.gc.ca/pub/12539-x/2009001/administrative-administratives-eng.htm.

10 From the report, Innovation in Federal Statistics: Combining Data Sources while Protecting Privacy: “OMB and the federal statistical agencies have engaged in a number of efforts in recent years to facilitate greater use of administrative records for statistical purposes, with the goal of improving federal statistics and facilitating program evaluation. . . . Statistical agencies have worked together to identify and document important case studies that demonstrate the utility of administrative data for statistical purposes and have documented difficulties in being able to access and use administrative data (see Prell et al., 2009). To address those difficulties, OMB issued a memo to all federal agencies that specifically encouraged the use of administrative data for statistical purposes and discussed the legal, policy, and operational issues with using administrative data (U.S. Office of Management and Budget, 2014a)” (National Academies of Sciences, Engineering, and Medicine, 2017b).

NASS is looking to expand its use of nonsurvey data in a number of areas. For example, it is exploring the use of Web-based information, from sources such as state and local permits, Facebook and Twitter feeds, and interest groups, in building lists to detect nontraditional entities such as urban farms.14 And, for the agency’s 2015 Local Food Marketing Practices Survey—which was designed to produce statistics on the number of farms that market food directly through farmers markets or roadside stands—a second list frame of potential local food operations was derived from Web-based information in an effort to increase coverage of the population. Although widely available, data generated for nonresearch purposes may suffer from incomplete coverage of the population or may be biased toward particular subgroups in the population. For example, they may be more readily available for less mobile or higher-income persons. On the positive side, such data often provide many more observations, greater detail, or greater timeliness than data from survey or administrative records.

The remainder of this chapter discusses the potential and limitations of administrative data, the current uses to which NASS and ERS put administrative data, and the importance of data linking for the use of these data.

6.2. BENEFITS AND CHALLENGES OF USING ADMINISTRATIVE DATA IN STATISTICAL PROGRAMS

The survey-centric statistical system created during the 20th century is at a crossroads. Despite technological advances in data collection and processing, producing accurate statistics from survey-based instruments has become increasingly cumbersome and costly. Of particular concern to the quality of social statistics is the well-documented decline in survey response rates which, as discussed in Chapter 2, increases the cost of data collection and can decrease the quality of the statistics produced from those data.15 For example, decreasing response rates have forced NASS to report crop acreage, crop yield, and cash rents for a declining number of counties over time.16 New approaches are needed, not only to reverse the decline in response rates to the extent possible but also to collect data more efficiently from those farms that do participate in surveys.

___________________

11 The most prominent effort is MIT’s Billion Prices Project, which uses Web-scraped data from online retailers to track inflation for 60 countries; see http://www.thebillionpricesproject.com.

12 See https://lsa.umich.edu/lsa/news-events/all-news/search-news/twitter--big-data-and-jobnumbers.html.

13 For example, a “happiness index” has been constructed by analyzing word usage in social media. See https://www.facebook.com/notes/facebook/how-happy-are-we/150162112130.

14 See https://www.istat.it/storage/icas2016/f29-young.pdf.

A second issue complicating survey-based data collection, discussed at length throughout this report, is the rapidly changing structure of economic activity in key sectors of the U.S. economy. Production activities that were once almost exclusively undertaken by a central business entity are now commonly performed by multiple entities, including specialist providers. For example, instead of hiring and managing its own workers, a farm may outsource field preparation and planting to one operation; insecticide, herbicide, and fertilizer application to a second operation; and crop harvesting to a third. The characteristics of outputs produced by farm businesses are also diverse. Farms may process their crops into higher-value products that are sold directly to consumers (manufacturing and retail), and they may operate restaurants, catering facilities, or bed-and-breakfasts (food and lodging services). The boundary of the farm-firm is changing at the same time that the output portfolio of many operations is expanding. Identifying and correctly attributing the many inputs and outputs associated with production on a farm has become increasingly challenging for survey respondents.

A third issue affecting the quality of survey data is the limited budgetary support that federal statistical agencies receive to fulfill their statutory and regulatory obligations. Collecting and processing high-quality survey data to produce accurate and timely estimates of economic activity for use by policy makers, businesses, and citizens is a high-value but nonetheless expensive enterprise.

Administrative data, combined with other sources, offer partial solutions to these problems. Among the advantages offered by administrative sources is that, since their data are already collected as part of program operations, using them creates no additional costs of collection or added burden to the public; using tax data to maintain business frames is one example. At the same time, accessing administrative data may involve costs, the need to meet confidentiality and privacy requirements, and the work of making the data fit for purpose.

___________________

15 For a discussion of declining response to social science surveys, see National Research Council (2013).

16 To address this problem, NASS engaged the Committee on National Statistics of the National Academies of Sciences, Engineering, and Medicine to assess county-level crop and cash rent estimates and make recommendations on methods for integrating multiple data sources (including NASS surveys, data from other agencies, and automated field-level information collected by farm equipment dealers) to provide more precise county-level estimates of acreage and yield for major crops and cash rents by land use. See National Academies of Sciences, Engineering, and Medicine (2017c).

Achieving efficiencies in the production of one statistic, through the use of administrative data or other nonsurvey sources, frees up resources for the production of other statistics. Administrative data may also be used to improve upon the quality of survey data by reducing variance of and bias due to nonsampling errors, increasing the timeliness of data releases, and facilitating small-area estimates. In some cases—such as when program participation is involved, where respondent recall is a problem, or where quantitative estimates are difficult to calculate—administrative records may be more accurate than survey responses. Even in the absence of falling response rates and budgetary pressures, using existing data more efficiently is still good practice.

CONCLUSION 6.1: As has been documented in numerous reports—most recently and prominently that of the Commission on Evidence-Based Policymaking (2017)—the use of administrative data can improve the overall efficiency of data programs by reducing agency expenditures, lowering respondent burden, encouraging the sharing of information across agencies, and potentially increasing the accuracy of the information collected. In some cases, administrative data may be used to replace survey data.

Among the disadvantages of using administrative data are the lack of researcher control over content and the need to overcome accessibility constraints.17

Using Administrative Data for Official Statistical Reporting

National statistical offices in several countries have been on the leading edge of administrative data use. At Statistics Canada, both survey enhancement and survey replacement are being actively explored. For example, the agency uses tax data to replace data on farm expenditures collected through the Canadian Census of Agriculture (Smith et al., 2013). The use of tax data to replace questions on detailed expenses in Statistics Canada’s 2016 Census of Agriculture was found to reduce respondent burden by allowing the length of the questionnaire to be reduced by approximately 7 percent. Furthermore, a Tax Replacement Study of Canada found that tax data were also reliable for estimating detailed farm expenses and acknowledged their potential to improve accuracy: “information prepared for tax data might be more thorough and complete” than information reported on the Census form (Smith et al., 2013).18

___________________

17 Administrative Data Liaison Service, see http://www.adls.ac.uk/adls-resources/guidance/introduction.

In the United States, both the Census Bureau and BLS use administrative data directly in the production of population estimates and in support of programmatic needs. As discussed in Chapter 5, the Census Bureau maintains the Business Register, which serves as the sample frame for its various survey instruments and is constructed from administrative tax records provided to it by the Internal Revenue Service (IRS). The data contained in the Business Register also serve as the primary inputs for the County Business Patterns (CBP) Program. The CBP Program publishes national-, state-, county-, and zip-code-level estimates of the number of nonfarm establishments with nonagricultural employees, the number of such employees employed on March 12, first-quarter payroll, and annual payroll.

The BLS operates an analogous sample frame and employment statistics program but relies on a different source of administrative records. The BLS Business List originates from unemployment insurance records provided by state and federal agencies. These are then linked to other administrative records to assign industry classifications. In addition to using these data as the sample frame for BLS survey programs, the BLS tabulates the underlying establishment, employment, and payroll data and publishes them through the Quarterly Census of Employment and Wages (QCEW) to provide industry-level estimates at various reporting geographies (national, state, county, and metropolitan statistical area, among them).19

It is worth noting that, while the CBP and QCEW rely primarily on administrative data to construct estimates of the number of operating establishments and employment and payroll numbers, survey data are critically important to both programs. First, the reporting unit for both the CBP and the QCEW is the establishment. To reduce reporting burden, firms with multiple establishments are not required to report payroll withholding and unemployment insurance contributions at the establishment level. Such firms are then requested by the appropriate agency to complete a separate survey that provides employment and payroll information for each establishment separately.20

Second, updated information about industry classification is largely collected through survey responses. For example, when a business begins operation, it must register with the Social Security Administration and designate its primary economic activity. This is the source of information when firms enter the Business Register. The Census Bureau updates this information, however, if an establishment reports on a subsequent Census Bureau survey that its primary economic activity has changed. The BLS sends the Annual Refiling Survey to one-third of businesses on its register to provide updated industry classification information.

___________________

18 There are also instances where administrative data can be checked for accuracy using survey data. See Berent, Krosnick, and Lupia (2016) as well as Kreiner, Lassen, and Leth-Petersen (2013).

19 A major reason U.S. statistical agencies must produce two different business registers is that access to IRS data (in this case for BLS) is extremely limited due to laws about confidentiality.

20 The Census Bureau requests this information through the Company Organization Survey, and BLS requests this information through the Multiple Worksite Report.

These two examples demonstrate the value that administrative records offer in producing statistical products for businesses. Because employment and payroll must be reported to various government agencies, processing these administrative records to construct estimates rather than conducting a survey saves valuable resources without increasing respondent burden. Nonetheless, these examples also demonstrate that administrative data often require complementary surveys to ensure that data quality and comparability are maintained.

Administrative data may also reduce respondent burden by decreasing the amount of information that a survey questionnaire must elicit. The Census Bureau is exploring the use of tax records to replace questions about income sources and amounts in household surveys such as the American Community Survey and the Current Population Survey. The results are promising. They suggest that administrative data can in some cases also yield significant improvements in accuracy over survey responses (National Academies of Sciences, Engineering, and Medicine, 2017a, p. 25).

Challenges to Using Administrative Data for Official Statistical Reporting

There are significant startup costs for the statistical agencies’ use of administrative data. It may take years to test, validate, and automate these sources to ensure their successful incorporation into statistical programs. The statistical agencies of USDA are no exception, as departmental policy dictates that significant vetting is required to understand administrative data sufficiently before integrating them into statistical programs.21 Since administrative data arrive at NASS with errors in coding, processing, and logic, “quality control efforts, such as time-series and cross-sectional analyses, two or more independent computational cross-checks, and record-level data evaluations, are undertaken before administrative data are used by NASS, particularly where principal economic indicator surveys or market sensitive releases employ administrative data.”22

___________________

21 USDA, Policy and Standards Memorandum, No. PSM-ASMS-15, April 2012. See http://nassportal/NASSdocs/Documents/PSM-ASMS-15.pdf.

22 Edwin Anderson and Daniel Beckler, Administrative Data Used by NASS, paper prepared for and presented to the panel (meeting #2).

There are other challenges as well to the use of alternative data sources. Realistically, these challenges ensure that at least for the very near term survey data will remain at the core of the statistical system. Among the challenges are the following:

  • Administrative objectives often differ from statistical objectives. Program agencies collect data for the primary purpose of administering their programs; as a result, these sources of information are typically not research-oriented, and data from survey and administrative origins may not be comparable. Stewards of administrative data make decisions that are optimal for their mission and rightfully seek to avoid negative side effects. For example, knowledge among program participants that their records will be used for research purposes may change either their willingness to report or the quality of their reports.
  • Inconsistent concepts and definitions. Administrative reporting/statistical units, as well as definitions of variables and populations, often do not match those used in surveys, especially for complex farm operations. Surveys often focus on the decision-making entities, while administrative units are typically concerned with smaller parts (such as the field level) of multi-entity operations. Farm Service Agency (FSA) data, for example, are a potentially useful source of data that are redundant with information in the ARMS, but they are reported using the FSA definition of a farm, which differs from the definition used by NASS and ERS. Ideally, to increase the versatility of these data, statistical and program agencies would collaborate on data collection, for example to harmonize definitions across the agencies or to add information in surveys, such as the last four digits of the respondent’s Social Security Number, to facilitate linkages.
  • Administrative data may reflect biased reporting. For example, there are incentives to underreport taxable income to reduce tax burden. In support programs, operators may seek to maximize the benefits of participation, which then influences the information they report on administrative forms.
  • Administrative data sets are often characterized by incomplete coverage of a population. For example, there may be selective participation in administrative programs.
  • Acquiring and documenting administrative data are often problematic. This is often due to legal constraints concerning confidentiality and privacy.

Due to these challenges, administrative data, while likely to prove increasingly integral to the statistical system, are not a panacea, and at least for the foreseeable future they cannot replace the need for surveys. The influential report by the Commission on Evidence-Based Policymaking (2017) recommends the creation of a federal agency responsible for overseeing the use of administrative records across federal agencies.23 This recommendation has not yet been implemented and, even if it is, it may not affect practice for several years.

6.3. CURRENT AND POTENTIAL FUTURE USE OF ADMINISTRATIVE DATA BY USDA

Administrative data are used by USDA for a range of purposes, including survey planning and design, frame construction and stratification, and assessing selection probabilities.24 This section reviews current sources of administrative data and suggests how usage could be expanded in the future.

Farm Service Agency (FSA)

One particularly rich source of information currently accessible across statistical programs at USDA is the administrative data set maintained by the FSA to implement farm conservation and regulatory laws. FSA data are drawn to estimate production and price statistics for various crop, dairy, poultry, and livestock programs. Of particular importance are the FSA 578 data, which NASS uses to estimate minimum planted acreage indications, calculated by summing acreage for planted, failed, and other status codes, such as double cropping.
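As a rough illustration of the summation described above, the sketch below totals reported acreage by crop over a set of status codes. The record layout, the field names, and the status labels (`planted`, `failed`, `double_crop`, `prevented`) are hypothetical placeholders chosen for this example; they are not the actual FSA 578 schema, and the function is not NASS's production procedure.

```python
from collections import defaultdict

# Hypothetical status codes counted toward the minimum planted acreage indication.
COUNTED_STATUSES = {"planted", "failed", "double_crop"}


def minimum_planted_acreage(records):
    """Sum reported acres by crop over the counted status codes."""
    totals = defaultdict(float)
    for rec in records:
        if rec["status"] in COUNTED_STATUSES:
            totals[rec["crop"]] += rec["acres"]
    return dict(totals)


# Illustrative records only; values are made up for the example.
records = [
    {"crop": "corn", "status": "planted", "acres": 120.0},
    {"crop": "corn", "status": "failed", "acres": 15.5},
    {"crop": "corn", "status": "prevented", "acres": 40.0},  # not counted here
    {"crop": "soybeans", "status": "double_crop", "acres": 30.0},
]
indications = minimum_planted_acreage(records)
```

Which status codes belong in the counted set is a policy choice; keeping them in one named constant makes that choice easy to review and change.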

FSA data on acreage mix self-reported data with “determined” data, that is, values taken from satellite images or other sources; in addition, FSA and NASS use different coding schemes for crops. Due to these kinds of differences, NASS reports that “approximately 70 percent of FSA names map to NASS records with few complications . . . the remaining 30 percent require probabilistic record linkage techniques to associate possible matches” (Anderson, 2017, p. 6). The final matches are then manually completed.

___________________

23 Congress passed the Evidence-Based Policymaking Commission Act of 2016, which created an expert panel, the Commission on Evidence-Based Policymaking, to conduct a comprehensive study recommending strategies for making administrative and other nonsurvey data available for research and policy purposes, while ensuring individual privacy and confidentiality.

24 In discussing administrative data, it is important to distinguish between administrative data housed at USDA (e.g., subsidy and insurance programs; conservation programs), administrative data at other agencies (most notably, IRS and BLS), and survey data from other agencies (e.g., demographic data at the Census Bureau, and employment/wage data at BLS). As discussed in Section 6.4, using any of these requires “linkage” with NASS farm and person identifiers.

Improved linkage accuracy has the potential to greatly improve the utility of these data. In particular, using administrative data to link farm owners to FSA farm numbers could allow for a reasonable method to generate aggregate estimates of key farm economy indicators. This would be useful in its own right, but it would also allow for benchmarking of the Census of Agriculture.
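To make the idea of probabilistic record linkage more concrete, the following is a minimal sketch of match weights in the style of Fellegi and Sunter, one common approach. The source does not say which method NASS uses, so this should be read as an illustration only. The field names, the agreement probabilities, and the thresholds are all assumptions made for the example.

```python
import math

# Hypothetical agreement probabilities per field:
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "zip": (0.90, 0.05),
    "county": (0.98, 0.20),
}


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over the compared fields."""
    weight = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if rec_a[field] == rec_b[field]:
            weight += math.log2(m / u)  # agreement raises the weight
        else:
            weight += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return weight


def classify(weight, upper=6.0, lower=0.0):
    """Assign a link status; the thresholds are illustrative only."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"  # candidates for manual review


# Made-up records for the example.
fsa = {"name": "smith farms", "zip": "68001", "county": "dodge"}
nass_same = {"name": "smith farms", "zip": "68001", "county": "dodge"}
nass_partial = {"name": "smith farm llc", "zip": "68001", "county": "dodge"}
nass_other = {"name": "jones dairy", "zip": "68521", "county": "lancaster"}

statuses = [classify(match_weight(fsa, r)) for r in (nass_same, nass_partial, nass_other)]
```

Records that fall between the two thresholds are the "possible matches" that would go to manual resolution. In practice, exact string equality would usually be replaced by an approximate string comparator, and the probabilities would be estimated from data rather than fixed by hand.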

Hurdles exist to the creation of these linkages. ERS research indicates that some data on farm operations are clearly missing and that incorporating additional data (such as crop insurance policies) causes sample selection bias because the added data depend on voluntary participation. In addition, networks of operations or owners captured in administrative data are not stable over time, in part because land moves between them.25 Nonetheless, these challenges may be no more daunting or deleterious than nonresponse to the Census of Agriculture.

Federal Tax Information (FTI)

The use of federal tax information to conduct surveys of the farm economy has changed markedly over time. The Census of Agriculture adopted a mail-out/mail-back design beginning in 1959 and, prior to its transfer from the Census Bureau to NASS, the construction of the mailing list began with extensive use of IRS administrative records. Box 6.1 below, taken from the 1992 Census of Agriculture documentation, summarizes the initial sources of potential survey recipients before the linkage and validation process.

Today, while federal tax data from the IRS are still used to assist in the construction of the NASS list frame, they are not directly incorporated into the frame. Instead, NASS uses them to contact tax filers with agriculture activities not already present in the NASS list frame. As detailed in Anderson and Beckler (2017), usage of these data is governed by three principles: (i) a limit on the number of NASS staff allowed direct access to actual FTI data; (ii) a promise of restricting FTI data within a limited-access secure area; and (iii) a promise that FTI data will never be shared with anyone.

FTI is obviously highly sensitive, and the public desires assurances that such data will remain confidential after they are collected. As a result, there are numerous federal laws governing the appropriate use of such data, even for statistical purposes, and the penalties associated with their misuse are severe. Nevertheless, both USDA and the Department of Commerce have a statutory right to request FTI from the Department of the Treasury for the purposes of conducting the Census of Agriculture and the Economic Census.26

___________________

25 Presentation to the panel, meeting no. 2, by Steve Wallender.

___________________

26 The statutes are as follows:

26 USC 6103(j)(1): Upon request in writing by the Secretary of Commerce, the Secretary shall furnish (A) such returns, or return information reflected thereon, to officers and employees of the Bureau of the Census, and (B) such return information reflected on returns of corporations to officers and employees of the Bureau of Economic Analysis, as the Secretary may prescribe by regulation for the purpose of, but only to the extent necessary in, the structuring of censuses and national economic accounts and conducting related statistical activities authorized by law.

26 USC 6103(j)(5): Upon request in writing by the Secretary of Agriculture, the Secretary shall furnish such returns, or return information reflected thereon, as the Secretary may prescribe by regulation to officers and employees of the Department of Agriculture whose official duties require access to such returns or information for the purpose of, but only to the extent necessary in, structuring, preparing, and conducting the Census of Agriculture pursuant to the Census of Agriculture Act of 1997 (Public Law 105–113).

Use of FTI for statistical purposes is not available to all federal statistical agencies. Thus, while the Census Bureau can produce estimates of payroll and employment based on payroll tax filings through the CBP Program, an analogous program undertaken by USDA would not be feasible under current law.

Making better use of FTI to improve list-frame construction and reduce the reporting burden should be a priority within USDA, but it would require modification of federal statutes to extend beyond currently allowed use, which is limited to that required in “conducting the Census of Agriculture.” In the statute’s language, it is a matter of interpretation whether FTI can be used as an alternative source of household and farm income information within the Census of Agriculture. Nonetheless, ongoing dialogue has the potential to achieve greater collaboration on statistical reporting programs that leverage the statutory authority available to the Census Bureau.

USDA Conservation Programs

Conservation programs also generate administrative data that are useful for statistical purposes. For example, record-level information is available, disaggregated at a scale near the farm field level, since conservation contracts generally apply to individual fields or small collections of fields. Many commodity payments and risk management programs also apply at close to the field level.27 However, the use of administrative data with common land unit information is highly restricted due to privacy concerns and language in the Farm Act. Currently, linking by researchers must be done on-site at ERS by the agency’s Geospatial Information System team.

___________________

27 Presentation to the panel, meeting no. 2, by Cynthia Nickerson and Steve Wallender, both of ERS.

Summary

USDA currently uses administrative data for statistical and other purposes. However, there is even greater scope for their use within USDA’s statistical reporting programs: to facilitate the construction of sample frames, validate data collected from survey instruments, augment existing collection efforts to handle nonresponse or missing information, and contribute to data processing through model-assisted calibration, model-based estimation, and imputation of survey responses. Since administrative data are maintained to support many USDA programs, the scope of these potential applications is vast. A previous report commissioned by USDA and conducted by the Committee on National Statistics (CNSTAT) (National Research Council, 2008) recognized this potential:

NASS and ERS should explore the collection of auxiliary information on a formal basis, as well as feasibility of enriching the ARMS data files with information from administrative data sources, geospatial data, and the like. (p. 162)

NASS and ERS have demonstrated a willingness and capability to successfully expand their usage of administrative data. Following the CNSTAT report quoted above, NASS and ERS responded by participating in an OMB-led initiative to incorporate selected administrative data into surveys. Part of this involved an effort by USDA to synchronize the reporting of administrative (program) data for FSA, the Risk Management Agency, and the Natural Resources Conservation Service in a way that promoted the use of common definitions and reporting. The purpose of this effort was to allow for more direct linking to ARMS and other survey records with the goal of developing agricultural production and conservation.

These efforts have led ERS to conclude that it is possible to use administrative data to support research into complex farm operations. Two key challenges to achieving this goal remain, however. First, as noted earlier, it requires harmonizing the definition of a farm operation sufficiently across the NASS surveys and these sources of administrative data, so that the quality of data linkages can be established. Second, complex farm structures may provide information at a level of aggregation that does not match the desired level of reporting. We have addressed aspects of these challenges in previous chapters of this report. In the next section, we directly address the challenges of data linkage with administrative data sources.

6.4. THE ESSENTIAL ROLE OF DATA LINKING

When adopted for research or statistical purposes, the value added from administrative and other nonsurvey data is often realized when they can be combined with survey data. Record-level survey data may be augmented with information (on income, demographics, geolocation of residence or business, program participation, employment, and potentially many other variables) from administrative records or other sources. Administrative sources often contain data useful for creating descriptive estimates, such as on levels of program support. Other sources may provide supplemental, contextual information about counties and states for subnational-level analyses. For example, ERS has done some exploratory work using county land-value records, by parcel. Many administrative data sources serve needs at the state and local levels of government, which often are the geographical units to which agriculture policies and programs are most relevant.

In applications where federal surveys generate insufficient sample sizes to support local-level estimates, linking to additional data sources may reduce the variance of estimates and feed into small-area modeling.28 Although nonsurvey data are rarely sufficient on their own to support analyses of farm policy or to evaluate program impacts, they are becoming increasingly essential for filling in key pieces of information.
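The variance reduction mentioned above can be sketched with an area-level composite estimator of the kind used in Fay-Herriot models. This is a simplified illustration: it assumes the between-area model variance and the regression-based (synthetic) prediction drawn from auxiliary data are already known, whereas in practice both are estimated. The function name and all numbers are hypothetical.

```python
def fay_herriot_shrinkage(direct, sampling_var, synthetic, model_var):
    """Blend a direct survey estimate with a model-based prediction.

    direct       : survey estimate for the area
    sampling_var : sampling variance of the direct estimate
    synthetic    : prediction from auxiliary (e.g., administrative) data
    model_var    : between-area model variance, treated as known here
    """
    # Weight on the direct estimate: high when the survey estimate is precise.
    gamma = model_var / (model_var + sampling_var)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    # Variance of the composite when model_var and synthetic are taken as known.
    variance = gamma * sampling_var
    return estimate, variance


# Hypothetical county-level yield (bushels per acre) from a small sample.
estimate, variance = fay_herriot_shrinkage(
    direct=50.0, sampling_var=30.0, synthetic=44.0, model_var=10.0
)
```

Because the weight on the direct estimate is below one, the composite variance is smaller than the sampling variance of the direct estimate alone. That gain comes at the cost of relying on the model that links the auxiliary data to the quantity being estimated.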

CONCLUSION 6.2: The effectiveness of the federal statistical system to meet future data demands will largely depend on the extent to which data sources—survey and nonsurvey, national and local, public and private—can be combined in synergistic ways. Any redesigns of the Census of Agriculture and Agricultural Resource Management Survey should be done with the presumption that these instruments will need to be linked to other data sources maintained by the U.S. Department of Agriculture, other statistical agencies, and even nongovernment organizations.29

The key element in the data system for promoting data linkages (for example, between household records and farm business records) is created during questionnaire design. If units of measurement are consistent, then in principle a crosswalk between the FSA data and NASS survey IDs can be maintained. Other options include asking respondents for a limited amount of personal identifying information, such as the last four digits of the Social Security Number, FSA field IDs associated with the operation, or date and place of birth.30 However, adding questions like these in order to improve linkage rates would require extensive testing; for example, simultaneous impacts on nonresponse would need to be carefully evaluated.

___________________

28 Small-area estimation methods include generalized linear mixed models (e.g., Fay and Herriot, 1979) and hierarchical models (e.g., Lindley and Smith, 1972). The National Academies of Sciences, Engineering, and Medicine (2017a) report on the use in federal statistics of multiple data sources provides a full discussion of linking methods and of the potential benefits of using administrative data for statistical purposes: as a complete frame or supplement to an existing frame for individuals, households, or businesses; to replace surveys when the administrative data contain all needed information; for editing survey responses or making imputations for missing responses; as a source of auxiliary information that can be used to improve survey-based estimates; and for survey evaluation (e.g., to compare the number of program beneficiaries in program records with estimates based on a survey).

29 A National Academies of Sciences, Engineering, and Medicine (2017b) report, initiated to examine the potential of combining data sources for research and policy purposes, discusses the methodological considerations for designing surveys with administrative data linking in mind.

30 One hurdle to attaching Social Security or Employer Identification Number to a farm establishment or business is that respondents may be reluctant to provide it. Getting the same information from the IRS is fraught with difficulties. NASS sought to avoid this situation by not using IRS information directly in its farm list. NASS does receive some tax information from farm filers and links it to commodity and other lists that the agency receives. For matching records, NASS uses the information from these other lists, so that it is not incorporating any IRS data directly into the NASS farm list. This caution notwithstanding, it should be noted that other National Academies panels (e.g., National Research Council, 2007; National Academies of Sciences, Engineering, and Medicine, 2017b) have recognized the advantages, in terms of both response burden and improved quality of statistics, of improving the ability of federal agencies to share data, including tax data. There are certainly practical issues with using these data, but sharing across agencies is a method that has the potential to reduce survey burden, even if the ability to use these data relies on overcoming some legal hurdles and on the uptake of recommendations aimed at all the statistical agencies.

New Opportunities for Data Linking within USDA

Based on experimental research to determine whether discrepancies across sources are meaningful, linkages across USDA data are already reported to be working well, particularly across the Census of Agriculture, ARMS, and the June Area Survey (Young, Lamas, and Abreu, 2017). Data linking efforts are progressing along a number of fronts. Some efforts take advantage of geo-referenced common land units that, by serving as a basic unit for geographically based list frames, allow linkage to FSA and Risk Management Agency administrative records and also to GIS data generated by remote sensing and precision agriculture. A recent expert panel on Methods for Integrating Multiple Data Sources to Improve Crop Estimates recommended in its report that NASS adopt FSA’s “common land unit” as its basic spatial unit (National Academies of Sciences, Engineering, and Medicine 2017c, Recommendation 2-8).

An ERS project on the northern plains is exploring the capacity for linking geospatial data on soils and cropping history, data that are increasingly available at the field level. The purpose of the project is to study what happens to conservation tillage on fields following participation in a conservation contract or on neighboring fields that are not in such a contract. The geospatial data add key variables to field-level administrative data that can be exploited by researchers. ERS is currently working with the Peterson Institute and USDA’s Agricultural Research Service to develop satellite-based, field-level estimates of conservation tillage and then link these data to administrative and survey data. More generally, small-area estimates, such as for yields or acreage devoted to a particular crop, could be made more accurate and comprehensive by combining survey information based only on a subset of farms with area-level satellite imagery, which may be available for all areas (Cruze, 2015; National Academies of Sciences, Engineering, and Medicine, 2017a).

Linking data sources often proves to be challenging in practice. One reason is that, as described above, even among USDA agencies the differences in the definition of the farm and other measurement units create inconsistencies in the way data are reported to NASS, FSA, and the Risk Management Agency. The Panel on Integrating Multiple Data Sources to Improve Crop Estimates reported that “a project in support of the 2012 Census of Agriculture to link the FSA payments database and the NASS list frame resulted in only a 63 percent match rate with 6 percent possible matches” (National Academies of Sciences, Engineering, and Medicine, 2017c, p. 38). An NASS study of Nebraska showed that there were 2.4 FSA-identified farms for every NASS-identified farm in the state, and that difficulties in aligning the NASS farms and FSA farms were most acute in the case of complex operations, where there is ambiguity in defining the reporting unit. The available geospatial information on farm operations can help analysts understand differences in the list frames by “making it possible to track down the NASS farms to identify matches.” Expensive manual efforts to match farms, according to the same panel report, would best be directed toward achieving matches for the largest farms (National Academies of Sciences, Engineering, and Medicine, 2017c).
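The suggestion to direct manual matching effort toward the largest farms can be sketched as a simple triage step. The example below is only an illustration: it assumes acreage as the size measure and a fixed review budget, neither of which is specified in the panel report, and the record fields and values are hypothetical.

```python
def prioritize_for_manual_review(unmatched, budget):
    """Return up to `budget` unmatched records, largest operations first."""
    ranked = sorted(unmatched, key=lambda rec: rec["acres"], reverse=True)
    return ranked[:budget]


def acreage_share(selected, unmatched):
    """Share of all unmatched acreage covered by the selected records."""
    total = sum(rec["acres"] for rec in unmatched)
    return sum(rec["acres"] for rec in selected) / total if total else 0.0


# Made-up unmatched records for the example.
unmatched = [
    {"farm_id": "A", "acres": 150.0},
    {"farm_id": "B", "acres": 4200.0},
    {"farm_id": "C", "acres": 35.0},
    {"farm_id": "D", "acres": 1615.0},
]
review_queue = prioritize_for_manual_review(unmatched, budget=2)
share = acreage_share(review_queue, unmatched)
```

Reporting the share of acreage covered alongside the review queue shows how much of the unmatched total a given amount of manual effort would resolve. Sales or another size measure could be substituted for acreage.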

New Opportunities for Data Linking: Survey and Administrative Data from Other Federal Agencies

Taking full advantage of multiple data sources to transform statistical programs at ERS and NASS will require coordination that extends beyond USDA to include data sources housed at other statistical agencies and beyond. The decentralized U.S. statistical system creates complications that international counterparts can largely avoid. An example is Statistics Canada which, as a centralized statistical agency, has broad powers to exploit administrative data. Across the U.S. system, each statistical agency has its own set of approval, confidentiality, and clearance procedures.31 No doubt related to these systemic contrasts, along with the differing laws that govern interaction and collaboration between agencies, national statistical offices in Europe report that of all the information they collect, the proportion that originates with administrative data is roughly 80 percent, as compared with 20 percent originating from surveys. Meanwhile, the ratio for the United States is just the reverse (20 percent versus 80 percent) (Prewitt, 2010). This relatively modest baseline means that the potential for exploiting administrative data for U.S. agencies is relatively large. Indeed, U.S. agencies are now moving more quickly to create opportunities for these kinds of coordinated efforts.

___________________

31 Prior to 2002, the legislative authority for maintaining the confidentiality of identifiable information collected for statistical purposes was not uniform across statistical agencies. In 2002, the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) was enacted to (i) provide a uniform standard of privacy and confidentiality for statistical agencies to ensure that information supplied by individuals or organizations to an agency under a pledge of confidentiality is used exclusively for statistical purposes and is not disclosed in identifiable form to anyone not authorized in the legislation; and (ii) promote statistical efficiency through limited sharing of business data among three designated statistical agencies: the Census Bureau, the Bureau of Economic Analysis, and the BLS.

The U.S. Census Bureau, in particular, has already cultivated a significant capacity to link data from a range of sources, a capacity that is being advanced by researchers using data available on a restricted-access basis through its Federal Statistical Research Data Centers (FSRDCs). Titles 13 and 26 of the U.S. Code, which govern much of what is possible at the FSRDCs, provide guidelines for protecting and accessing high-value information about the nation’s population and economy. These laws convey the rules for accessing and utilizing records to the greatest extent possible for statistical uses; for supporting reimbursable studies and joint statistical projects; and for protecting confidential individual and establishment data.32 Section 23(c) of Title 13 allows researchers to be sworn in to access federal data, including data housed and secured by the IRS, the Social Security Administration, and the departments of Housing and Urban Development and Veterans Affairs. Some state data and third-party data are also accessible from within these centers.

Research based on confidential data, whether using administrative or survey data, is only approvable if it supports the mission of the Census Bureau by contributing to improved data quality or the estimation of population characteristics.33 But the interpretation of mission, and in turn the approvable scope of research, have each been broadened under the FSRDC program, which has established partnerships between federal statistical agencies and leading research institutions; it now serves as a data host in enclaves for other agencies as well.

Through its Center for Administrative Records Research and Applications (CARRA), the Census Bureau has developed infrastructure to help other statistical agencies move forward on record linkage so a broader range of data sources can be used for statistical reporting. This has included developing expertise in combining data and in meeting legal requirements, and hiring the personnel to write data-use agreements and the staff needed to curate the data. CARRA identifies, acquires, processes, links, curates, and analyzes administrative data, and it creates products that demonstrate the value of data linkage and linked data. It now has many years of experience in identifying administrative sources and figuring out, based on precedent, how to tackle the governance and legal issues that throw up hurdles to their use. It also strives to promote a sustainable and scalable model for accessing a range of high-value, sensitive, and confidential information.

___________________

32 Title 13 governs the Census Bureau directly, but Title 26 (governing IRS) includes provisions for collaborating with the Census Bureau on statistical reporting. Specifics about these laws can be found at https://www.census.gov/history/www/reference/privacy_confidentiality/title_13_us_code.html, and https://www.census.gov/history/www/reference/privacy_confidentiality/title_26_us_code_1.html.

33 The role of the “Predominant Purpose Statement” is described at https://www.census.gov/ces/pdf/Research_Proposal_Guidelines.pdf.

Providing further proof of the multiple-data-source concept, the Census Bureau is currently engaged in a number of joint projects geared toward maximizing the value of existing surveys.34 Additionally, CARRA is engaged in a longitudinal linkage project with 10 institutions in seven FSRDCs (the Census Longitudinal Infrastructure Project, or CLIP), and 12 pilot projects at Chapin Hall, University of Chicago. The pilot projects are on topics that range widely, from labor market outcomes for public school students in Chicago, to causes of poverty in Cook County, to service utilization by families and children experiencing homelessness, and they are oriented toward using linked data to enhance evidence-based policy at the local, state, and federal levels.35

Growth in the FSRDC system has been enormous during the past five years. In 2010, there were 12 research data center locations, and there are now 28 locations with more on the way.36 If USDA were fully partnered into the FSRDC program, its research and statistical capacity could be greatly enhanced.37 For example, person-level data from ARMS and the Census of Agriculture could be linked to BLS employment data, and possibly to tax data, to generate fuller profiles of farm operation entities. The free labor of academic researchers granted access to FSRDCs could help further develop analytic tools to answer questions about the income of farm households, about sub-businesses of farm households, about on-farm and off-farm value-added activities, and about many other dimensions of complex operations. Jumping into the FSRDC “sandbox” would allow researchers to learn more about ARMS respondents who also have records of their farm operations in various business datasets.

___________________

34 These joint projects are collaborations with the Bureau of Justice Statistics, Bureau of Prisons, Centers for Medicare & Medicaid Services, IRS, Social Security Administration, and departments of Veterans Affairs and Housing and Urban Development, as well as USDA’s ERS. Among the surveys identified by Amy O’Hara (in a presentation to the panel on February 10, 2017) that link to administrative or other survey data sources are the Rental Housing Finance Survey, American Housing Survey, the Consumer Expenditures Survey, the National Survey of College Graduates, the American Community Survey, and the Survey of Business Owners.

35 The University of Chicago’s Chapin Hall webpage documents these programs; see http://www.chapinhall.org/pages/RFP-Linked-Data-Evidence-Based-Policymaking.

36 See https://www.census.gov/about/adrm/fsrdc/locations.html.

37 It should be noted that use of research data centers for NASS/ERS is not a new idea. A 2008 CNSTAT report (National Research Council, 2008) recommends that “USDA should consider extending the availability of ARMS microdata through the Census Bureau research data centers to increase access opportunities for using additional data sets and enabling researchers to match ARMS files with other data sets” (p. 157).

RECOMMENDATION 6.1: The U.S. Department of Agriculture should explore opportunities for record linkage at the person level to obtain information on key demographic and off-farm employment variables, and perhaps with the Internal Revenue Service on farm income and expense information. These opportunities can be explored through participation in the Federal Statistical Research Data Centers program, a partnership between federal statistical agencies and leading research institutions that provides secure access to restricted-use microdata for statistical purposes.
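The person-level record linkage described in Recommendation 6.1 can be illustrated with a minimal sketch. It is not the method used by the Census Bureau, CARRA, or USDA; production linkage generally involves probabilistic matching and tightly controlled handling of identifiers inside a secure enclave. The field names (`name`, `birth_year`, `zip`, `farm_income`, `off_farm_wages`), the secret key, and the sample records are all hypothetical. The sketch shows only the basic idea: derive a keyed hash from normalized identifiers, join on that hash, and drop direct identifiers from the linked output.

```python
import hashlib
import hmac

# Hypothetical identifier fields; real survey and administrative files differ.
IDENTIFIER_FIELDS = ("name", "birth_year", "zip")


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(value.lower().split())


def linkage_key(secret: bytes, name: str, birth_year: int, zip_code: str) -> str:
    """Derive a keyed hash (HMAC-SHA256) from normalized identifiers."""
    message = "|".join([normalize(name), str(birth_year), zip_code.strip()])
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(row: dict) -> dict:
    """Return a copy of the record without direct identifiers."""
    return {k: v for k, v in row.items() if k not in IDENTIFIER_FIELDS}


def link_records(survey_rows, admin_rows, secret: bytes):
    """Exact-match linkage on the keyed hash.

    Returns (linked, unlinked). A survey record is linked only when exactly
    one administrative record shares its key; records with no match or with
    several candidate matches are returned unchanged in `unlinked`.
    """
    admin_index = {}
    for row in admin_rows:
        key = linkage_key(secret, row["name"], row["birth_year"], row["zip"])
        admin_index.setdefault(key, []).append(row)

    linked, unlinked = [], []
    for row in survey_rows:
        key = linkage_key(secret, row["name"], row["birth_year"], row["zip"])
        matches = admin_index.get(key, [])
        if len(matches) == 1:
            linked.append({**strip_identifiers(row), **strip_identifiers(matches[0])})
        else:
            unlinked.append(row)
    return linked, unlinked


# Hypothetical records standing in for survey responses and administrative data.
survey = [
    {"name": "Pat  Smith", "birth_year": 1970, "zip": "50010", "farm_income": 82000},
    {"name": "Lee Jones", "birth_year": 1982, "zip": "68501", "farm_income": 40000},
]
admin = [
    {"name": "pat smith", "birth_year": 1970, "zip": "50010", "off_farm_wages": 31000},
]
linked, unlinked = link_records(survey, admin, secret=b"example-secret")
```

Exact matching on a hash is brittle: a misspelled name or a changed address produces a non-match. That limitation is one reason the specialized linkage expertise described above matters.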

NASS and ERS have already developed a data access mechanism in which ARMS data are accessible for statistical purposes through a cooperative agreement with NORC at the University of Chicago. This agreement suits the needs of the agencies and their researchers and is governed by rules established by the Confidential Information Protection and Statistical Efficiency Act. The arrangement works well for those who want to work with ARMS data alone, but NASS does not provide certification of analyses through NORC, and NORC does not provide opportunities for linking with data from other agencies. Any proposed expansion in the use of tax data should be accompanied by research to assess producer sentiment toward the idea as well as a campaign to educate producers on why this would be beneficial to them (for example, in reducing burden).

RECOMMENDATION 6.2: The National Agricultural Statistics Service should pilot efforts to participate in the Federal Statistical Research Data Centers program and identify one or more high-value projects through which U.S. Department of Agriculture researchers could engage with academic researchers and Census Bureau staff.

Reflecting the growing importance of drawing from multiple sources of data for policy-relevant program evaluation and research, the report of the Commission on Evidence-Based Policymaking (2017) recommended creating a secure digital portal for researchers to use to study the impact of U.S. government spending on health care, education, housing, labor markets, and other sectors of the economy. If this recommendation were legislatively enacted, the portal, which the commission refers to as the National Secure Data Service, would mark the next step in the evolution of the Federal Statistical Research Data Centers. The report recommends housing this National Secure Data Service within the Department of Commerce, where the Census Bureau (and, crucially, CARRA) is housed, but with assistance from the 12 other key statistical agencies scattered across the government. The National Secure Data Service would be organized to “temporarily link existing data and provide secure access to those data for exclusively statistical purposes in connection with approved projects” (Commission on Evidence-Based Policymaking, 2017, p. 1). It would build on the infrastructure and expertise already developed at the Census Bureau’s CARRA to ensure that data linkages and access to confidential data for statistical purposes are conducted in the most secure manner possible.38 Additional state-collected data about federal programs would also be made available for statistical purposes: “Where appropriate, states that administer programs with substantial Federal investment should in return provide the data necessary for evidence building” (Commission on Evidence-Based Policymaking, 2017, p. 2).

CONCLUSION 6.3: Given the work of the Commission on Evidence-Based Policymaking to improve the climate for legislative changes that would make data linking more routine across the statistical agencies, now is the time for the National Agricultural Statistics Service and Economic Research Service to begin mapping out a strategy to coordinate their survey and administrative data programs within the U.S. Department of Agriculture and across other key agencies such as the Census Bureau and the Bureau of Labor Statistics.

One example of how combining data collected across agencies creates new opportunities is in the reporting of off-farm food and agricultural activities, a key data need discussed at several points in this report. To produce statistics on agriculture and the food chain more broadly, as opposed to just on-farm economic activities, the role of data from nonagricultural agencies is crucial. Coordinating output and employment data on farming with data on manufacturing and services could help fill the gaps in our understanding of businesses that operate in close proximity to farm businesses but are not picked up in the Census of Agriculture.

The Bureau of Economic Analysis (BEA) currently reports value-added, employment, gross domestic product contributions, and other statistics for six supplemental (or “satellite”) accounts. These accounts combine economic activity across North American Industry Classification System (NAICS) categories to provide aggregate reporting on sectors defined by criteria different from those used for NAICS.39 Although it does not have a supplemental account for food and agricultural industries, and it has problems accurately reflecting agricultural sectors with NAICS data, BEA reports on the categories listed in Table 6.1 in its input-output (IO) tables, which underlie the national income and product accounts. The first column contains relevant BEA codes from the IO tables representing the national economy with 15 aggregate industries, while the second column contains relevant codes from the 71-industry IO tables. The final column contains the number of relevant 6-digit NAICS codes from the IO tables used to represent the economy in its most disaggregated form (for National Income and Product Accounts reporting), which includes 389 industries. ERS currently uses data from BEA IO tables for its “food dollar” series.40 Code 311FT captures many of the value-added activities that occur downstream from farms in food preparation, but there are significant components of the nonfarm agricultural sector not included in current BEA industry aggregations.

TABLE 6.1 Bureau of Economic Analysis (BEA) Categories and Codes for Food and Agriculture, with Corresponding North American Industry Classification System (NAICS) Codes

15-industry code   71-industry code   Title                                          6-digit NAICS codes (count)
11                                    Agriculture, forestry, fishing, and hunting
                   111CA              Farms                                          11
                   113FF              Forestry, fishing, and related activities      3
31G                                   Manufacturing
                   333                Machinery                                      1
                   325                Chemical products                              2
                   311FT              Food/beverage/tobacco products                 28
44RT                                  Retail
                   445                Food and beverage stores                       1
7                                     Service
                   722                Food services and drinking places              3

SOURCE: See https://www.bea.gov/industry/io_annual.htm.

___________________

38 The commission’s report details promising approaches to quantify the additional risks to privacy associated with record linkages and then set boundaries on acceptable levels of privacy loss. Several methods, such as differential privacy and the use of synthetic data that substitute for values in the original data, are already in use at the Census Bureau as well as at private sector companies, such as Google and Uber.

Table 6.2 contains a subset of potential additional NAICS categories that could be part of a supplemental account for more comprehensive reporting on food and agricultural industries.

___________________

39 See https://www.bea.gov/index.htm.

40 See https://www.ers.usda.gov/data-products/food-dollar-series.aspx.


TABLE 6.2 Potential (proposed) Additional North American Industry Classification System (NAICS) Categories for a Supplemental Account

NAICS Description
237990 Farm drainage tile installation
423320 Lime (except agricultural) merchant wholesalers
423820 Farm machinery and equipment merchant wholesalers
424590 Raw farm products (except field beans, grains) merchant wholesalers
424910 Chemicals, agricultural, merchant wholesalers
424910 Farm supplies merchant wholesalers
424910 Lime, agricultural, merchant wholesalers
424910 Pesticides, agricultural, merchant wholesalers
444220 Farm supply stores
484220 Farm products hauling, local
484230 Farm products trucking, long-distance
493120 Farm product warehousing and storage, refrigerated
493130 Bonded warehousing, farm products (except refrigerated)
493130 Farm product warehousing and storage (except refrigerated)
493190 Warehousing (except farm products, general merchandise, refrigerated)
522292 Farm mortgage lending
522294 Federal agricultural mortgage corporation
532490 Farm equipment rental or leasing
532490 Farm tractor rental or leasing
541711 Biotechnology research and development laboratories or services in agriculture
561710 Pest control (except agricultural, forestry) services
811310 Farm machinery and equipment repair and maintenance services
811310 Tractor, farm or construction equipment repair and maintenance services
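How a supplemental account might combine activity across the NAICS categories in Table 6.2 can be sketched as follows. This is an illustration under stated assumptions, not BEA's or ERS's methodology. The value-added figures and the `ag_share` adjustment are hypothetical placeholders. Two features of the table motivate the design: several index entries share one 6-digit code (for example, 424910 appears four times), so codes must be deduplicated before summing, and some codes mix farm and nonfarm activity (811310 covers farm or construction equipment repair), so only a share of the code's activity may belong in a food and agriculture account.

```python
# (NAICS code, index description) pairs copied from a subset of Table 6.2.
TABLE_6_2_ROWS = [
    ("423820", "Farm machinery and equipment merchant wholesalers"),
    ("424910", "Chemicals, agricultural, merchant wholesalers"),
    ("424910", "Farm supplies merchant wholesalers"),
    ("424910", "Lime, agricultural, merchant wholesalers"),
    ("424910", "Pesticides, agricultural, merchant wholesalers"),
    ("444220", "Farm supply stores"),
    ("811310", "Tractor, farm or construction equipment repair and maintenance services"),
]


def unique_codes(rows):
    """Return each NAICS code once, preserving first-seen order."""
    return list(dict.fromkeys(code for code, _ in rows))


def supplemental_total(rows, value_added, ag_share=None):
    """Sum value added over distinct NAICS codes.

    value_added: mapping of NAICS code -> value added (hypothetical units).
    ag_share:    optional mapping of NAICS code -> fraction of the code's
                 activity attributed to food and agriculture (assumed 1.0
                 when a code is not listed).
    """
    ag_share = ag_share or {}
    total = 0.0
    for code in unique_codes(rows):
        total += value_added.get(code, 0.0) * ag_share.get(code, 1.0)
    return total


# Hypothetical placeholder figures, not published statistics.
example_value_added = {"423820": 10.0, "424910": 20.0, "444220": 5.0, "811310": 8.0}
example_shares = {"811310": 0.5}  # assumed farm share of equipment repair

total_all = supplemental_total(TABLE_6_2_ROWS, example_value_added)
total_adjusted = supplemental_total(TABLE_6_2_ROWS, example_value_added, example_shares)
```

Summing the table row by row without deduplication would count code 424910 four times, which is why the sketch works from distinct codes rather than index entries.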

Summary

In summary, administrative data have the potential to improve the efficiency of survey programs and the accuracy of statistical estimates derived from them. Challenges in the use of administrative data arise for several reasons. First, data collected for programmatic purposes and data collected from surveys are pursued with different objectives, so they are not always optimal for linking for the purpose of improving research and evidence-based policy. Second, the decentralized nature of the U.S. statistical system creates legal and administrative barriers to efficient cross-agency collaboration. However, recent work at the Census Bureau by CARRA, coupled with developments such as the recommendations of the Commission on Evidence-Based Policymaking (2017), has greatly increased the chance of overcoming these hurdles. The latter recommendations, especially, hold the promise of motivating legislation to push mechanisms forward for broader data sharing and linkage across the nation’s statistical agencies.
