3
The Quarterly Hog Inventory Survey
Emilola Abayomi from the Research and Development Division of the National Agricultural Statistics Service (NASS) described the quarterly Hog Inventory Survey, the source of key inputs to NASS official estimates of hog inventories, with details about the sample design, an overview of the survey process, and a description of the implications of the Porcine Epidemic Diarrhea virus (PEDv) as a major shock to the system. The session was moderated by Ron Plain (University of Missouri). The presentation was followed by questions and answers from the audience.
THE HOG INVENTORY SURVEY
Abayomi reported that the Hog Inventory Survey is administered quarterly. December is the base quarter, when the greatest number of operations are asked to report. Follow-on surveys are administered in March, June, and September. The target population for the survey is all agricultural operations that own one or more hogs or pigs. The primary estimates derived from the survey include total inventory, breeding herd, market inventory by weight class (<50 lbs., 50–119 lbs., 120–179 lbs., and 180+ lbs.), sows farrowed, and pig crop (the latter two reported monthly for the previous quarter).
The first step of any survey is developing a frame that will cover the target population, she said. NASS constructs and maintains a list frame of all known agricultural operations. Data such as contact information, demographic information, type of agricultural entity, and variables useful for developing agricultural surveys are maintained on the frame. Regular maintenance and list building are needed to ensure that the frame remains current and is representative of all U.S. agriculture. The list frame for the Hog Survey consists of operations on the NASS list frame that are identified as having hogs and pigs. This list frame has been estimated to cover 97 percent of all hog inventory.
TABLE 3-1 Stratum Design for Iowa
Stratum | Number of Hogs and Pigs | Sampling Weight |
---|---|---|
80 | 1–99 | 24.0 |
82 | 100–999 | 2.19 |
86 | 1,000–9,999 | 1.53 |
88 | 10,000–29,000 | 1.00 |
90 | 30,000–49,000 | 1.00 |
92 | 50,000–89,999 | 1.00 |
98 | 90,000+ | 1.00 |
SOURCE: Prepared by Emilola Abayomi for presentation at the workshop.
She explained that states are divided into three tiers for sampling. The first tier consists of the 16 top hog-producing states. A sample of producers is selected to report for these states quarterly, and state-level estimates are published quarterly. Most of these states have a target coefficient of variation (CV) of 6 percent. However, seven critical states¹ have a target CV of 3 percent. Fourteen states in the second tier have a substantial amount of hog inventory. They are sampled every quarter; however, their state estimates are published only in December. They also have a target CV of 6 percent. The remaining states are sampled only in December, and their state estimates are published in December. They have a combined target CV of 6 percent. A stratified random sample is designed for each state. The strata are defined by the total number of hogs and pigs owned by an operation as recorded on the list frame. Tables 3-1 and 3-2 show the stratum limits for Iowa, the top-producing hog state, and Colorado, a top-producing state but not one of the top seven. A random sample of operations is drawn from each stratum in a state. Operations in stratum 98 are referred to as the extreme operators. They always have a sampling weight of 1.00 (but other strata may also have sampling weights of 1.00).
___________________
1 These states are Illinois, Indiana, Iowa, Minnesota, Missouri, Nebraska, and North Carolina.
TABLE 3-2 Stratum Design for Colorado
Stratum | Number of Hogs and Pigs | Sampling Weight |
---|---|---|
80 | 1–99 | 31.92 |
82 | 100–499 | 1.00 |
98 | 500+ | 1.00 |
SOURCE: Prepared by Emilola Abayomi for presentation at the workshop.
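To make the role of the sampling weights concrete, the following is a minimal sketch of a weighted (expansion) total and a CV check. It assumes each sampling weight can be read as the number of frame operations that one sampled operation represents; the weights come from Table 3-1, while the reported inventories, the standard error, and the function names are hypothetical and do not describe NASS's production estimator.

```python
# Sampling weights copied from Table 3-1 (Iowa).
IOWA_WEIGHTS = {80: 24.0, 82: 2.19, 86: 1.53, 88: 1.00, 90: 1.00, 92: 1.00, 98: 1.00}


def expanded_total(responses, weights):
    """Sum of sampling weight times reported inventory over sampled operations."""
    return sum(weights[stratum] * hogs for stratum, hogs in responses)


def coefficient_of_variation(estimate, standard_error):
    """CV as a fraction of the estimate (0.03 corresponds to 3 percent)."""
    return standard_error / estimate


# Hypothetical (stratum, reported hogs and pigs) pairs; not real survey data.
sample = [(80, 50), (82, 400), (86, 2000), (98, 120000)]
total = expanded_total(sample, IOWA_WEIGHTS)

# Hypothetical standard error, used only to show the comparison with a target CV.
cv = coefficient_of_variation(total, standard_error=3000.0)
meets_target = cv <= 0.03  # 3 percent target cited for the seven critical states
print(round(total), round(cv, 4), meets_target)
```

In this sketch a stratum 98 operation counts only for itself (weight 1.00), whereas each sampled operation in stratum 80 stands in for 24 small operations.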
Abayomi explained that the June Area Survey uses an area frame to provide a measure of undercoverage for the NASS list frame. The area frame is completely independent of the list and, by definition, is complete. It is used to determine the adjustment for the 3 percent undercoverage of the list. In June, the area frame sample is selected, farms are identified, and area frame records are linked to the list. Any area frame farms that are not on the list are called NOL (not on list) records. All NOL records identified as having hog inventory are included in the Hog Inventory Survey during December² as part of the area frame sample. For March, June, and September, the data from NOL records are modeled.
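The linkage step can be pictured as a simple set comparison. The sketch below is illustrative only: the record fields (`op_id`, `hogs`, `area_weight`) and both function names are hypothetical, and the weighted NOL total is an assumed way of turning NOL records into an undercoverage measure, not a description of the NASS procedure.

```python
def hog_nol_records(area_records, list_frame_ids):
    """Area-frame records that are not on the list frame and report hog inventory."""
    return [
        rec for rec in area_records
        if rec["op_id"] not in list_frame_ids and rec["hogs"] > 0
    ]


def nol_expanded_total(nol_records):
    """Weighted hog inventory represented by the NOL records (assumed estimator)."""
    return sum(rec["area_weight"] * rec["hogs"] for rec in nol_records)


# Hypothetical data: operation A is on the list, C and D are not, D has no hogs.
list_ids = {"A", "B"}
area_sample = [
    {"op_id": "A", "hogs": 300, "area_weight": 50.0},
    {"op_id": "C", "hogs": 40, "area_weight": 50.0},
    {"op_id": "D", "hogs": 0, "area_weight": 50.0},
]
nol = hog_nol_records(area_sample, list_ids)
nol_total = nol_expanded_total(nol)
print([rec["op_id"] for rec in nol], nol_total)
```

Keeping the NOL records in a separate collection, rather than appending them to the list, mirrors the requirement that the two frames stay independent.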
She went on to say that once the sample is selected, the data are collected. As noted in Chapter 2, the survey timeline is fairly condensed. There are 15 days of data collection, starting on the reference date (December 1, March 1, and so on), followed by 4 to 5 days of editing, imputation, and interpretation of results. There are 5 to 6 days of review and preparation of official estimates. During the last week of the month, the information is released to the public.
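As a rough check on how condensed this schedule is, the stages can be laid out with date arithmetic. This sketch assumes calendar days (the presentation did not say whether the counts are calendar or business days) and uses the upper end of each range; the function name and dictionary keys are hypothetical.

```python
from datetime import date, timedelta


def survey_timeline(reference_date, collect_days=15, edit_days=5, review_days=6):
    """Last day of each stage, counting the reference date as collection day 1."""
    collection_end = reference_date + timedelta(days=collect_days - 1)
    editing_end = collection_end + timedelta(days=edit_days)
    review_end = editing_end + timedelta(days=review_days)
    return {
        "collection_end": collection_end,
        "editing_end": editing_end,
        "review_end": review_end,
    }


timeline = survey_timeline(date(2019, 12, 1))
for stage, last_day in timeline.items():
    print(stage, last_day.isoformat())
```

Under these assumptions the review stage for a December 1 reference date ends on December 26, which is consistent with a release in the last week of the month.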
All modes of data collection are utilized for this survey, she said. Mail and web are preferred because of their lower data collection costs. However, telephone follow-ups help collect data, and some personal interviews are conducted with larger operations that require special accommodations.
___________________
2 To maintain the independence of the list frame and the area frame, the NOL records are not added to the list frame.
All data go through a process of review and adjustment to ensure consistency and quality control, she said. Regardless of the data collection mode, all the data go through an interactive system called Blaise (a computer software editing system). Blaise uses logical edits to ensure that the relationships among responses are consistent. Each record (response) that fails an edit is marked as “dirty” and is subject to additional review. “Dirty” records are reviewed by a statistician, who tries to determine whether the information provided is accurate or whether adjustments are needed. Revised records are evaluated again through Blaise. Once all data are deemed clean, they move to the data analysis phase. During data analysis, the statistician reviews outliers and influential records in comparison to current and past data. Once data analysis is completed, the process moves to a summary stage.
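A logical edit of this kind can be sketched as a set of consistency rules applied to each record. The example below is written in Python rather than Blaise, and its rules (no negative values, weight classes summing to market inventory, breeding plus market equaling total inventory) are assumptions chosen for illustration from the inventory items listed earlier; the field names are hypothetical and the actual NASS edits were not described in the presentation.

```python
WEIGHT_CLASSES = ("under_50", "lb_50_119", "lb_120_179", "lb_180_plus")


def edit_failures(record):
    """Return a list of failed consistency rules; an empty list means clean."""
    failures = []
    fields = ("total", "breeding", "market") + WEIGHT_CLASSES
    if any(record[name] < 0 for name in fields):
        failures.append("negative value")
    if sum(record[name] for name in WEIGHT_CLASSES) != record["market"]:
        failures.append("weight classes do not sum to market inventory")
    if record["breeding"] + record["market"] != record["total"]:
        failures.append("breeding plus market does not equal total")
    return failures


def is_dirty(record):
    """A record is 'dirty' when at least one edit rule fails."""
    return bool(edit_failures(record))


# Hypothetical records: the second one overstates total inventory by 100 head.
clean = {"total": 1000, "breeding": 100, "market": 900,
         "under_50": 300, "lb_50_119": 250, "lb_120_179": 200, "lb_180_plus": 150}
dirty = dict(clean, total=1100)
print(is_dirty(clean), edit_failures(dirty))
```

Returning the list of failed rules, rather than a bare flag, gives the reviewing statistician a starting point for deciding whether the report is accurate or needs adjustment.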
DISEASE SPREAD
Abayomi described shocks and then the modeling challenges that arise from the way disease spreads and is reported in the data.
She defined a shock as any event that can cause a sudden change in hog inventories, noting that a shock can take various forms. Examples include natural disasters such as Hurricane Florence (North Carolina) and Hurricane Michael (the Carolinas, Florida, Georgia) in 2018 and the 2019 flooding in Nebraska and Iowa.
A shock can also take the form of a disease, such as PEDv. As described in Chapter 2, the disease, first detected in the United States in 2013, kills young pigs. Disease spread, which PEDv illustrates, is another challenge in assessing the impact of shocks. A network of Animal and Plant Health Inspection Service (APHIS) laboratories began to collect information about this virus in 2013 and shared the information among themselves. Eventually, they voluntarily gave this information to APHIS headquarters, which began to produce weekly reports on PEDv accessions and the number of positive PEDv samples identified. Abayomi showed that in July 2013, PEDv was detected in 9 states. As of November 2013, 17 states had positive detections of the virus. The virus had gained momentum and begun to spread to neighboring states. In March 2014, 28 states had positive detections, and by June 2014, 32 states had positive detections.
In summary, she said the virus spread over a short period. Although it started with nine states, that number almost quadrupled within a short time span. This example shows the relationship of geographic proximity to virus transmission. Because of this relationship, there is a need to be able to predict these shocks fairly quickly and to account for their impact on NASS estimates.
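The pace of spread can be checked with a few lines of arithmetic on the state counts Abayomi cited. The variable names are illustrative; the counts are those given in the presentation.

```python
# (month, number of states with positive PEDv detections), as cited in the text.
detections = [("2013-07", 9), ("2013-11", 17), ("2014-03", 28), ("2014-06", 32)]

counts = [n for _, n in detections]

# Ratio of the last count to the first: how many times the affected area grew.
growth = counts[-1] / counts[0]

# States newly affected between consecutive reporting points.
new_states = [later - earlier for earlier, later in zip(counts, counts[1:])]

print(round(growth, 2), new_states)
```

The ratio of 32 to 9 is about 3.6, which is the basis for "almost quadrupled" over roughly 11 months.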
DISCUSSION
Ron Plain asked whether response rates are greater among large operations or small operations. Lori Harper (Methodology Division, NASS) replied that they tend to get data from the largest operations because NASS makes extra efforts to reach them and, if efforts fail, NASS imputes for them. The middle-sized operations, with hog inventories ranging from 5,000 to 20,000, have the highest nonresponse rates. These operations are typically a challenge to reach via telephone if they do not respond by mail. The smaller operators are contacted less frequently than larger operations, tend to be at home in the evening, and answer NASS calls.
Katherine Ensor asked whether NASS has analyzed the operator-level individual time series data. She said it seems like a lot of data are rolled up into summary information. Linda Young replied that the state statisticians spend a lot of time looking at operator-level time series and trends as part of their analysis. Headquarters statisticians also have the operation-level data. Ensor noted that operator-level data might be particularly useful to try to understand the spread of disease.
Nell Sedransk observed that not every operator reports four times a year, usually only two or three. To try to identify and characterize production patterns, NASS has restricted attention to those operators with a long enough time series to see how the operations have changed over time. She noted that very small operations, with fewer than 100 hogs, tend to be very individual. NASS has had better luck focusing on the middle-sized and the larger operations. She said in equilibrium, the major producers have a very smooth throughput with few unpredicted shifts. The middle-sized operators show a much greater impact of shocks.
Ensor commented that the individual operator time series provide a rich dataset for understanding equilibrium as well as shocks, and asked whether more could be done with these data. Dan Kerestes said quite a bit has been done. As part of the processing and editing of the survey data, analysts use a software tool that displays the company-level time series (current and historical data) for all inventory categories. This use also illustrates the importance of high response rates, especially among larger companies, he noted. Chris Wikle (University of Missouri) suggested that it might also be useful to examine the spatial aspects as well as the time series aspects.
Eric Slud asked whether the unit-level time series are used to impute for unit-level nonresponse. Harper replied that this is done for extreme operations. The statisticians in the field offices examine an operation historically, looking at trends and growth to come up with the best possible values to impute. For nonrespondents not considered extreme operations,³ NASS uses an adjusted estimator, an extension of a re-weighted estimator that includes information about operator status (e.g., whether still in the hog business). Slud asked how NASS adjusts for item nonresponse or incomplete forms. Harper replied that, for partial responses, she believes NASS imputes manually from the data reported as well as historical data provided by that operator.
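The discussion did not spell out the adjusted estimator, so the following is only a sketch of one way a re-weighted estimator could use operator status. It assumes that, within a stratum, nonrespondents known to be out of the hog business count as resolved records with zero inventory, and that the base weight is inflated by the ratio of sampled to resolved records. The function name, parameters, and data are all hypothetical.

```python
def adjusted_stratum_total(base_weight, n_sampled, reported, n_out_of_business):
    """Re-weighted stratum total under the assumptions stated in the lead-in.

    reported: inventories from responding operations.
    n_out_of_business: nonrespondents known to hold no hogs (treated as zeros).
    """
    resolved = len(reported) + n_out_of_business
    if resolved == 0:
        raise ValueError("no usable records in stratum")
    # Respondents and known zeros stand in for the unresolved nonrespondents.
    adjusted_weight = base_weight * n_sampled / resolved
    return adjusted_weight * sum(reported)


# Hypothetical stratum: 10 sampled, 6 respond, 2 known out of business, 2 unresolved.
stratum_total = adjusted_stratum_total(
    base_weight=2.19,
    n_sampled=10,
    reported=[400, 600, 500, 300, 700, 500],
    n_out_of_business=2,
)
print(round(stratum_total, 1))
```

Ignoring operator status would spread the weight over the 6 respondents only and give a larger total, which is the kind of overstatement the status information is presumably meant to avoid.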
Lee Schulz (Iowa State) asked about the potential for bias in the imputation process, particularly in detecting future shocks. He asked whether PEDv resulted in any adjustments to the process or model. Gavin Corral reported updates in the way NASS looks for shocks (described in Chapter 5) but no changes to the Kalman filter model. The current approach to detecting shocks would identify PEDv, but with a one-quarter lag. That model basically gives a red flag for an unusual quarter. The red flag would be reported to the pre-board and Hog Board (described in Chapter 4). The pre-board might use that information to develop one or two scenarios for consideration by the board.
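The idea of a model raising a red flag for an unusual quarter can be illustrated with a scalar Kalman filter. This is a generic local-level sketch, not the NASS model: the noise variances `q` and `r`, the threshold of 3, the function name, and the inventory series are all assumptions made for the example.

```python
import math


def flag_unusual_quarters(series, q=1.0, r=1.0, threshold=3.0):
    """Local-level Kalman filter; flag quarters with large standardized innovations.

    q: assumed variance of the quarter-to-quarter change in the level.
    r: assumed variance of the observation noise.
    """
    level, p = series[0], r  # initialize the level at the first observation
    flagged = []
    for t in range(1, len(series)):
        p_pred = p + q                    # predicted state variance
        innovation = series[t] - level    # observed minus one-step prediction
        s = p_pred + r                    # innovation variance
        if abs(innovation) / math.sqrt(s) > threshold:
            flagged.append(t)
        gain = p_pred / s                 # Kalman gain
        level += gain * innovation        # updated level estimate
        p = (1.0 - gain) * p_pred         # updated state variance
    return flagged


# Hypothetical quarterly index with a sudden drop in the last quarter.
quarters = [100, 101, 99, 100, 102, 101, 90]
flags = flag_unusual_quarters(quarters)
print(flags)
```

A flag of this kind can be raised only after the affected quarter has been observed, so a filter like this detects a shock after the fact rather than anticipating it.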
Nancy Kirkendall (National Academies of Sciences, Engineering, and Medicine) provided an example of using time series models with an adjustment to reflect market changes to impute for nonresponse. She reported that the Energy Information Administration used time series models for each respondent to a survey. The models were used to project the current report. If a company did not respond, that projected value was adjusted by the ratio of the sum of all reported values divided by the sum of the projected values for the companies that reported. The adjusted number was used as an imputed value.
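The ratio adjustment Kirkendall described can be written out directly. In this sketch the per-company projections are taken as given (the time series models that produce them are not shown), and the company labels, values, and function name are hypothetical.

```python
def ratio_adjusted_imputations(projected, reported):
    """Impute nonrespondents as projection times (sum reported / sum projected).

    Both sums are taken over the companies that reported.
    """
    responders = [company for company in projected if company in reported]
    ratio = (
        sum(reported[company] for company in responders)
        / sum(projected[company] for company in responders)
    )
    return {
        company: projected[company] * ratio
        for company in projected
        if company not in reported
    }


# Hypothetical data: A and B report 10 percent below projection; C does not respond.
projected = {"A": 100.0, "B": 200.0, "C": 50.0}
reported = {"A": 90.0, "B": 180.0}
imputed = ratio_adjusted_imputations(projected, reported)
print(imputed)
```

Because the responding companies came in 10 percent below their projections, the nonrespondent's projection is scaled down by the same factor, which is how the method carries a market-wide change into the imputed value.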
___________________
3 Extreme operators are the largest operators in a state, that is, those assigned to stratum 98 as illustrated for two states in Tables 3-1 and 3-2.