The final session of the workshop summarized the views of committee members and invited discussants on the modeling approaches that the National Agricultural Statistics Service (NASS) might consider.
Katherine Ensor said that she favors a space-time state-space evolving system with (at least) a two-state process to capture various types of shocks. The challenge is that few or no data are available for estimating the state during times of shock, so it would largely be a modeling exercise. She is not certain that evolving parameters are needed, but the option should be considered.
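Ensor did not specify a model form. As a rough illustration of what a two-state process could mean, the sketch below simulates a local-level (random-walk-plus-noise) series whose state noise switches between a low-variance "equilibrium" regime and a high-variance "shock" regime according to a two-state Markov chain. The function name, regime probabilities, and variances are all hypothetical; a real application would also need a spatial (across-state) component and a way to estimate the regime from data, which is the hard part she identified.

```python
import numpy as np


def simulate_two_state_local_level(n_steps, p_stay, state_sd, obs_sd, seed=0):
    """Simulate a local-level series whose state noise depends on a hidden regime.

    Regime 0 is a hypothetical 'equilibrium' regime (small state noise) and
    regime 1 a hypothetical 'shock' regime (large state noise). The regime
    follows a two-state Markov chain; p_stay[r] is the probability of staying
    in regime r from one period to the next.
    """
    rng = np.random.default_rng(seed)
    regimes = np.zeros(n_steps, dtype=int)  # start in equilibrium
    level = np.zeros(n_steps)               # latent state
    obs = np.zeros(n_steps)                 # noisy survey-style observations
    for t in range(n_steps):
        if t > 0:
            prev = regimes[t - 1]
            # Stay in the current regime with probability p_stay[prev], else switch.
            regimes[t] = prev if rng.random() < p_stay[prev] else 1 - prev
            # State equation: random walk whose noise depends on the regime.
            level[t] = level[t - 1] + rng.normal(0.0, state_sd[regimes[t]])
        # Observation equation: state plus measurement error.
        obs[t] = level[t] + rng.normal(0.0, obs_sd)
    return regimes, level, obs


# 40 quarters, mostly equilibrium with occasional shock spells (hypothetical values).
regimes, level, obs = simulate_two_state_local_level(
    n_steps=40, p_stay=(0.95, 0.7), state_sd=(0.5, 5.0), obs_sd=1.0
)
```

Because the shock regime is rare and short, a 40-quarter record contains few shock periods, which is exactly why estimating that regime's parameters would lean heavily on modeling assumptions.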
Lee Schulz asked about the process of determining whether a model is good enough and the evolution of that process. It is always possible to improve a model, he noted. He asked in what situations NASS would switch to a new model and what would prove to them that the new model works.
Linda Young reminded the audience that Gavin Corral presented four model criteria (see Chapter 5). She said that, in the end, the model has to show over time (and this can be on historical data) that it does a good job of predicting the final estimate after revisions. The model should be closer to the final estimate than the initial estimate is now. To demonstrate that it meets that goal, the model must be evaluated on data since 2008, including challenging situations such as natural disasters and disease. In particular, she said, NASS would like to predict the final estimate sooner than the Hog Board does now. The board achieves the final estimate through its revision process better than any current model does.
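One simple way to operationalize the criterion Young describes is to compare, over historical quarters, how far the model-based estimates and the initial estimates each fall from the final revised estimates. The sketch below uses mean absolute error as the distance; this metric, the function names, and the numbers are assumptions for illustration, not the four criteria Corral presented. In practice the comparison would be run on the quarters since 2008 and also reported separately for shock periods.

```python
import numpy as np


def mean_abs_error(estimates, final):
    """Mean absolute difference between estimates and the final revised values."""
    estimates = np.asarray(estimates, dtype=float)
    final = np.asarray(final, dtype=float)
    return float(np.mean(np.abs(estimates - final)))


def model_beats_initial(model_est, initial_est, final):
    """True if the model is, on average, closer to the final values than the initial estimates are."""
    return mean_abs_error(model_est, final) < mean_abs_error(initial_est, final)


# Hypothetical three-quarter example.
final = [100.0, 102.0, 98.0]
initial = [95.0, 105.0, 100.0]  # absolute errors 5, 3, 2
model = [99.0, 101.0, 99.0]     # absolute errors 1, 1, 1
print(model_beats_initial(model, initial, final))  # True
```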
Schulz asked whether this process would be internal or a competitive bid process in which consultants or contractors could be tasked to build models for a competition. Young replied that a competitive process might be possible, but the data used by any such model are not publicly available. A public competition would require NASS to simulate the data in some way. This is difficult, Young said, because the industry is so concentrated in some states that high-quality simulated data might reveal confidential information. She welcomed suggestions for solving that problem. She concluded that the increasing concentration is making large producers more vulnerable to disclosure, a risk NASS cannot take.
Returning to the question of how NASS would adopt a new model, Young said that the Kalman filter model (KFM) is the best they have now and is used to provide model-based results to the board. If NASS finds a better model, it would also be used to provide results to the board. There would not be an automatic switch-over, she said. Results would be tracked until the board is comfortable that the new model provides an improvement. It is an evolving process.
Luca Sartore asked about the Pfeffermann example described by Gauri Datta that used groupings of states. If the model accounts for several groups of states, he asked, is an individual state in only one group or can it be in more than one group? Datta replied that the states were grouped into nine geographic regions, with each state in one region. The example used a state-space time series approach to borrow strength, and the authors introduced benchmarking so that the regional predictions add up to the total. Sartore wondered whether the model could allow each state to appear in more than one group. Datta said he was not sure about the modeling part, but the benchmarking process would not work. Pfeffermann benchmarked the regional estimates, not state estimates. Eric Slud noted that the Pfeffermann model also did not consider multivariate outcomes.
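Benchmarking in this sense forces lower-level estimates to agree with a higher-level total. The simplest version is pro-rata (ratio) benchmarking, sketched below: every regional estimate is multiplied by the same factor so that the regions sum to the national figure. This only illustrates the constraint; the benchmarking in the Pfeffermann example is likely more involved and is not reproduced here, and the function name and numbers are hypothetical.

```python
import numpy as np


def ratio_benchmark(regional, national_total):
    """Scale regional estimates proportionally so that they sum to the national total."""
    regional = np.asarray(regional, dtype=float)
    factor = national_total / regional.sum()  # one common adjustment factor
    return regional * factor


# Hypothetical: three regional estimates summing to 60, benchmarked to a total of 66.
adjusted = ratio_benchmark([10.0, 20.0, 30.0], 66.0)
```

Each region is scaled by the same factor (1.1 here), so the regions' relative shares are preserved while the total is matched.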
Slud further observed that if NASS feels confident that certain models are doing well during different periods, such as during equilibrium, going into shocks, or coming out of shocks, it would be possible to consider a composite estimate with weights that change based on the current situation. He observed that NASS seems to have used the same 40 data points many times in reaching conclusions. Young said NASS uses the information available and has not thought much about a composite estimator. It has discussed switching and the possibility of developing an indicator that it is time to switch models. She asked about possible fresh approaches.
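A composite estimate of the kind Slud describes can be written as a weighted average of several models' estimates, with the weight vector chosen according to the current situation. The sketch below hard-codes one weight vector per situation; the component models, situation labels, and weights are hypothetical, and deciding which situation applies (the switching indicator Young mentions) is assumed to happen elsewhere.

```python
import numpy as np

# Hypothetical weights over three component models:
# [equilibrium model, shock-onset model, recovery model]
SITUATION_WEIGHTS = {
    "equilibrium": np.array([0.8, 0.1, 0.1]),
    "entering_shock": np.array([0.2, 0.6, 0.2]),
    "exiting_shock": np.array([0.2, 0.2, 0.6]),
}


def composite_estimate(model_estimates, situation):
    """Weighted average of component-model estimates using situation-specific weights."""
    estimates = np.asarray(model_estimates, dtype=float)
    weights = SITUATION_WEIGHTS[situation]
    if estimates.shape != weights.shape:
        raise ValueError("need one estimate per component model")
    return float(weights @ estimates)


# The same three model estimates, combined under two different situations.
print(composite_estimate([100.0, 90.0, 95.0], "equilibrium"))
print(composite_estimate([100.0, 90.0, 95.0], "entering_shock"))
```

A natural extension would replace the hard-coded situation with estimated probabilities of being in each situation, so that the weights change smoothly rather than jumping.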
Nell Sedransk noted that using the 40 quarters of data at the state level might help in defining these transitions because the dynamics and timing differed across states. For example, a model for coming out of shocks will be best developed using state-level data that exhibit this transition. Also, because producer populations differ, each state comes out of the shock with a somewhat different trajectory.
Chris Wikle reiterated his suggestion that NASS consider the integrated population models being used in the ecological literature. They combine multiple types of data, including survey or sampling-based data, with state-space models. The models allow for the possibility that the biological dynamics are changing. Ecological modeling frequently considers changes in habitat, but in the NASS setting the change could be over time. The question is whether NASS has enough data to inform the model. One advantage of this modeling approach, he said, is that it produces uncertainty bounds for estimates. Young asked whether software has been developed for these models. Wikle said that software exists, but it may not be appropriate for the NASS application. He offered to follow up after the workshop.1
Dan Kerestes noted that the largest challenge in modeling shocks is that every shock, and every disease, is different. If it is a new disease, it is unclear how soon a cure will be identified and what impact the cure will have. If the modelers use Porcine Epidemic Diarrhea virus (PEDv) as the example to follow, the next disease may not have the same impact. When PEDv came about and a vaccine was developed, its effect on the sows being farrowed and on litter rates was dramatic. A new disease may be different.
1 Wikle, in collaboration with Mitch Weegman (MUSE School of Natural Resources), provided the following discussion and references: (1) The best general introductory overview of integrated population models (IPMs) is Zipkin and Saunders (2018). Additional introductory and applications material is available at https://academic.oup.com/aosjournals/pages/integrated_population_models. (2) Chandler and Clark (2014) provide examples of emerging spatially explicit IPMs. At present, these are seriously limited in spatial extent because of the amount of information required for estimation. (3) Most applications include time-dependent parameters; the link in (1) above has examples. (4) In terms of software, there is no R package for these models yet. They are typically run in JAGS (http://mcmcjags.sourceforge.net/) or STAN (https://mc-stan.org/). There are cookie-cutter likelihoods to match particular datasets and research questions (e.g., a state-space Cormack-Jolly-Seber (CJS) likelihood for individual capture histories, but a multinomial likelihood for summarized [m-array] capture histories or multistate models). Much of the code available online bolts various likelihoods together.
Ensor wondered whether NASS is asking too much in expecting a model to pick up the dynamics of an epidemic. Perhaps the decision that there is a shock might be better guided by expert opinion. The current view is that decisions made by people are too subjective, but there may not be enough information to capture all possible dynamics, and expert opinion may be useful. Lawson said that, for example, training the model on PEDv would likely make it too specific. It should be as general as possible so that the dynamics of a new disease can be learned. While it is always difficult to predict something new, there are things to try, such as using a general descriptive model to capture dynamics and testing it on different datasets.
Kamina Johnson noted that PEDv was an emerging disease, whereas APHIS has established models for foreign animal diseases. Those models have many known parameters; the parameters are perhaps not completely known, but they carry much less uncertainty. In the emerging disease area, little is known. It may be even more important to capture uncertainty in these situations, both in model-based estimates and in expert judgment. Experts in her office discuss alternative scenarios about what the specific disease might be and its potential impact. They provide a range of possibilities for situations in which there is no scientifically justifiable process to quantify a parameter. NASS might also benefit from a mix of data-based estimates and expert judgment, each accounting for uncertainty.
Andrew Lawson said that the foot-and-mouth disease outbreak in the United Kingdom was highly publicized and modeled in real time by researchers at Imperial College. He referred to a later paper by Lawson and colleagues (2011) comparing the modeling efforts during that outbreak. The predictive capability of the models was very low despite all the available information and news coverage, which may be a warning about the limits of modeling.
Lawson also suggested modeling at different levels and combining the levels in joint models. Such an approach may not predict very accurately at a fine scale but could predict quite well at a coarser level. Modeling the levels together might help in making sensible predictions. With that in mind, estimating national- and state-level models jointly might be a good idea because it can borrow strength across the levels, he said.
Nancy Kirkendall said that one advantage of the state-space approach is that there are two kinds of equations: (1) the state equation, which describes the process and how it changes over time, and (2) the observation or measurement equation, which describes the relationship of the data to the state. The ratio between the variance of the state equation and the variance of the measurement equation determines the model’s adaptability, that is, how much weight is applied to the previous state versus how much weight is applied to the new observation. One of the reasons the KFM is so stable is likely that the variance of the state equation is small relative to the survey error, she commented. When there is a shock such as a disease, the state changes because of the disease, and the variance of the state equation will likely become larger. She suggested using this concept to further evaluate the KFM, to determine whether increasing the variance of the state equation would make the model more adaptable during shocks.
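Kirkendall's point can be made concrete with the simplest state-space model, the local-level model, in which the state follows a random walk with variance W and each observation equals the state plus survey error with variance V. The filtered estimate is the previous estimate plus a gain K times the new observation's surprise, so K is the weight on the new observation and 1 - K the weight on the previous state. In steady state, K depends only on the ratio q = W/V. The sketch below computes K in closed form and checks it against the variance recursion; this univariate model is a stand-in for illustration, not the KFM that NASS uses.

```python
import math


def steady_state_gain(q):
    """Steady-state Kalman gain for a local-level model with q = W / V.

    p is the steady-state one-step-ahead state variance in units of V; it solves
    p = p / (p + 1) + q, i.e. p**2 - q*p - q = 0.
    """
    p = (q + math.sqrt(q * q + 4.0 * q)) / 2.0
    return p / (p + 1.0)


def iterated_gain(q, n_iter=1000, p0=1.0):
    """The same gain, obtained by iterating the variance recursion from p0."""
    p = p0
    for _ in range(n_iter):
        p = p / (p + 1.0) + q  # shrink by (1 - K) after the update, then add state noise
    return p / (p + 1.0)


for q in (0.01, 0.1, 1.0, 10.0):
    print(f"q = {q:5}: weight on new observation K = {steady_state_gain(q):.3f}")
```

The weight on the new observation rises from about 0.10 at q = 0.01 to about 0.92 at q = 10, which is the sense in which a small state variance relative to survey error makes a filter stable but slow to react.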
Kirkendall agreed with others that identifying the start time of a disease shock is likely to be difficult and that expert judgment may be needed. Expert judgment will be best if the experts have the necessary information. Young replied that she thinks NASS has explored the variance of the state equation, and in equilibrium it is small. Kirkendall suggested finding a value for which the model performs well when it needs to be adaptable. This would essentially provide two model-based estimates: one from the equilibrium model and one from the adaptable model.
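The two estimates Kirkendall describes can be sketched by running the same filter twice with different state variances. Below, a hypothetical series holds at 100 for 20 quarters and then drops to 90, mimicking a shock; the "equilibrium" filter uses a state variance that is small relative to the survey error variance, and the "adaptable" filter uses a larger one. The univariate local-level filter, the variances, and the series are illustrative assumptions, not the NASS KFM or its data.

```python
import numpy as np


def local_level_filter(y, state_var, obs_var, x0, p0):
    """Kalman filter for a random-walk state observed with noise; returns filtered states."""
    x, p = x0, p0
    filtered = np.empty(len(y))
    for t, obs in enumerate(y):
        p = p + state_var        # predict: the random-walk state adds state variance
        k = p / (p + obs_var)    # gain: weight on the new observation
        x = x + k * (obs - x)    # update toward the observation
        p = (1.0 - k) * p        # posterior state variance
        filtered[t] = x
    return filtered


# Hypothetical series: 20 quarters at 100, then a shock drops the level to 90.
y = np.array([100.0] * 20 + [90.0] * 20)
obs_var = 4.0  # hypothetical survey error variance
equilibrium = local_level_filter(y, state_var=0.04, obs_var=obs_var, x0=100.0, p0=1.0)
adaptable = local_level_filter(y, state_var=4.0, obs_var=obs_var, x0=100.0, p0=1.0)

# Four quarters after the shock, the adaptable filter is much closer to 90.
print(round(equilibrium[23], 1), round(adaptable[23], 1))
```

The cost of the larger state variance is that, with noisy data in equilibrium, the adaptable filter also follows more of the survey error, which is one reason to track both estimates side by side rather than replace one with the other.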
Lawson added that when there is switching, very often epidemic models switch from being descriptive with a particular variance to something that is dependent on previous values, so the epidemic component is highly auto-correlated. With switching there will be a variance change, but other features of the model change as well.
Slud observed that even though shocks may change the optimal model, an acceptable solution might be obtained simply by increasing the variance of the state equation in the KFM. To capture some of Lawson’s thoughts, Slud suggested that an alternative might be to look at how the time series parameter estimates vary when the KFM is estimated during the epidemic and recovery periods of the PEDv epidemic, when the model should be most adaptable. It may be worth trying some of these simple things to see whether they help the KFM adapt to a shock.
Sedransk observed that the PEDv epidemic started in nine states and expanded to virtually all states; thus, Slud’s comment about a composite estimator makes sense because the epidemic does not happen everywhere at the same time. She suggested the need for a dynamic component instead of jumping to an entirely new universal model that may apply only when all states are affected, or may never apply if states are in different epidemic phases or transitions at each point in time. Slud commented that a composite estimator is not the model class he most favors, but he does like the idea of models that apply at different points in time (e.g., equilibrium, start of disease, recovery). A composite based on them might be useful, he added, and it might be interesting to examine the residuals at the state level to see whether they suggest model deficiencies.
Ensor concluded by acknowledging how impressive the NASS presentations and modeling work have been. She noted the job of the committee and discussants was to come up with ideas for other approaches.