Modeling Approach and Implementation
This chapter of the report differs from previous chapters in that it does not directly review a section of the draft E. coli O157:H7 risk assessment. Instead, it reviews the basis of the approach and implementation of the model and offers the committee’s observations and recommendations regarding it. The discussion thus touches on and overlaps some of the observations offered earlier and provides an overall assessment of the modeling work done to date.
At the outset, it should be said that the effort underlying this risk assessment is impressive. The authors have undertaken an extraordinary task of collection, analysis, and integration of information. It will be an important assessment and will undoubtedly serve as an exemplar for future assessments. The analysts are to be commended for undertaking this work. Nevertheless, several issues remain to be resolved as development continues. The committee notes that the draft’s authors have already implemented some of the suggestions discussed below.
The US Department of Agriculture (USDA) Food Safety and Inspection Service (FSIS) effort faced a number of substantial methodologic hurdles whose solutions have not been described in textbooks or in literature peculiar to microbial risk assessment. In addition to methodologic hurdles, the FSIS team has been forced to cope with the inadequacy of the knowledge base. As a result, it is appropriate that they interrupt their risk-assessment effort to allow for peer review and to reassess their solutions to some challenging issues.
The committee commends them both for the magnitude of the effort and for the principles behind their efforts. The committee believes that
many of its criticisms and suggestions regarding this model would apply to most previous and current microbial risk-assessment models if they were subject to the same intensity of review.
DESCRIPTION OF THE OVERALL MODELING APPROACH
The approach taken in this modeling effort is to create a highly complex probabilistic simulation model that extends from estimation of the pattern of prevalence of enterohemorrhagic E. coli (EHEC) among various types of cattle through propagation of the exposure predictions related to slaughter, processing, and preparation of meals to the estimation of the distribution of dose-response relationships. The dose-response relationships are derived by fitting predicted distributions of exposure to estimates of the population health risk attributable to ground beef as estimated from epidemiologic data.
In its final step, however, the model departs from the standard approach to risk assessment in a way that merits careful attention. Specifically, the risk characterization is carried out in part within the hazard-characterization stage by estimating (on the basis of epidemiologic data and investigations) the annual number of cases of EHEC illness associated with ground beef. Because the dose-response relationship is inferred from an algorithm that was designed to recreate samples from the distribution of the annual number of cases of EHEC illness, the risk estimates provided by the draft model cannot be considered to be independent of the epidemiologic data.
Risk Modeling, But Not Risk Assessment as Commonly Understood
A key observation regarding the draft model is that it does not provide a risk assessment in the form that many readers would expect. To label the product a risk assessment implies that the effort is directed toward providing an estimate of risk by collecting evidence and applying mathematical tools; the estimate of risk would be a dependent output of the model. In particular, the use of the terms farm-to-fork and process risk model will imply to most readers that the many factors involved in the model are aggregated mathematically and propagated forward to generate an estimate of population health risk.
The standard approach to risk assessment is that the information input and the predictive output of the exposure assessment and the dose-response assessment are derived from independent scientific sources and that the dependent output is estimates of risk that are derived from the combination of the two subassessments. With an estimate of risk as the
dependent output, the label risk assessment is appropriate to describe the analysis in the standard case.
The present modeling effort alters that arrangement by deriving the exposure assessment and the population risk estimates from separate sources and then inferring a dose-response relationship that is mathematically compatible with the calculated exposure assessment and the distribution of population risk estimates. Such a risk assessment might be considered inverted because the nominal risk equation, risk = function of (exposure × dose-response relationship), has been reorganized into dose-response relationship = function of (exposure × risk).
An analogy for the present approach may be useful. Consider an analyst estimating the area of a rectangle. The analyst, on the basis of various predictive models, has simulated a distribution of possible lengths of the rectangle. The analyst also has a separate source of information regarding the total area and provides a range of estimates for the area. Given great uncertainty in the width of the rectangle, the analyst decides to generate estimates of the width of the rectangle by dividing samples from the distribution of area by samples from the distribution of length. That generates a set of candidate widths that are compatible with the other two kinds of information. In an analogous way, the area assessment is inverted to become a width assessment. Moving forward, the analyst uses the estimates of length and the derived set of widths to generate estimates of area. The questions faced in this situation are whether the calculation in the model should be described as providing an estimate of area, whether and how the model can be validated, and what can be inferred (statistically) from the set of width estimates generated in the process and how they can be used. If we modify the length or width of the rectangle in some way by using a management strategy, is it reasonable to estimate the resulting area simply by multiplying the new length by the inferred widths?
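The committee’s rectangle analogy can be made concrete in a few lines of code. The distributions and sample size below are arbitrary placeholders chosen only to illustrate the mechanics of the inversion; they correspond to nothing in the draft model.

```python
import random

random.seed(1)

# Simulated "length" distribution (a predictive model) and independent
# "area" estimates (analogous to the epidemiologic risk estimates).
lengths = [random.lognormvariate(1.0, 0.3) for _ in range(1000)]
areas = [random.lognormvariate(2.0, 0.5) for _ in range(1000)]

# Inverted step: widths are derived so as to be compatible with the
# other two quantities rather than estimated independently.
widths = [a / l for a, l in zip(areas, lengths)]

# The circularity: recombining the derived widths with the lengths
# necessarily reproduces the area estimates used as input, so the
# comparison offers no independent validation of the "area" output.
reconstructed = [l * w for l, w in zip(lengths, widths)]
assert all(abs(r - a) < 1e-9 * a for r, a in zip(reconstructed, areas))
```

The final assertion always holds by construction, which is precisely the committee’s point: agreement between the recombined output and the input estimates is guaranteed, not informative.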
For the FSIS assessment, the question is whether and how the inferred dose-response relationship can be used in future assessments and management planning to predict the benefits (in terms of risk reduction) of altering ground-beef production, delivery, or preparation.
Assessment of the Rationale for the Inverted Assessment Approach
The nonstandard treatment of dose-response assessment appears to be based on the judgment that this component of the risk estimation carries the greatest, and least readily resolvable, uncertainty. Furthermore, the uncertainty in the epidemiologically based population risk estimate is judged to be less than the uncertainty in the dose-response relationship, even though that population risk estimate typically would be considered the goal of the risk-assessment effort. That judgment seems
reasonable in light of practical and ethical constraints in performing human dose-response experiments and the continued lack of high-quality evidence suitable for dose-response characterization from, for example, outbreak investigations.
The departure from the standard approach can be justified, in principle, by the assertion that the primary goal of risk assessment is better understanding of the mechanisms of the generation, transmission, and attenuation of risk through the system. To be considered appropriate, that goal would have to be considered more important than providing an estimate of population health risk that is derived solely from simulation of the components of the risk-generating system.
Once it has established and communicated an appropriately limited set of goals (and acknowledged that not all the candidate goals can be equally served by the same effort), the decisions made by the modeling group to use various techniques can be understood and judged relative to the stated goals rather than to a presumed or standard goal.
In light of the concerns raised in the earlier sections of this review, particularly those addressing the Production and Slaughter Modules, the authors may wish to reconsider whether the dose-response assessment is truly the most uncertain modular component of the model. From a modeling perspective, the current state of knowledge available to predict the transmission and ultimate fate of EHEC from the farm to the ground-beef patty may be at least as uncertain as the dose-response relationship. Given the complexity of the steps involved, combined with the legal and regulatory data-collection and -reporting environment, there may be little hope of gaining insight into this process without a fundamental change in the situation.
It may be possible to place more faith in the use of Shigella dysenteriae 1 as a surrogate for the best estimate of a dose-response function for EHEC, as is proposed in the Hazard Characterization chapter, than in the implemented exposure model as a surrogate for the reality of the ecology and transmission of EHEC within and between farms, feedlots, slaughterhouses, combo bins, and grinders and through the multitude of potential cooking practices and consumer behaviors.
An Example of the Standard Approach
After the FSIS draft report was released for public comment, a risk assessment by Nauta and colleagues (2001) became available. It addresses the risk of EHEC illness from steak tartare consumption in the Netherlands and was produced for the Rijksinstituut voor Volksgezondheid en Milieu (RIVM). The risk assessment had not been peer-reviewed when it became available to the committee, but it is described in sufficient detail to compare the modeling efforts at the level of the overall approach. Nauta
et al. carry out the risk assessment in the more standard format, providing independent estimates of the distribution of exposure through predictive modeling and of the dose-response relationship on the basis of data from a 1997 outbreak in Japan (Shinagawa, 1997). Those are combined to form a prediction of the population health risk attributable to consumption of steak tartare contaminated with EHEC. The results of the analysis can be compared on a truly independent basis with population risk estimates from the Netherlands. In this case, the baseline predicted number of cases of EHEC illness due to steak tartare is higher than the total number of cases associated with EHEC from all foods. That independent assessment demonstrates that some components of the model are overstating the risk. The point of this comparison is to suggest that there may be some merits of “face-value” validation in which the model generates a risk estimate independent of illness surveillance data. The Nauta et al. model may not be satisfactory in its performance, but it has the considerable potential benefit of independent and transparent validation.
The FSIS risk model cannot provide that degree of output validation until the dose-response relationship is generated from a source independent of the validation data. It would also be possible to validate the model if one or more surrogate dose-response relationships (for a foodborne pathogen other than O157:H7) were adopted for use. Alternatively, it is possible to validate the inferred dose-response function as was attempted in the Risk Characterization chapter with an outbreak investigation. Regardless of any validation efforts, transparency would be greatly improved by a simple demonstration of the range of risk estimates that are generated by using a variety of dose-response relationships (such as the upper and lower bounds of the dose-response envelope) to show the performance and the sensitivity of the model in various possible dose-response scenarios.
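The sensitivity demonstration suggested here requires little machinery. The sketch below propagates a fixed, hypothetical exposure sample through two bounding curves of the approximate beta-Poisson form; the α and ID50 values and the dose distribution are illustrative placeholders, not the draft’s fitted envelope.

```python
import random

random.seed(2)

def p_ill(dose_cfu, alpha, id50):
    """Approximate beta-Poisson probability of illness at a given dose."""
    beta = id50 / (2 ** (1 / alpha) - 1)
    return 1 - (1 + dose_cfu / beta) ** (-alpha)

# Hypothetical doses (CFU) in a sample of contaminated servings.
doses = [10 ** random.uniform(0, 4) for _ in range(10000)]

# Placeholder bounding curves for a dose-response envelope.
for label, alpha, id50 in [("upper-bound curve", 0.2, 1e2),
                           ("lower-bound curve", 0.2, 1e6)]:
    expected = sum(p_ill(d, alpha, id50) for d in doses)
    print(f"{label}: {expected:.0f} expected illnesses "
          f"per {len(doses)} contaminated servings")
```

Reporting the model’s risk output under each member of such an envelope would show readers how strongly the final estimates depend on the dose-response assumption.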
The committee notes that the RIVM model is not included here as an example of a better model. Its authors acknowledge many limitations, and the analysis is of a much smaller scope and less detail with respect to the evidence base. However, it is included as an example of a model that provides a risk estimate as a dependent outcome and that would more closely match the standard definition of risk assessment. Having maintained separation between the risk assessment and the national surveillance data, the RIVM model has the benefit that its output can be compared with independent epidemiologic data for purposes of validation.
Additional Comments Regarding the Inverted Approach
The alternative approach used by FSIS carries some disadvantages. As noted, the primary drawback is the loss of the face-value validation of
the output through comparison with independent epidemiologic data, in that the data have essentially become part of the assessment. The impact of the inverted assessment is that—from the point of view of comparison with population health risk data—the model can never be wrong (because it constitutes a circular argument). An overestimate in the exposure distribution would be accommodated by underestimating the probability of illness, and underestimation in the exposure distribution could be accommodated by overestimating the probability of illness. The committee assumes that the motivation behind the algorithm is to practice a form of model updating, that is, using independent observation to improve the accuracy of the model. The updating algorithms are described later, including some suggestions for making the updating more compatible with these goals.
Another drawback in the approach is the lack of a scientific evidence base for the dose-response relationship. The dose-response relationship is derived from model assumptions that are not related to the pathogenicity of EHEC. Any change in the parameters of the exposure assessment (for example, an improved estimate of the prevalence in a population of cattle) or in the assumptions leading to the baseline population health risk estimate (for example, a change in the etiologic fraction estimates) changes the basis of the inferred dose-response relationship. It is not clear where an appropriate end to this cycle of revision would be.
The approach is much harder to understand than a straightforward assessment. The resulting uncertainty in the simulated dose-response parameters is a product of the uncertainty in the exposure assessment and the uncertainty in the population health risk estimate. The only information provided in the algorithm that is directly related to the issue of the pathogenicity of EHEC is the envelope that limits the search space in inferring the dose-response function and the assumption that the functional form will be beta-Poisson. The complexity of the relationship between the many sources of uncertainty and the final distribution of dose-response parameters may be a threat to real transparency for the great majority of external reviewers. This is a different matter from the strict transparency of the approach demonstrated through provision of the report, appendixes, and underlying model code. Some judgment is required as to whether a dose-response relationship derived in this way is preferable to a simpler model with a more transparent depiction of the underlying uncertainty.
Despite the potential problems, it must be recognized that the mere departure from a standard approach does not in itself constitute an error. A decision to depart from the standard approach could be considered entirely appropriate to the situation. But, it is important to communicate the nature of the departure and its impact on the overall utility of the
model and to ensure that readers do not misunderstand the output of the model as being a risk assessment as it is commonly understood.
Regardless of the presence of any technical errors, there is the risk of errors of omission and commission. The potential error of omission lies in failing to ensure the full communication of these issues. The potential error of commission lies in describing the effort as “generating” or “predicting” population risk estimates when in fact population risk estimates are provided as input to the model on the basis of epidemiologic data.
A serious miscommunication could result if readers form the impression that the model propagates evidence forward to generate population health risks and then judge the model to be appropriate on the basis of the quality of the match to what are thought to be independent epidemiologic data. Having departed from the standard approach, the authors have the burden of ensuring that readers do not construct an inappropriate mental model of the approach and thereby form a judgment of its validity.
There may also be some concern about the utility of a model generated in this way if risk-management decision-making requires the provision of a risk-assessment model that can be validated to some extent by national-level epidemiologic data. Such a requirement is not stated for this situation.
Therefore, the committee recommends that the authors communicate more clearly the nature of, the rationale for, and the impact of the departure from the standard risk-assessment approach and that they consider relabeling the product as a system risk model to avoid implying that the model generates an estimate of risk independent of that derived from epidemiologic data.
The authors should reconsider the approach taken to infer the dose-response relationship in light of the loss of the potential for model-output validation, a desire to improve transparency, and concerns regarding whether the uncertainty is actually greatest in the dose-response characterization. Chapter 5 of this report offers comments regarding the choice of surrogate pathogens.
DESCRIPTION OF MODEL-UPDATING (ANCHORING) ALGORITHMS
At several points in the development of the model, algorithms are invoked to adjust the simulation outputs of the model to make them more compatible with observed data. This approach, called “anchoring” in the draft, is applied at the end of the simulation of grinder loads to adjust the simulation results to be compatible with FSIS sampling data. A variation of model updating is also applied in the hazard characterization stage to
match simulated exposure distributions with predictions of the numbers of cases of illness.
The application of model updating is well founded in health risk assessment and related fields of environmental modeling (see Brand and Small, 1995; Small and Fischbeck, 1999). Updating is of particular value when models are created under conditions of high levels of uncertainty. In such cases, the variance in model estimates grows as the evidence is propagated through each linked submodel. In the end, the distribution representing uncertainty in the predicted risk can be too broad to provide discriminating evidence in support of decisions. Thus, it is generally desirable to include any source of information that can reduce uncertainty in a model’s output. The algorithms for model updating used in the draft report are described below.
Updating Grinder-Load Concentrations
The exposure-assessment modules yield distributions for the number of grinder loads predicted to contain levels of EHEC ranging from 1 colony-forming unit (CFU) to 10¹² CFU. FSIS carries out microbiologic testing for E. coli O157:H7 in raw ground beef, so there is an opportunity to provide additional information to the model by using the results of the sampling.
The approach taken is as follows:
1. Infer the distribution of the proportion of positive samples taken from grinders that would be statistically consistent with the FSIS sampling evidence.
2. Calculate the proportion of positive samples by simulating the exposure model up to the point of grinder-load concentration and then simulating the sampling and detection process.
3. For each simulation, compare the calculated proportion of positive samples with the distribution of the proportion of positive samples inferred from FSIS results.
Each simulation is treated in one of three ways:
If the simulated prevalence falls between the 5th and 95th percentile of the inferred prevalence distribution, it is accepted as a plausible simulation.
If the simulated prevalence exceeds the 95th percentile of the inferred prevalence distribution, the simulation is rejected from future calculations as implausible.
If the simulated prevalence is below the 5th percentile of the inferred prevalence distribution, the simulation is amended by shifting the histogram of grinder-load concentrations to the right (that is, increasing it in 0.5 log increments) until the calculated proportion of positive samples approximates the mean of the inferred proportion of positive samples.
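As the committee reads the draft, this three-way treatment can be paraphrased in code. The percentile limits and the 0.5 log shift come from the draft’s description; the function names, the toy prevalence values, and the shift model are hypothetical placeholders.

```python
def anchor_simulation(sim_prevalence, target, shift_histogram):
    """Paraphrase of the draft's anchoring step for one exposure
    simulation. `target` holds the 5th and 95th percentiles and the
    mean of the prevalence distribution inferred from FSIS sampling;
    `shift_histogram(n)` returns the recalculated sample prevalence
    after shifting grinder concentrations upward by n half-log steps.
    """
    if target["p5"] <= sim_prevalence <= target["p95"]:
        return "accepted", sim_prevalence
    if sim_prevalence > target["p95"]:
        # Simulations above the 95th percentile are simply discarded.
        return "rejected", None
    # Below the 5th percentile: shift the concentration histogram to
    # the right in 0.5 log increments until the simulated prevalence
    # approximates the inferred mean.
    n = 0
    while sim_prevalence < target["mean"] and n < 40:
        n += 1
        sim_prevalence = shift_histogram(n)
    return "shifted", sim_prevalence

# Toy demonstration: each half-log shift is assumed (arbitrarily) to
# raise the detectable prevalence by 10 percent.
target = {"p5": 0.02, "p95": 0.08, "mean": 0.05}
status, prev = anchor_simulation(0.01, target, lambda n: 0.01 * 1.1 ** n)
print(status, round(prev, 4))
```

The asymmetry is visible in the structure itself: simulations below the 5th percentile are dragged toward the mean, while those above the 95th are simply discarded.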
That approach raises a number of concerns in that it contains arbitrary measures and unsupported suppositions:
The choice of the 5th and 95th percentiles of the distribution as critical limits is arbitrary and effectively censors or distorts data that fall outside these bounds.
The uncertainty in the range of proportions inferred from the sampling evidence may be underestimated. Specifically, the inference uses a point estimate of the sensitivity of the detection test, set at exactly 4 times the point estimate of the sensitivity of another test that is itself uncertain. The effect is to overestimate the inferential value of the FSIS sampling process and therefore to limit artificially the acceptable range of exposure simulations.
The decision to accept and adjust simulations that fall below the 5th percentile but to reject unconditionally those above the 95th percentile appears arbitrary. One could just as easily find a mechanistic justification to adjust the concentration histogram downward as to adjust it upward.
The fabrication process is modeled as contributing to grinder loads only in situations in which other uncertain factors may be underestimating the pathogen load in grinders, as opposed to having an independent contribution in each simulation, as might be expected; in this way, the impact of fabrication depends on unrelated factors and not on any explicit assumptions regarding the process of fabrication.
The shift of all grinder-load profiles that fall below the 5th percentile toward a distribution leading to the mean proportion is an arbitrary distortion of the grinder concentration distribution.
The overall effect of the algorithm is to limit the simulations to those which are compatible with the central portion of the sampling evidence and to distort other simulations to reinforce the mean estimate from the sampling evidence. This is particularly problematic in that it eliminates lower-probability high-risk situations, which are normally of great interest in risk assessment.
Generally speaking, the parameters of the anchoring algorithm are not well specified, making it difficult to understand and evaluate the procedure.
The draft does not provide summary statistics associated with the proportion of simulations that are accepted, rejected, or adjusted. It is
therefore not possible to judge how large an impact the algorithm has on the overall simulation process. In addition, there does not appear to be an analysis of the factors underlying the rejection of simulations—that is, an assessment of the patterns of inputs that are associated with the set of rejected or adjusted simulations. The draft’s Appendix A refers to a paper on a Bayesian synthesis method by Green et al. (2000) as a source for the procedure, but the inferential approach described in the paper does not appear to have been used.
Estimating the Dose-Response Relationship
The process for estimating the dose-response relationship is based on iterative matching of two pieces of evidence. In each iteration, the first source of evidence is a sample from a set of simulated exposure distributions. Each simulation generates a histogram of the frequency of servings at discrete levels of number of CFU per serving. The second source of evidence is 19 percentile estimates (from 5% to 95%, in 5% increments) from an uncertainty distribution of the population risk estimate of the annual number of illnesses, on the basis of epidemiologic analysis. The assumed dose-response curve is the beta-Poisson function with parameters α and ID50.
A fitting algorithm then finds a value for ID50 that will translate each exposure distribution into each of the 19 discrete estimates of population risk. This process is repeated for seven potential values of the α parameter. The result is a total of 19 × 7 × N dose-response relationships (combinations of α and ID50) where N is the number of simulated exposure distributions for which the fitting is done (N appears to be set at 100).
The percentiles of the ID50 parameter are then calculated from the entire pool of results. This is not specified in the report, but the median dose-response curve appears to be based on the 50th percentile from the pool of ID50 values. The value of α that is assumed to apply for the “50th percentile” dose-response curve is not clear after a review of both the model implementation and the draft.
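The committee’s understanding of the fitting procedure can be expressed as a sketch. The beta-Poisson form and the grids of 19 percentile targets and 7 α values follow the draft’s description, but the exposure histogram, the target case counts, and the bisection routine are placeholders (and the N = 100 exposure simulations are collapsed to a single histogram for brevity).

```python
import math

def expected_cases(doses, servings_per_dose, alpha, id50):
    """Expected annual illnesses implied by an exposure histogram
    under an approximate beta-Poisson dose-response curve."""
    beta = id50 / (2 ** (1 / alpha) - 1)
    return sum(n * (1 - (1 + d / beta) ** (-alpha))
               for d, n in zip(doses, servings_per_dose))

def fit_id50(doses, servings, alpha, target_cases):
    """Bisect (on the log scale) for the ID50 that reproduces a target
    case count; expected cases fall monotonically as ID50 grows."""
    lo, hi = 1.0, 1e12
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if expected_cases(doses, servings, alpha, mid) > target_cases:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Toy exposure histogram (doses in CFU; servings at each dose) and
# placeholder targets: 19 percentile estimates of annual cases and
# the draft's 7 alpha values (0.16-0.22 in steps of 0.01).
doses = [1, 10, 100, 1000, 10000]
servings = [5e6, 1e6, 2e5, 4e4, 8e3]
targets = [5000 + 1500 * k for k in range(19)]
alphas = [0.16 + 0.01 * k for k in range(7)]

pool = sorted(fit_id50(doses, servings, a, t)
              for a in alphas for t in targets)
print("pool size:", len(pool), "median ID50:", round(pool[len(pool) // 2]))
```

Each pooled ID50 is an exact deterministic transform of one exposure simulation and one percentile target, which is why the committee questions what statistical meaning the percentiles of such a pool can carry.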
The committee has the following concerns with this approach:
There is no description of the mathematical or statistical basis of the approach, nor is there any reference to a similar approach applied elsewhere in the literature.
The basis of including the 19 percentiles for developing a pool of ID50 values is not clear; it appears to be arbitrary.
The meaning of an ID50 value that is the 50th percentile of such a pool of fitting results is not clear and is not explained beyond the statement that it is the median value of a pool of data whose elements do not appear to have a formal basis.
The approach does not appear to conform to any established process of inference. Some reference is made in the spreadsheet to prior and posterior estimates, implying a Bayesian updating process, but no evidence of a likelihood function or other expected components of such an inferential approach is given.
Although it may be reasonable on scientific and qualitative grounds to state that a pair of surrogate pathogens form a plausible envelope, this is not equivalent to stating that the values of α and ID50 must be limited to those achieved by the fitting algorithm for the two pathogens. The uncertainty in the values of both α and ID50 that result from fitting to the feeding-trial data for Shigella and enteropathogenic E. coli would be expected to be broad. In the dose-response estimation method, the range of uncertainty in the α parameter is limited to 0.16–0.22 in steps of 0.01, thereby providing seven alternative values of α. That is particularly relevant, given the committee’s finding that Shigella dysenteriae 1 may constitute a reasonable surrogate for a “best estimate,” rather than its current role as an estimate of the upper bound of the envelope.
Alternative Model-Updating Strategies
Both the model-updating processes described above appear to lack a formal statistical basis. Given the use of Bayesian updating processes at various points in the model and the overall reliance on Monte Carlo simulation, it seems appropriate to consider using a form of Bayesian Monte Carlo simulation (or some of its more advanced resampling relatives) to incorporate properly the information provided by the observational data (see, for example, Brand and Small, 1995; Dilks et al., 1992; Gelman et al., 1995; Small and Fischbeck, 1999).
There are a number of key differences in the application of an algorithm based on the Bayesian Monte Carlo methods:
It does not place the burden of model adjustment on any one part of the model (that is, both the exposure and dose-response modules would be updated).
The updating process works both “upstream” and “downstream” of the observation point.
It does not allow for arbitrary adjustments.
The quality of the process generating the observational data must be carefully scrutinized and quantified in the development of likelihood functions.
It can be appended to the simulation model with moderate computational effort.
The sensitivity of the results to the updating process can be studied and compared with intuitive judgment regarding the true informativeness of the observational data.
It allows the simulation model to deviate from the distribution of observations of the output to the extent that the observations of the output are themselves imperfect.
Prior statements of uncertainty (such as the dose-response envelope) are provided for but with more formal treatment.
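A minimal sketch of Bayesian Monte Carlo updating, in the spirit of Brand and Small (1995), may clarify the mechanics. The forward model, prior, and observation counts below are placeholders; a real application would use the draft’s exposure model and the actual FSIS sampling results.

```python
import math
import random

random.seed(5)

# Prior Monte Carlo sample of one uncertain model input (placeholder).
prior = [random.gauss(0.0, 1.0) for _ in range(5000)]

def model(theta):
    """Toy forward model mapping the input to a predicted prevalence."""
    return 1 / (1 + math.exp(-theta))

# Hypothetical observations: k positive samples out of n tested.
k, n = 12, 300

def log_likelihood(theta):
    """Binomial log-likelihood of the observations given the model."""
    p = min(max(model(theta), 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Bayesian Monte Carlo: weight every prior run by its likelihood and
# resample (sampling-importance-resampling) to obtain a posterior.
logw = [log_likelihood(t) for t in prior]
m = max(logw)
weights = [math.exp(lw - m) for lw in logw]
posterior = random.choices(prior, weights=weights, k=5000)

print("prior mean:", round(sum(prior) / len(prior), 3))
print("posterior mean:", round(sum(posterior) / len(posterior), 3))
```

Unlike the draft’s accept/reject/shift rule, no run is discarded outright or arbitrarily shifted; each is down-weighted exactly in proportion to how poorly it explains the data, and the imperfection of the sampling process itself enters through the likelihood function.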
Therefore, the committee recommends that the authors replace the current algorithms for updating grinder-load concentrations with a more formal, statistically based model-updating procedure.
Also, the committee recommends that the authors replace the current algorithms for calculating dose-response parameters with model elements based on evidence that is independent of national epidemiologic data. That will allow for limited validation of model estimates with epidemiologic data. For the grinder-level observational data and any other observational data in the system being simulated, the authors may wish to consider Bayesian Monte Carlo methods to provide a structured method of updating model parameters in light of observational data.
At several points, the FSIS draft report argues that its findings are “comparable” with other estimates or descriptions of outbreaks. In some cases, however, the comparisons are unconvincing. The draft cites Cassin and colleagues’ (1998) mean per-serving risk of 5.1 × 10⁻⁵ as being “comparable” with the report’s finding of 9.6 × 10⁻⁷. Using the same calculation that converts 18.2 billion servings into 17,500 cases of illness in the draft risk assessment, the Cassin et al. result would yield 930,000 cases. That is clearly not “comparable.” It is thus unclear what the criteria might be for assigning such a label. If the results are comparable in some other ways, they should be described, but the purely numerical results suggest just the opposite.
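The arithmetic behind that comparison is straightforward to verify; the figures below are taken directly from the two reports as cited in this chapter.

```python
servings = 18.2e9       # annual ground-beef servings used in the draft
draft_risk = 9.6e-7     # draft model's mean per-serving risk
cassin_risk = 5.1e-5    # Cassin et al. (1998) mean per-serving risk

print(round(servings * draft_risk))    # roughly 17,500 cases
print(round(servings * cassin_risk))   # roughly 930,000 cases
```

The two implied case counts differ by more than a factor of 50, which is the basis for the committee’s objection to the label “comparable.”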
Apart from numerically questionable comparisons, the underlying basis of comparison is also problematic and has substantial potential for miscommunication. The draft model is constrained to deliver risk estimates that are predicted by epidemiologic analysis. The comparison may also suggest that the underlying models are comparable. In reality, the comparison is between the Cassin model and the epidemiologic analysis. On that basis, the Cassin model substantially overestimates the risk compared with that suggested by the epidemiologic analysis. At the same time, little can be said in comparing the Cassin model with the underlying draft model, because of the inverted assessment approach.
It is stated that the derived dose-response function “shows consistency with information obtained in a ground-beef associated outbreak in the northwestern United States” (p. 119 in the draft). On the basis of the draft’s Figure 4-5, the information obtained from the ground-beef-associated outbreak is so dispersed that it is consistent with virtually every dose-response curve that could reasonably be suggested. It appears that the outbreak provides hardly any discriminatory information with respect to choosing or validating a range of dose-response curves. Use of the label “consistency” may be somewhat generous with respect to the implied validation. If anything, the information from the outbreak suggests that the dose-response envelope is too limiting, as acknowledged by the statement (p. 119 in the draft) that the “Shigella dysenteriae dose-response function fails to explain all of the outbreak’s uncertainty.”
The committee recommends that the authors reconsider the basis of model validation and avoid implying a greater degree of validation than is warranted by the comparisons presented.
MODELING ISSUES IN HAZARD CHARACTERIZATION
One of the goals of the risk assessment is to provide a measure of the opportunity to reduce risk through various risk mitigation actions. Assumptions used in a hazard characterization can have a great impact on the ability of a model to represent the expected value of mitigations accurately.
Scope and Context Decisions in Hazard Characterization
A number of important subassessments are required in hazard identification and hazard characterization:
To describe the evidence for probability of illness as a function of any risk factors (that is, dose, age, disease states or other conditions of the host, sex, and food-matrix effects). In the draft risk assessment, the probability of illness is provided as a function of dose, but no other variables are used to modify the probability. Admittedly, the evidence base to support modification of the probability of illness as a function of factors other than dose is weak. However, other risk factors might be included if the hazard characterization were simplified—specifically, if it were based on the probability of illness given an exposure event as opposed to exposure to a particular dose. That could provide improved resolution in one part of the analysis—the ability to explain demographic differences in the probability of illness. However, it would come at the cost of losing the benefits associated with explicit dose dependence in the probability of illness.
To describe the full spectrum of more-severe health outcomes that can result from the primary illness and their relationship with any risk factors (dose, age, pre-existing disease states, sex, and the like). The draft risk assessment does not use any risk factors in calculating the likelihood of transition from illness to more-severe outcomes, such as hemolytic uremic syndrome (HUS) or death, although evidence of these variations is cited. The number of severe outcomes is taken to be a fraction of the total number of cases, with no specific allocation of the burden of the severe outcomes to particular exposure groups, such as children or the elderly. The implications of the simplification are discussed in more detail below.
To describe the likelihood of secondary infections as a function of the same risk factors (dose, age, disease state, sex, and so on), including the potential for secondary infection without primary illness. Without calculating the risk of secondary infection (presumably by incorporating particular risk factors for secondary infection, such as age), it is not possible to represent accurately the public-health benefit associated with avoiding the primary cases. For example, the draft suggests that the etiologic fraction associated with ground beef may be lower for children because they are also exposed to secondary infections from day-care facilities. The latter is true, but it does not necessarily imply that reductions aimed at children will reduce a smaller proportion of the problem. Given that the initial EHEC exposure in the day-care environment is likely to be traceable ultimately to some animal reservoir (such as farm exposure, pets, and waterborne and foodborne vehicles), each primary case prevented among children could have substantially more benefit in terms of the number and severity of secondary cases than in terms of prevention of a primary case among adults.
To provide an indication of the relative value to be placed on preventing primary cases that are more likely to result in severe morbidity or death or on preventing cases in subpopulations that are generally afforded more protection in public-health efforts (for example, children). Evaluation of the relative value of interventions is particularly important where there is known heterogeneity of the case-complication or case-fatality rates across subpopulations. That is true for the E. coli O157:H7 infection case-complication rates for HUS and the case-fatality rates with and without HUS in the very young and the very old. Because the increased likelihood of secondary infection among the very young is coupled with the increased likelihood of developing HUS, these factors combine to make up an important potential source of health burden that is missing from the draft model.
The committee believes that there are a number of reasons why it would be valuable to provide a detailed characterization of the risk attributable to ground beef or to beef and dairy production generally. To characterize fully the risk assessment and its relationship to public-health goals, the following are required:
An explicit accounting of the total risk attributable to the pathogen regardless of source.
An explicit accounting of the proportion (and uncertainty therein) of the risk that is available to be reduced through mitigation of sources and pathways that are included in the risk assessment. This characterization is valuable in any risk assessment, but it is vital in this case, where the estimate of risk attributable to ground-beef consumption is integrated directly into the model to derive estimates of the dose-response function.
An explicit accounting of the proportion of risk (admittedly, very uncertain) that is thought to be attributable to pathways that are not part of the scope of this assessment but are closely related (EHEC other than O157:H7, cross contamination in homes and in food services, unpasteurized milk, occupational exposure, waterborne risk due to livestock operations, contact with animals, custom slaughter, manure management, and so on). This would allow for the consideration of the appropriateness of the scope of the assessment with particular attention to missing pathways that generate health benefits from the same mitigation options as are being considered for the pathways that are included in the scope (for example, reduction of the pathogen prevalence or load in animal reservoirs).
An explicit accounting of the proportion of secondary cases (for example, among children) that might be prevented by avoidance of primary cases (caused by consumption of contaminated ground beef) that are within the scope of the risk assessment.
An explicit accounting of the various indicators of attributable risk (outbreak data, case-control studies of sporadic cases, passive surveillance, and the like) and their expected inferential value as related to a particular food and pathogen combination.
An explicit accounting of the potential for increased variability in the attributable risk with season and region. For example, an increase in human cases in summer could be a result of more contact through swimming, increased pathogen loads in drinking-water supplies because of rainfall or snowmelt patterns, increased contact with animals and surface water, and more contact with untreated drinking water at cottages and camps. Those factors are outside the risk assessment, but they may influence the observed patterns of incidence of E. coli O157:H7 illness and could provide important context for the management of the problem.
An explicit accounting of the potential for different patterns of attributable risk of illnesses (in particular, those in sensitive subpopulations) that are more likely to have severe sequelae.
Such information will be highly uncertain, but its absence seriously undermines the ability to assess and characterize risks and to measure the full value of potential mitigations. It is thus a major component of the contextual description of the risk assessment that would be key to the understanding of the situation by risk managers and stakeholders.
The following hypothetical example illustrates the point (the numbers are chosen for illustrative purposes only):
Product X accounts for 20% (20,000 cases) of all EHEC illnesses. These illnesses have a case-complication rate of 5%, resulting in 1,000 cases of HUS.
Now consider two scenarios with respect to risk attribution—A1 and A2.
A1: Product X accounts for 20% (20,000 cases) of illness, and the attribution is constant among different age groups.
A2: Product X accounts for 40% of EHEC illnesses in children but only 10% in the remainder of the population.
And consider two scenarios with respect to attribution of the disease burden—C1 and C2.
C1: The case rate and the case-HUS rate are uniform in the population.
C2: Of the 20,000 cases, 8,000 occur in children, and the case-HUS rate for children is 10% (800 cases of HUS). The other 12,000 illnesses occur in the general population with a case-HUS rate of 1.67% (200 cases).
And consider two scenarios with respect to the utility of preventing complicated versus uncomplicated cases—U1 and U2.
U1: Equal weight is placed on preventing cases, whether they are likely to result in severe outcomes or not.
U2: Prevention of cases leading to HUS is considered to be 1,000 times more valuable to society than prevention of uncomplicated cases (self-limiting gastroenteritis).
Different combinations of those scenarios produce different risk-management situations and involve various levels of focus on particular population groups. If the burden of HUS and other serious complications is a large part of the basis of risk-management decision-making, it is important that the risk assessment explicitly incorporate scenarios that address them. That can be achieved by demonstrating which of a set of composite scenarios best represents reality or by allowing for multiple scenarios and
addressing the alternative assumptions in the risk characterization and the risk assessment in general.
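Worked through directly, using only the report's illustrative figures, the composite scenarios show how concentrated the weighted disease burden can become:

```python
# Illustrative arithmetic for the hypothetical scenarios above
# (all figures are the report's illustrative values, not data).

total_cases = 20_000           # cases attributed to product X

# C1: uniform case-HUS rate of 5% across the population.
hus_c1 = total_cases * 0.05    # 1,000 HUS cases

# C2: 8,000 cases in children at 10%, 12,000 others at 1.67%.
hus_children = 8_000 * 0.10            # 800 HUS cases
hus_others = 12_000 * 0.0167           # ~200 HUS cases
hus_c2 = hus_children + hus_others

# U2: an HUS case weighted 1,000 times an uncomplicated case.
def weighted_burden(cases, hus, weight=1_000):
    return (cases - hus) + weight * hus

# Children's share of the weighted burden under the C2 + U2 combination.
share_children = (weighted_burden(8_000, hus_children)
                  / weighted_burden(total_cases, hus_c2))
print(hus_c1, round(hus_c2), round(share_children, 2))
```

Under C2 with U2 weighting, children account for roughly 79% of the weighted burden despite accounting for 40% of the cases—precisely the kind of distinction that a model without age-specific attribution cannot make.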
If all those issues are explicit, it becomes much clearer how the draft risk assessment and hypothetical mitigations will affect public health. This includes the gains achieved by reducing the prevalence of contaminated product entering the home or retail preparation environment and thereby reducing exposure via cross contamination. It also includes the gains associated with prevention of secondary transmission by elimination of primary cases that have foodborne sources. In addition, the predicted health benefits can be added to or compared with benefits associated with EHEC control in animal reservoirs apart from the impact on the food supply (including animal contact, waterborne transmission, occupational exposure, and secondary cases that stem from these primary sources). It would be unfortunate if the full value of the potential effectiveness of proposed mitigations were underestimated because of limitations in the scope of the assessment.
Therefore the committee recommends that the authors review the scope and allocation of effort in the risk-assessment model with respect to its ability to generate unique insight into the burden of hemolytic uremic syndrome, other severe sequelae, and mortality. Those are the outcomes that arguably justify the attention paid to EHEC compared with pathogens that result in a much larger number of illnesses. The authors should also review the scope of the model and its documentation to ensure that the full public-health context and thereby the value of potential mitigations can be described and measured by the risk assessment.
Attribution of EHEC to Ground-Beef Consumption
The FSIS draft risk assessment relies on matching the cases predicted by the broad spectrum of ground-beef production and consumption behaviors (although ignoring, at this point in development, the potential for cross contamination) with the fraction of cases that might be prevented by removal of the risk factor of eating “pink” ground beef. Given that the epidemiologic data effectively become part of the dose-response assessment and ultimately govern the risk estimates, they need to be afforded detailed treatment.
The fraction attributable to ground beef is calculated on the basis of three sources of information:
The proportion of outbreaks attributable to ground beef (one calculation).
The proportion of illnesses within these outbreaks that are associated with ground beef (one calculation).
The population-attributable risk calculated from case-control studies of sporadic cases (four calculations).
Those six calculations are used in the model to estimate the proportion of cases attributable to ground beef: in each iteration, one of the six estimates is drawn at random, on the assumption that the true fraction is equally likely to be any of them.
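A minimal sketch of that sampling scheme (the six values below are placeholders, not the draft's estimates):

```python
import random

# Hypothetical attribution estimates (fractions of EHEC cases attributed
# to ground beef); these six values are placeholders, not the draft's.
estimates = [0.08, 0.12, 0.15, 0.20, 0.28, 0.40]

random.seed(1)
# Each iteration of the simulation draws one estimate at random,
# treating each as equally likely to be the true fraction.
draws = [random.choice(estimates) for _ in range(10_000)]

# The long-run mean of the draws approaches the simple average
# of the six estimates.
print(round(sum(draws) / len(draws), 3))
```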
There are a number of concerns with respect to each of the sources of information and their use in the draft risk assessment:
The information on the proportion of outbreaks attributable to ground beef and the proportion of outbreak cases should be applied only to the fraction of EHEC cases that are believed to occur in the form of outbreaks. Similarly, information derived from case-control studies of sporadic cases should be applied only to the fraction of cases believed to occur sporadically.
Outbreaks whose source is not identified are allocated equally among known sources. Consideration should be given to the notion that outbreaks with unknown sources are far less likely to originate in ground beef, given that ground beef is a leading candidate in any investigation of EHEC outbreaks.
For the case-control studies, the logic applied is that only cases that resulted from exposures of persons who recall consuming "pink-in-the-middle" ground beef are attributable to ground beef; other cases are not. That does not take into account the probability of illness associated with any other consumption of undercooked ground beef, including cases in which respondents did not notice the color of the meat and cases in which the meat was not pink but contained surviving organisms.
An unpublished paper by Kassenborg et al. (2001) cited in the draft gives the population attributable risk (PAR) as 8% and 7%, respectively, for the risk factors “ate pink hamburger at home” and “ate pink hamburger away from home.” Because removal of both pathways of exposure would reduce the number of cases associated with ground-beef consumption, it would seem that they should be added in the calculation of the fraction associated with ground beef. These fractions are averaged in the draft analysis, yielding a lower limit of 7.5% for the PAR instead of their sum of 15%. It seems reasonable that the risk attributable to exposures to beef known to be “pink in the middle” should constitute a minimum for the attribution of total risk to ground-beef consumption. Such exposure seems to account for only a subset of the exposures to contaminated ground beef even if the estimation is limited to direct consumption of ground meat as opposed to consumption involving cross contamination.
The risk factor “pink ground beef” could be confounded (as suggested in the Kassenborg et al. manuscript) with cross contamination if there is a common causal source, such as poorly trained food preparers or inattention to food-safety practices.
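The arithmetic behind the averaging concern above is simple enough to state explicitly; a sketch using the two cited PAR values (note that summing PARs is itself only approximate, since the two exposure groups may overlap):

```python
# The two PAR estimates from Kassenborg et al. as cited in the draft.
par_home = 0.08        # "ate pink hamburger at home"
par_away = 0.07        # "ate pink hamburger away from home"

# Draft approach: averaging the two exposure pathways.
par_averaged = (par_home + par_away) / 2

# Committee's point: removing BOTH pathways removes both attributable
# fractions, so the combined lower bound is (approximately) their sum,
# ignoring any overlap between the two exposure groups.
par_combined = par_home + par_away

print(par_averaged, par_combined)
```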
During one of the committee’s public meetings, comments were received regarding the use of Centers for Disease Control and Prevention (CDC) outbreak data in the estimation of the fraction of EHEC that is attributable to ground beef. [The comments were in the form of a letter and a copy of DeWaal et al. (2001) provided by the Center for Science in the Public Interest.] The comments included the suggestion that the draft report underestimates the attributable risk by estimating the proportion of outbreaks and cases as a fraction of all outbreaks (including, for example, those with waterborne sources). The committee notes that the choice of outbreaks and cases from all sources is the appropriate denominator because FSIS is using this estimate to infer an attributable number of cases from the FoodNet surveillance system, which itself includes illnesses from all sources. Use of foodborne sources to generate a proportional estimate and all sources from FoodNet would lead to overestimation of the number of cases attributable to ground beef, assuming that all other factors were unbiased. Nonetheless, estimates of the attributable fractions of foodborne outbreaks and foodborne cases would provide valuable context.
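The denominator issue can be made concrete with hypothetical counts (none of the figures below are CDC or FoodNet data):

```python
# Hypothetical counts to illustrate the denominator issue (not CDC data).
outbreaks_ground_beef = 30
outbreaks_foodborne = 100
outbreaks_all_sources = 150     # includes waterborne, animal contact, etc.

foodnet_total_cases = 70_000    # FoodNet counts illnesses from ALL sources

# Consistent: an all-source proportion applied to an all-source total.
attrib_consistent = (foodnet_total_cases
                     * outbreaks_ground_beef / outbreaks_all_sources)

# Inconsistent: a foodborne-only proportion applied to an all-source
# total overstates the ground-beef attribution.
attrib_mismatched = (foodnet_total_cases
                     * outbreaks_ground_beef / outbreaks_foodborne)

print(attrib_consistent, attrib_mismatched)
```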
Comments were also received regarding the completeness of the CDC outbreak database. Some consideration should be given to the likelihood and magnitude of any bias that may result in attributable risk estimates from exclusion of outbreaks that are not contained in CDC databases. Again, clarification of the estimated proportion of EHEC illnesses that appear to be in the form of detectable outbreaks would put this issue into better perspective.
Therefore the committee recommends that the USDA or perhaps an interagency body consider developing a standard and formal procedure for estimating the fractions of foodborne illnesses attributable to different foods. This will need to take into account the diverse and often conflicting sources of evidence available, including expert judgment. The process should be carried out independently from the process of commodity-pathogen-specific risk assessment and should be continuously updated.
The FSIS draft risk model consists of the evidence base captured in the documentation and a simulation model implemented with the spreadsheet environment Microsoft Excel (referred to hereafter as Excel). The
simulation model is implemented by using Monte Carlo simulation. The model was provided to the committee in multiple versions:
A version that uses probabilistic sampling functions that are part of an Excel add-in, @RISK, with the overall simulation process controlled by macros written with Visual Basic for Applications (VBA).
A version that uses probabilistic sampling functions provided by FSIS, with both the sampling functions and the overall simulation model implemented with VBA (no longer requiring @RISK).
Choice of Modeling Environment
In the field of quantitative risk assessment, and for quantitative microbial risk assessment in particular, the use of Excel as a modeling environment is very common. The @RISK add-in is also a common tool for probabilistic simulation in quantitative risk assessment. These choices for software implementation are associated with various benefits, costs, and risks.
The basic modeling environment (Excel) is one of the most widely used software programs in the world. That makes sharing of the basic model structure accessible to a great majority of interested parties with no incremental cost for the broader community. To the extent that there is value in having model assumptions and calculations visible to the largest possible audience, Excel serves this purpose. However, as discussed below, Excel without the benefit of modeling "add-ins" does not provide the capacity to perform probabilistic simulation. Moreover, the spreadsheet environment is inherently problematic for the purposes of complex modeling.
For simulation models of low to moderate complexity, a majority of interested parties and potential reviewers can follow the flow of information and calculation in a spreadsheet. It is reasonable to assume that stakeholders who cannot readily follow spreadsheet logic have ready access to someone who can assist them. In this case, there is great value to the broader community in having an implementation that does not present barriers to transparency.
Another benefit of using commonly available software is that the quality of the software is to some extent known. The quality may be criticized in some cases (for example, because of problems with random-number generation algorithms or inaccuracies that are known to occur in specific situations), but the large community of users of such software essentially acts as informal quality assurance. It may be tempting to replace widely used software with software that is superior in some particular function, but that could come at the cost of a lower (or effectively unknown) level of
quality assurance in other functions. There is essentially a risk–risk tradeoff in the choice of software.
Excel does not provide algorithms to invoke and control a sample-and-calculate iteration process (to automatically repeat calculations with new random samples) and to store and process the simulation results (such as displaying and providing the average value obtained in a set of 10,000 sample-and-calculate iterations). The sampling and simulation control functionality is provided by various software packages as add-ins to Excel (such as @RISK and Crystal Ball). The add-ins generally cost more than $500 and present a financial barrier to widespread dissemination and review of the models among stakeholders. For stakeholders who have an interest in exploring the model but do not have a long-term interest in quantitative risk assessment with spreadsheets, that constitutes a substantial one-time cost to review and run the model. Software packages that allow for models to be viewed and simulated by others at no cost would be beneficial in addressing the problem.
For probabilistic modeling, good practice (Burmaster and Anderson, 1994) suggests the use of so-called second-order modeling that explicitly separates uncertainty (representing a deficiency in the knowledge base) and variability (representing known dispersion or distribution of some quantity). That requires an additional level of complexity in the control of the calculations (loops within loops) that is not available in the standard recalculation of spreadsheets and is only crudely available in @RISK. To implement that functionality in Excel, the USDA team has written VBA code to control the ordering of the calculations and the storage of the considerable amount of data generated (and for other reasons). Including VBA code in the overall simulation reduces the overall transparency of the model in proportion to the ability of reviewers to understand the workings of this programming language and the amount of time they have available for such review.
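A minimal sketch of the "loops within loops" structure of second-order simulation, with placeholder distributions:

```python
import random

random.seed(2)

# Second-order (two-dimensional) Monte Carlo sketch: the outer loop
# samples UNCERTAIN parameters; the inner loop samples VARIABILITY
# given those parameters. All distributions here are placeholders.
def second_order(n_outer=200, n_inner=1_000):
    means = []
    for _ in range(n_outer):
        # Uncertainty: the true mean log10 dose is not known precisely.
        mu = random.uniform(1.0, 3.0)
        # Variability: serving-to-serving spread given that mean.
        inner = [random.gauss(mu, 0.5) for _ in range(n_inner)]
        means.append(sum(inner) / n_inner)
    return means

means = second_order()
# Each entry is one "possible world"; the spread across entries reflects
# uncertainty, while variability has been averaged within each world.
print(round(min(means), 2), round(max(means), 2))
```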
The basic spreadsheet environment has several limitations. As models become more and more complex, the amount of spreadsheet space taken up by the model assumptions needed for intermediate calculations and the storage of output calculations becomes quite large. That can be managed with VBA code to store and perform intermediate calculations, careful documentation of the spreadsheet, and detailed user manuals. However, beyond some level of complexity, the spreadsheet environment becomes more a problem than a solution for the purposes of model communication. The benefits of using widely available spreadsheet software,
noted above, can be outweighed by the inadequacy of spreadsheets in the management and communication of complex models.
The popularity of spreadsheets is based largely on the user’s freedom to structure and implement calculations in diverse ways with little or no formal structure. In addition, it is possible to directly access and control the data and the calculations with relative ease. Such freedom brings considerable systematic risk. The risk is based on the accessibility of individual data points (in large arrays and matrices) and the potential for undetected data corruption or formula errors that may be singular faults in a large array of otherwise correct formulas. It is a considerable challenge to ensure that the data and formulas are uncorrupted by small errors, particularly when multiple persons are implementing and adjusting the model.
The basic problem with spreadsheets is that they make it easy for an analyst to make errors. Spreadsheets lack the transparency of explicit programs. Simply put, it can be hard to keep track of what values depend on what other values and to trace an error in a spreadsheet because the structure of the calculations is cryptic. The interconnections between subcomponents of the model (workbooks) can be difficult to follow because all variables are global variables. Software developed in a modern computer language explicitly lists the input and output variables used by a particular model component. If one component’s variables are to be used by another component, this is usually handled through an explicit list of formal parameters. It is also hard to document updates or changes of a spreadsheet; this makes spreadsheets especially cumbersome when multiple analysts participate in developing the calculations.
In some cases, there can be no assurance other than through exhaustive checking of all individual cells and formulas. This process is time-consuming and generally requires expert knowledge of the model's intentions. The audit process is itself error-prone because of the combination of the complexity and monotony of the exercise. The task is made more difficult in the draft model by the use of direct cell references instead of named identifiers to refer to other quantities in the model; for example, a formula for microbial growth uses the spreadsheet cell location Temperatures!$AB$82, which is located on another worksheet, rather than a label such as "CookingTemperature" to refer to the cooking temperature. The draft authors indicate (in Appendix C) that direct cell referencing was used to ease the audit process, but it is difficult to understand how it makes the task easier. The situation is further complicated if one tries to explore the Visual Basic code, in which a different reference system ("cell(row, column)") is used. These multiple, cumbersome, and error-prone ways of referring to the same variable are perhaps sufficient in themselves to justify the effort to rework the simulation model in an alternative environment.
The committee and an external reviewer had considerable difficulty in following the information flow in the model. Several errors in the spreadsheets are noted in an independent review prepared by Edmund Crouch, which appears as Appendix D of this report.
Choice of Sampling Engine (@RISK versus VBA)
As stated above, two versions of the draft model were provided to the committee. The versions differ in the software that converts input assumptions for the distribution of inputs into random samples that are used to construct the output distributions. The choice to use custom VBA code to generate random numbers provides the benefit of independence from @RISK and generates risks generally associated with new or unproven code.
Benefits of independence from @RISK:
A much larger community of potential model collaborators and reviewers.
Avoidance of the costs associated with @RISK (for USDA and people interested in using, collaborating on, or reviewing the model).
A higher level of transparency because the sampling and simulation code for @RISK is proprietary and therefore cannot be openly scrutinized.
Risks in new and unproven simulation code:
Costs associated with continued development and quality assurance of the simulation code.
Version control (ensuring that all copies of the spreadsheet have the same error-free simulation code).
A lower level of “informal” quality control, given the much-reduced numbers of users and reviewers.
The limited value of transparency in the simulation code if it is not expertly scrutinized and compared with alternatives.
The loss of various additional current and future features of tools like @RISK (graphical output, summary statistics, and various analytic tools, such as filtering).
The need to re-implement the Latin hypercube sampling algorithm, which improves convergence for models that rely on adequate representation of the low-probability regions of probability distributions.
Note that those types of risks and benefits apply generically to all such choices and are not limited to @RISK or the particular custom simulation implementation developed by USDA. It is important to clarify that the decision faced by USDA should not be seen as a choice between only these two implementations.
The issues raised here are of general interest to the broader community performing and using microbial risk assessments. USDA and Food and Drug Administration risk assessors are among the leaders in this field. Given the importance of the United States as a trading partner and the potential (but as yet undemonstrated) importance of microbial risk assessments in the international food trade, their choices of modeling environments and software are influential. The influence is based on the desire for compatibility of approaches and the fact that such models and approaches will be (and have been) copied in other countries and at the international level.
Explication of the Model
At the suggestion of the committee, FSIS developed a summary (Appendix C, "Model Equations and Code") of the variables and equations used in the model. However, it is not sufficient for following the flow of information and computation in the model. The combination of explicit cell calculations and VBA-based calculations makes the flow of data difficult to follow. When viewing the spreadsheet, for instance, it is not immediately clear whether data in a cell are truly constant (for example, a cell containing the value 300.33) or are the result of a VBA calculation and therefore may change at any time. For a model of this complexity, more attention to the ability to document the mathematical and computational basis of the model is required.
Appendix C is a good beginning but is of limited usefulness, largely because of its use of spreadsheet cell references and its lack of a central, cross-referenced list of all variables. For each variable, the list should include its name or symbol, a description of the quantity, its units, its intended use as an input to other variables, a reference to or summary of its empirical justification, and its value or distribution or equation. The documentation should also be explicit about whether the variables are assumed to be mutually independent, correlated, or otherwise dependent.
Therefore the committee recommends that the authors review the choice of modeling environments (particularly the use of spreadsheets), simulation engines, and other implementation elements in light of all the benefits, risks, and costs associated with the many alternatives available to perform the modeling function. The choice should be based on explicit consideration of the diverse goals of the risk-assessment process both for a particular application and also as a general matter of policy support in domestic and international decision-making, and the choice should be defended in the text of the assessment.
The final risk assessment should include an explicit list of all the
variables and equations that constitute the model. This recommendation is coupled with the need to find a modeling environment that is compatible with the complexity of the model and the communication and documentation issues that are inherent in preparing and presenting such a model.
JUSTIFICATION OF MODELING ASSUMPTIONS
This section reviews the reasonableness of the justifications offered in the draft assessment for the distributions and dependencies used in the Monte Carlo simulation. This chapter of the draft does not discuss the justification of the form of the mathematical model (such as the level of abstraction, which variables are included in the model, and which equations tie them together). Other chapters address that topic and the relevant underlying science about E. coli contamination and disease etiology.
To use Monte Carlo simulation, an analyst is required to specify input probability distributions precisely. The difficulties in developing and justifying input distributions are well known in the field of risk analysis and have received much attention (Finley et al., 1994; Haimes et al., 1994). Although there is a considerable literature on the subject of estimating probability distributions from empirical data (Cullen and Frey, 1999; Morgan and Henrion, 1990), standard statistical approaches are of little practical value when data are sparse. In almost all risk assessments, analysts typically have little empirical evidence to support the distributions they select as inputs. As a result, the analyses usually require assumptions that cannot be completely justified by an appeal to the evidence. The consequences may be substantial because the results of probabilistic risk analyses can sometimes be sensitive to the choice of distributions used as inputs, and this sensitivity is usually strongest for the tail probabilities where risk assessments often focus their attention (Bukowski et al., 1995).
In the draft risk assessment, several methods were used for selecting the (marginal) distributions for the simulation. Outside reviewer Edmund Crouch argues (in Appendix D of this report) that several distribution choices are not scientifically justified, but his criticism may be the result of inconsistency in the modeling decisions made by the development team and, perhaps more important, of a lack of transparency in their documentation of the criteria they used for selecting distributions. The use of several criteria makes the documentation hard to evaluate. Even though using diverse strategies and criteria would not necessarily lead to
discrepancies in the assessment, it seems desirable to have a clearly articulated and coherent strategy for selecting the marginal distributions.
The committee notes that no distribution selection strategy is free of all controversy. The following subsections describe the most important criticisms of each approach used by the FSIS development team. By using several strategies and not justifying the choice of one over another in any context, the draft risk assessment exposes itself to all these criticisms.
Traditional or Convenient Distributions
Sometimes data for variables in the model were fitted to a traditional or mathematically convenient distribution shape. For instance, the within-herd prevalence distribution was taken to be exponential because it conveniently had a single parameter, and, when fitted to data by the method of moments (a mathematical means of deriving the population distribution of a variable on the basis of a sample), the fit was deemed adequate. In other cases, uniform distributions were selected for second-order distributions. There is little or no justification for such choices. Whenever distributions are selected or justified on grounds that appeal to mathematical convenience, this fact must be clearly acknowledged. It could be accomplished, for example, by placing all such model choices in one place in the documentation so that the choices and their inherent consequences in the assessment could be considered together.
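The convenience of the method-of-moments fit can be sketched as follows. The prevalence values are hypothetical; the point is only that a single sample moment fixes the entire exponential shape, tails included, whether or not the data support that shape.

```python
import math, random

# Hypothetical within-herd prevalence observations (illustration only).
prevalences = [0.01, 0.03, 0.04, 0.08, 0.15, 0.29]

# Method of moments for the exponential: one moment, one parameter.
sample_mean = sum(prevalences) / len(prevalences)   # 0.10
rate = 1.0 / sample_mean                            # fitted rate = 10.0

def sample_prevalence(rng=random):
    # The fitted exponential then serves as the sampling distribution
    # in the simulation; its tail behavior is entirely an artifact of
    # the chosen shape, not of the data.
    return rng.expovariate(rate)

print("fitted rate:", rate)
```

A goodness-of-fit check on six points cannot meaningfully discriminate among shapes, which is why the committee asks that convenience-based choices be flagged and collected in one place.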
Empirical Distribution Functions
In several cases, empirical distributions were used as inputs in the draft model. Many analysts consider empirical distributions to be the best possible representations of variability because they let the available data “speak for themselves.” If all variables were treated this way, the Monte Carlo simulation would amount to a permutation study of the raw data. Risk analysts often prefer that approach because it does not require them to make assumptions about the distribution shapes or to fit distributions to data; the data are the distributions. The approach relies strongly on an assumption of the representativeness of the data; if the sample data do not adequately characterize the underlying distribution from which they were drawn, the assessment could be misleading.
Cullen and Frey (1999) review strategies for computing empirical distributions from data. When data are abundant, this approach can yield an excellent characterization of the patterns of variability. When data are sparse, the characterization may still be reasonably good, depending on whether the data happen to be representative of their underlying distribution. Although the impact of sampling error obviously becomes more important as sample size decreases, its effect is usually not incorporated into or accounted for with an empirical distribution. The primary concern about using empirical distributions as inputs in a risk assessment is that they will tend to underestimate tail probabilities. After all, unless the original sampling is very thorough and makes a special effort to observe extreme values, it is unlikely that, for instance, the largest value of a variable observed in a limited sample is actually the largest possible value of the variable. Moreover, because distribution tails are characterized by low probabilities, sampling will typically produce few observations in the tails. Consequently, the analyst’s ability to fashion good estimates of tail probabilities will be hampered if the sample is small. That is especially troublesome because it is often the tails that are of primary concern in a risk analysis focusing on extreme events that lead to disease. It is thus desirable for modelers to take pains to consider the possibility of values outside the observed range for all variables for which empirical distributions are used.
Maximum Entropy Criterion

Some distribution selections in the FSIS draft risk assessment seem to have been based on appeals to the maximum entropy criterion, which states that when one has only partial information about possible outcomes, one should exploit the available information to the extent practicable and impose as few assumptions as possible on the missing information (Grandy and Schick, 1991; Jaynes, 1957; Lee and Wright, 1994; Levine and Tribus, 1976; Tiwari and Hobbie, 1976). The use of maximum entropy in selecting input distributions for Monte Carlo analysis is superior to naive conjecture and is considered by many to be the state of the art. The maximum entropy criterion is controversial, however. Among several criticisms, perhaps the most serious is that the model of uncertainty it uses is inconsistent under changes of scale. For instance, suppose that all one knows about a particular positive variable A is its range. The maximum entropy criterion would suggest using a uniform distribution over this range to represent the state of knowledge. Now consider the related variable A². If all that is known about A is its range, then surely all that is known about A² is its range, which is just the interval between (left bound of A)² and (right bound of A)². That means that one should pick another uniform distribution to model A², too. But given the uniform distribution for A, one can compute the distribution it implies for A², and this is not uniform over the squared range. Similar problems occur when log and other transformations are used or when a variable is arithmetically combined with other variables. The inconsistencies mean that analysts must arbitrarily pick a scale on which to express their uncertainty and resist comparing it across different scales.
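The scale inconsistency admits a short numerical demonstration. Suppose all that is known about A is that it lies in [1, 3] (the range is an arbitrary choice for illustration); the two applications of maximum entropy then disagree about the probability that A² is below 5.

```python
import math

lo, hi = 1.0, 3.0   # all we "know" about A is this (illustrative) range

# Model 1: maximum entropy applied to A itself -> A ~ Uniform(1, 3).
# Probability that A**2 <= t implied by this choice:
def induced_cdf(t):
    return (math.sqrt(t) - lo) / (hi - lo)

# Model 2: maximum entropy applied directly to A**2 -> Uniform(1, 9).
def direct_cdf(t):
    return (t - lo ** 2) / (hi ** 2 - lo ** 2)

t = 5.0
print(induced_cdf(t))   # about 0.618
print(direct_cdf(t))    # exactly 0.5
```

Both models encode "the same" ignorance, yet they assign different probabilities to the same event, which is precisely the criticism described above.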
Expert Elicitation

Formal expert elicitation does not appear to have been used explicitly in the draft model, but the committee suggests that it could be judiciously applied in some circumstances (notably in the Preparation Module) in which single or small numbers of observations are extrapolated to the entire population.
There are various approaches to eliciting information about input variables from experts or other knowledgeable persons. They range from simply asking them in informal and uncontrolled settings to using elaborate formal schemes (Cooke, 1991; Meyer and Booker, 1991; Morgan and Henrion, 1990; Warren-Hicks and Moore, 1998). Formal elicitation schemes can often be expensive. It might be reasonable to let experts define the shapes of input distributions subjectively, but this is not always a workable strategy and, when experts disagree, it can lead to even more controversy about the inputs. In the final analysis, the committee believes that there are circumstances in which it would be appropriate to solicit expert opinion regarding point estimates and distributions and that, if found useful, such information should be documented in the text and used in the model until data become available.
Dependencies Among Input Variables

It appears that most of the variables in the FSIS draft risk assessment are assumed to be mutually independent. Overall, the draft is practically silent on the potential for input variables to be dependent. Although independence assumptions make the computation for a model substantially easier, their justification, whether theoretical or empirical, is lacking. For instance, average carcass weight may be correlated with within-feedlot prevalence (Dargatz et al., 1997). Other candidates for dependence include any variables describing consumer behavior and preferences that may share a risk factor, such as age, ethnicity, sex, or health status.
In the interest of discovering particularly high-risk scenarios, the modelers should review the list of model inputs for pairs or triplets of inputs that are intuitively likely to be dependent. The review could be based on data (probably rare), on reasoning about a common cause, or on identification of a shared general risk factor (such as the preparer’s knowledge of appropriate food-handling or preparation practices) that also bears on other important variables in the assessment (such as serving size, frequency of consumption of raw ground beef, or the age of the consumer). The committee recognizes that finding concrete evidence for individual important variables is difficult; evidence of the dependence structure of two or more variables will certainly be rarer still. An explicit model of such dependence may not be feasible, but the effect of plausible dependence scenarios should be considered on a case-by-case basis and prioritized through causal reasoning about the plausibility of the dependence relationships. This will assist in better characterizing the potential for high-risk scenarios and may help to explain why particular exposure pathways account for higher proportions of attributable risk. In addition, explicit reference to the plausibility of key variable dependencies (even if difficult to quantify) is useful information for risk management and for setting data-collection priorities.
Therefore the committee recommends that the final risk assessment should address the potential for input-variable dependence in the model, based on causal reasoning and other evidence of such relationships. For potentially important dependencies, a sensitivity analysis should be performed to evaluate the nature and magnitude of the potential dependence structure.
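One inexpensive form of such a sensitivity analysis is to induce a plausible correlation between two inputs and compare the upper tail of the output. The sketch below uses two lognormal inputs whose product is the exposure; the lognormal shapes and the correlation of 0.6 are assumptions chosen purely for illustration, not values from the draft model.

```python
import math, random

# Dependence sensitivity sketch: two lognormal inputs (hypothetical
# stand-ins for, say, serving size and a storage growth factor) whose
# product is the exposure. Compare the 99th percentile of the output
# when the inputs are independent and when they are correlated.
def tail_quantile(rho, n=100_000, q=0.99, seed=1):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2  # correlation rho
        out.append(math.exp(g1) * math.exp(z2))
    out.sort()
    return out[int(q * n)]

independent = tail_quantile(0.0)
correlated = tail_quantile(0.6)   # assumed correlation, for illustration
print(independent, correlated)    # the correlated tail is markedly heavier
```

Even a moderate correlation roughly doubles the 99th percentile of the product here, which is the kind of high-risk-scenario shift the recommended sensitivity analysis would reveal.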
Seasonality of On-Farm Prevalence

The draft pays considerable attention to assumptions regarding the seasonal pattern of on-farm prevalence. As mentioned in Chapter 2 and Appendix D of this report, the data cited as justification for modeling seasonality in prevalence do not actually provide evidence of such seasonality. In face-to-face meetings, analysts of the development team suggested that the best evidence comes from studies conducted outside the United States; those studies were omitted because of concerns about their relevance, but the omission leaves the assertion about seasonality essentially unsupported. The concerns about the relevance of foreign studies should be resolved and documented so that the justification for the seasonality assumptions is clarified.
Therefore the committee recommends that the authors reconsider the evidence of and the approach for inferring seasonality in on-farm prevalence, including the potential for using data from outside the United States. Evidence of seasonality might also be sought in the upper tails of the internal or external pathogen load among E. coli O157:H7-positive animals. That may have a stronger effect (by several orders of magnitude) on the number of contaminated ground-beef patties than simple variations in prevalence would. The committee recognizes that, given the substantial uncertainty associated with other important quantities in the model that affect prevalence downstream, further refinement of the exact pattern of on-farm seasonality may not have high priority.
OVERALL MODEL UNCERTAINTY AND RELIABILITY
It appears that several of the methods used in developing the draft risk assessment may tend to understate uncertainty. That would typically be considered disadvantageous, if not dangerous, in risk analyses. There are several ways in which this happens: incomplete reconstruction of statistical regressions, overreliance on empirical distributions, use of means rather than raw data, and modeling of sampling variation without representing the underlying uncertainty arising from measurement error. Each of these is discussed below.
Uncertainty Encoded in Regressions Not Reconstructed
The draft makes use of several regression analyses, but in doing so it seems not to have fully reconstructed the uncertainty in the relationship among the random variables in the original data sets. For instance, in Equation 3.28 (described on p. 80 of the draft and p. 196 of Appendix C), constants are used to transform temperature linearly into E. coli generation time. The coefficients of the linear scaling appear to be regression parameters (Marks et al., 1998), but the regression model has not been used to re-express the uncertainty in the predicted variable. Instead, the linear scaling is the prediction of the mean value of generation time expected, given a particular temperature. That approach does not reconstruct even the scatter of the original data, much less account for sampling uncertainty. Similar applications of slope and intercept values are made in the equations for calculating the lag period for E. coli O157:H7 for a specific step of handling or storage (Equation 3.27) and the maximum population density of the pathogen in ground beef (Equation 3.29).
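The distinction between using a regression's mean prediction and reconstructing its scatter can be sketched as follows. The (temperature, generation-time) pairs are hypothetical stand-ins for the data behind Equation 3.28; the point is the method, not the numbers.

```python
import math, random

# Hypothetical (temperature, generation-time) pairs, illustration only.
data = [(10, 9.1), (15, 7.4), (20, 5.2), (25, 4.1), (30, 2.8), (35, 1.3)]
n = len(data)
xbar = sum(x for x, _ in data) / n
ybar = sum(y for _, y in data) / n
sxx = sum((x - xbar) ** 2 for x, _ in data)
slope = sum((x - xbar) * (y - ybar) for x, y in data) / sxx
intercept = ybar - slope * xbar
# Residual variance, with n - 2 degrees of freedom for the two parameters.
resid_var = sum((y - intercept - slope * x) ** 2 for x, y in data) / (n - 2)

def predict_mean(temp):
    """What the draft does: return only the regression mean."""
    return intercept + slope * temp

def predict_sample(temp, rng=random):
    """Reconstruct the scatter: mean prediction plus residual noise.
    (Sampling uncertainty in slope and intercept could be added similarly.)"""
    return predict_mean(temp) + rng.gauss(0.0, math.sqrt(resid_var))
```

Using `predict_mean` everywhere collapses the vertical scatter of the original data to zero; `predict_sample` restores at least the residual variability, which is the minimum needed to avoid understating uncertainty.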
Overreliance on Empirical Distributions
The fact that empirical distributions typically underestimate the probabilities of values in distribution tails is a straightforward result of the reality that, given any limited random sample, the chance of ever observing the rare events in the tails is low. The authors seem to place too much credence in distributions that are based on remarkably few data points. In the case of Equation 3.11.1, described on p. 179 of Appendix C, the distribution of the number of bacteria per square centimeter of carcass surface area is specified from only four points. No accounting is made of the sampling error or measurement error implied by that small number. The problem is essentially the same as one encounters when using only a single study in a scientific review. Because of the importance of distribution tails,
the problem is also similar to using the observed maximum of a sample as the theoretical maximum of a random variable.
Modelers need to account for sampling error to see beyond the limited data available. In principle, Kolmogorov-Smirnov confidence intervals could be computed that characterize the sampling uncertainty about the distribution as a whole, assuming that the samples were independent and identically distributed. In any case, the uncertainty about distribution shape should be explicitly and quantitatively characterized. Doing so will allow analysts to revisit the choice of distribution shape in later sensitivity and robustness studies.
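As a sketch of how little a four-point sample constrains a distribution, the following computes a distribution-free confidence band around an empirical CDF (here via the Dvoretzky-Kiefer-Wolfowitz inequality, a close relative of the Kolmogorov-Smirnov construction, and assuming independent and identically distributed samples). The four counts are hypothetical stand-ins for the data behind Equation 3.11.1.

```python
import math

# Four hypothetical counts (bacteria per square centimeter), for
# illustration only.
samples = sorted([0.2, 0.5, 1.1, 3.4])
n = len(samples)
alpha = 0.05

# DKW band half-width: for n = 4 it is about 0.68, so the empirical
# CDF constrains almost nothing at this sample size.
eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def ecdf(x):
    return sum(s <= x for s in samples) / n

def band(x):
    f = ecdf(x)
    return max(f - eps, 0.0), min(f + eps, 1.0)

print("half-width:", round(eps, 3))   # about 0.679
print("band at 1.0:", band(1.0))      # (0.0, 1.0): essentially vacuous
```

A band this wide makes explicit what treating the four points as the full distribution conceals: the data are compatible with almost any underlying CDF, including ones with much heavier tails.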
Use of Averages
At many points in the E. coli assessment, the modelers use average values of variables rather than distributions. An average does not capture variability; it erases it. Consequently, averages are of little use in a risk assessment, in which one of the primary concerns is to account for variability in the system. For some variables, a distribution is used, but it is a distribution of values that turn out to be averages over one or more dimensions. In such cases, the distributions represent only part of the true variability of the underlying variable.
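A two-line example makes the point concrete. The exponential model and the threshold below are hypothetical; the contrast is between carrying a distribution through the calculation and collapsing it to its mean.

```python
import math

# Illustrative only: a quantity with mean 1 (say, relative pathogen load
# per serving) modeled as exponential versus collapsed to its average.
mean_load = 1.0
threshold = 3.0   # hypothetical level at which illness becomes likely

# With the full distribution, about 5% of servings exceed the threshold.
frac_over = math.exp(-threshold / mean_load)

# With only the average, no serving ever exceeds it: the risk vanishes.
frac_over_if_averaged = 1.0 if mean_load > threshold else 0.0

print(frac_over)                # about 0.0498
print(frac_over_if_averaged)    # 0.0
```

The averaged model reports zero risk for an event that the distributional model says occurs in roughly one serving in twenty, which is exactly the kind of erasure described above.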
Untenably Precise Submodels
The overall assessment will underestimate uncertainty if it is composed of submodels that underestimate the uncertainties in the variables they are used to model. One example in the draft is the model of cooking loss. Figure 3-25 on p. 88 in the draft risk assessment depicts the modeled frequency distribution of log reductions in E. coli abundance caused by cooking. It also depicts the uncertainty about this distribution by displaying 20 realizations of the distribution. It seems implausible that the true distribution of reductions, whatever it is, has the multimodality that this figure shows. The problem is not the “bumpiness” of the distribution itself, but the fact that the same bumpiness persists in all the realizations of the distribution—that is, the bumpiness is stronger than the overall uncertainty about the distribution.
As explained in Appendix D of the FSIS draft (“Modeling Issues”), the origin of the multimodality can be traced to the cooking-temperature data on which the model of log reductions was based. Each distribution of the ensemble displayed in the figure is based on integrating (apparently, in fact, forming a stochastic mixture of) nine distributions representing the effect of cooking with different pretreatments. A mixture model would be appropriate if the analysts knew the relative frequencies of the various pretreatments in homes and institutions. However, it seems that the mixture in this case is used to represent model uncertainty rather than variability among kitchens.
For each pretreatment, a regression is computed from 18 points (six replicates at each of three cooking temperatures), and the mean reduction is taken from the regression line. That seems to be the real reason that the uncertainty in the distribution of log reductions is so small. The mean reduction does not seem entirely relevant; it is the outliers of the distribution (small reductions) that are likely to induce illness. It seems that the modelers strove to make their assessment consistent with the observed data but sacrificed the reliability of their results in doing so.
As noted in Appendix D of the FSIS draft, the bumpiness of the distribution can ultimately be traced to the preference for round numbers in the values reported for final cooking temperature on the Fahrenheit scale (for example, 150°F rather than 153°F). It seems clear that the modality is completely artifactual. That is not at all a criticism of the original data; it is simply how the survey turned out. However, the persistence of the bumpiness through the many transformations made on the data suggests that the true scope of their inherent uncertainty was never fully recognized. In summary, the original data, with their multimodality, seem fine; what is insufficient is the breadth of uncertainty about the final distribution of log reductions.
The committee notes that if the modelers had simply smoothed the distribution before computing and displaying the uncertainty about it, the problem might never have been noticed. Their openness in portraying the details of this particular variable and the data it was based on demonstrates the utility of transparency in presentation.
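The round-number artifact is easy to reproduce. In the sketch below the "true" final temperatures are assumed to vary smoothly (a Normal(160, 8) spread in degrees Fahrenheit, a purely illustrative choice), but the reported values cluster at multiples of 5.

```python
import random
from collections import Counter

# Digit-preference sketch: smooth underlying temperatures, rounded reports.
rng = random.Random(0)
true_temps = [rng.gauss(160.0, 8.0) for _ in range(10_000)]
reported = [5 * round(t / 5) for t in true_temps]   # round to nearest 5 F

counts = Counter(reported)
# The reports concentrate on a handful of round values; a model that
# reproduces this histogram exactly inherits its bumps as if they were real.
print("distinct reported values:", len(counts))
```

Ten thousand smoothly varying temperatures collapse to roughly a dozen distinct reported values, and any distribution fitted tightly to those values will show the spurious modes discussed above.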
Inappropriate Use of Bayesian Formulations
The committee suspects that the use of beta distributions in Equation 3.10.1, described on p. 177 of Appendix C of the FSIS draft, exemplifies another way in which the uncertainty present in the system is underestimated in the draft model. That equation defines the random variable TR (transformation ratio), which is the multiplier of the E. coli prevalence in cattle that predicts the prevalence in carcasses. Data collected from four slaughter plants during July and August (Elder et al., 2000) suggest that TR is about 160%. (The ratio is larger than 1 presumably as a result of cross contamination during dehiding.) The counts actually reported were 91 of 307 cattle and 148 of 312 carcasses being contaminated. The authors explain that the beta distributions are used to model the uncertainty about TR. The beta distributions arise in this context from a simple Bayesian updating argument about frequency estimates. However, the specified
beta distributions have extremely small variances; both are less than 0.001. Given the ordinary fluctuations one might anticipate from day to day and across slaughter plants, it seems entirely unreasonable to expect that such distributions could be realistic models of the contaminated fractions of cattle or carcasses. Because the numerator and denominator in Equation 3.10.1 are combined under an (unsupported) assumption of independence, the quotient has a very small variance, which suggests that the ultimate TR has little uncertainty.
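The updating argument and its consequence for the variance of TR can be reproduced in a few lines. The sketch assumes a uniform prior (the draft's exact prior is not restated here) and uses the counts cited above.

```python
import random

# Beta posteriors for the contaminated fractions of cattle (91 of 307)
# and carcasses (148 of 312), combined as an independent ratio TR.
# The uniform Beta(1, 1) prior is an assumption for this sketch.
rng = random.Random(0)
n = 50_000
cattle = [rng.betavariate(91 + 1, 307 - 91 + 1) for _ in range(n)]
carcass = [rng.betavariate(148 + 1, 312 - 148 + 1) for _ in range(n)]
tr = [b / a for a, b in zip(cattle, carcass)]

mean_c = sum(cattle) / n
var_c = sum((c - mean_c) ** 2 for c in cattle) / n   # below 0.001, as noted
mean_tr = sum(tr) / n
var_tr = sum((t - mean_tr) ** 2 for t in tr) / n
print(mean_tr)          # about 1.6
print(var_c, var_tr)    # both small: plant-to-plant variation is absent
```

The posteriors reflect only the binomial sampling error in one July-August data set from four plants; nothing in the construction admits day-to-day or plant-to-plant fluctuation, so the tight spread of TR is an artifact of the formulation rather than evidence of precision.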
Potentially Dominant Model Uncertainties
Model uncertainty is a class of uncertainty that pertains to the adequacy of a model’s representation of reality. Strictly on the basis of qualitative judgment, the committee suggests that the following general model uncertainties may dominate:
The ability to define an appropriate output or set of outputs from the on-farm module that is adequately correlated with the level of risk to the ground-beef supply. As described in the review of the Production Module (Chapter 3 of this report), prevalence estimates can vary considerably with detection method, type, and definition (for example, cattle with one or more EHEC on the entire surface or in the entire gut). It is intuitively reasonable to suggest that the number and relative proportion of animals with very high bacterial loads will dominate the overall contamination level in the combo bin, because the contamination (through pooling in the combo bins and grinders) is proportional to the total across many individual contributions. Given the expectation of logarithmic variability in pathogen loads, the highest-shedding animals will contribute more bacteria (by a factor of several powers of 10) whenever there is a transfer to the carcass surface. Conversely, an animal with an average burden may contribute comparatively little.
The ability to represent the transfer of organisms from the hide or gut of the animal to the carcass surface and the relationship between measurements of cell density on the carcass surface and the amount on trim that becomes ground beef.
The ability to represent the potential for reservoirs of contamination in a slaughter plant and for cross contamination of carcasses. The potential pathways and reservoirs of contamination in a slaughter plant are numerous, complex, and likely to be dynamically changing, even if measured. The use of indicator organisms should be reconsidered with respect to their ability to track the sources and extent of fecal contamination in the plant. Without a drastic change in the approach to data gathering and modeling, this part of the model will remain a “black box.” Attempts to model its internal mechanisms may be misguided until appropriate data are made available to the modelers.
The prevalence of exposure to uncooked ground beef. This variable will be inherently difficult to estimate because of the uncommon nature of the exposure and the response bias associated with its estimation from surveys or other methods. However, it could be important because the probability of illness associated with this pathway may be relatively high compared with that for a cooked product.
The extent of the exposure that occurs through cross contamination. The committee recognizes that there are no examples of risk-assessment modules that could adequately describe the potential for risk associated with cross contamination during food preparation. Even an emerging experimental database that focuses on cross contamination is unlikely to support such a model in sufficient detail to form predictions of illness through this complex pathway in the near future. This remains a major methodologic hurdle in microbiologic risk assessment.
The dose-response relationship for EHEC. Despite the committee’s suggestion that a Shigella species may constitute an adequate surrogate for the best estimate of the dose-response function, there remains broad uncertainty associated with this representation and even with the characterization of the dose-response function for Shigella itself. While it may ultimately be possible to narrow the uncertainty envelope used in the draft risk assessment, large uncertainties will remain regarding the dose that corresponds to a given probability of illness.
Use of Undefined Variables
One of the features of an Excel spreadsheet is the automatic initialization (to zero) of variables that are not otherwise explicitly defined. Automatic initialization is a convenience when one is beginning a model but can become a serious hindrance as the model develops. Appendix D, which contains comments on the model presented to the committee by outside reviewer Edmund Crouch, cites specific examples in which references were found to undefined cells in the draft model spreadsheet. Some of those cells had been given a “hatched” format, presumably to indicate that the values were not available. Because of automatic initialization, Excel treated the undefined cells as zeros. Although zero may turn out to be the correct value for these particular variables, the committee suggests that the final risk assessment explicitly define all variables and constants used in the model, simply as a matter of good modeling practice.
Experience with complex assessments has shown that profound errors can arise from simple or careless mistakes in units (Isbell et al., 1999). It is thus important that quantities to be added, subtracted, or compared for magnitude have conforming units and that quantities used as exponents or powers or as arguments of logarithms be dimensionless (Hart, 1995).
The FSIS draft report’s Appendix C—a partial list of the model equations and code—contains some but not all of the information needed to check unit conformance. The committee was thus unable to conduct a rigorous review of the dimensions and units of the equations and variables used in the draft model. It suggests that the final model’s mathematical expressions be checked for dimensional soundness and the input quantities be checked for unit conformity with the variables in the expression.
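The kind of check the committee has in mind can be sketched with a minimal dimensioned-quantity class (the class and its four-dimension vector are illustrative, not a proposal for the final model): addition of nonconforming quantities is flagged automatically, while multiplication combines units.

```python
# Minimal dimensional-analysis sketch: each quantity carries a vector of
# unit exponents, here (mass, length, time, temperature).
class Q:
    def __init__(self, value, dims):
        self.value, self.dims = value, tuple(dims)

    def __add__(self, other):
        # Addition requires conforming units.
        if self.dims != other.dims:
            raise ValueError(f"unit mismatch: {self.dims} vs {other.dims}")
        return Q(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds unit exponents.
        return Q(self.value * other.value,
                 [a + b for a, b in zip(self.dims, other.dims)])

grams = (1, 0, 0, 0)
seconds = (0, 0, 1, 0)

mass = Q(5.0, grams)
time = Q(2.0, seconds)
ok = mass * time            # allowed: units combine multiplicatively
try:
    bad = mass + time       # flagged: cannot add grams to seconds
except ValueError as e:
    print("caught:", e)
```

Running every model equation through such machinery (or an equivalent by-hand audit with a complete equation list) would surface nonconforming sums and dimensioned exponents mechanically rather than by inspection.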
Overall Assessment of Model Reliability
The committee believes that a forthright and comprehensive characterization of the uncertainty in the assessment would show the overall model uncertainty to be very large (much larger than suggested by the Risk Characterization chapter of the draft), even after anchoring. It may be that the uncertainty is so large as to appear to overwhelm any quantitative predictions based on the assessment. But even if that is so, the authors should not shy away from being as forthright and comprehensive as possible. Large uncertainty itself does not preclude useful applications of an assessment. But underestimated uncertainty can threaten the credibility of an assessment and lead to unwarranted confidence on the part of decision-makers regarding the response of the system being modeled to simulated mitigations.
Therefore the committee recommends that the final report clearly describe the magnitude of model uncertainty related to key modules in the risk assessment and include strategies for reducing the uncertainty, if they exist.
Brand KP, Small MJ. 1995. Updating uncertainty in an integrated risk assessment: Conceptual framework and methods. Risk Analysis 15(6):719–731.
Bukowski J, Korn L, Wartenberg D. 1995. Correlated inputs in quantitative risk assessment: The effects of distributional shape. Risk Analysis 15:215–219.
Burmaster DE, Anderson PD. 1994. Principles of good practice for the use of Monte Carlo techniques in human health and ecological risk assessments. Risk Analysis 14:477–481.
Cassin MH, Lammerding AM, Todd EC, Ross W, McColl RS. 1998. Quantitative risk assessment of Escherichia coli O157:H7 in ground beef hamburgers. International Journal of Food Microbiology 41:21–44.
Cooke RM. 1991. Experts in Uncertainty: Opinion and Subjective Probability in Science. New York: Oxford University Press.
Cullen AC, Frey HC. 1999. Probabilistic Techniques in Exposure Assessment: A Handbook for Dealing with Variability and Uncertainty in Models and Inputs. New York: Plenum Press.
Dargatz DA, Wells SJ, Thomas LA, Hancock DD, Garber LP. 1997. Factors associated with the presence of Escherichia coli O157 in feces in feedlot cattle. Journal of Food Protection 60:466–470.
DeWaal CS, Barlow K, Alderton L, Jacobson MF. 2001. Outbreak Alert! Closing the Gaps in Our Federal Food Safety Net. Washington, DC: Center for Science in the Public Interest. October.
Dilks DW, Canale RP, Meier PG. 1992. Development of a Bayesian Monte Carlo method for determining water quality model uncertainty. Ecological Modeling 62:149–162.
Elder RO, Keen JE, Siragusa GR, Barkocy-Gallagher GA, Koohmaraie M, Laegreid WW. 2000. Correlation of enterohemorrhagic E. coli O157 prevalence in feces, hides, and carcasses of beef cattle during processing. Proceedings of the National Academy of Sciences 97(7):2999–3003.
Finley B, Proctor D, Scott P, Harrington N, Paustenbach D, Price P. 1994. Recommended distributions for exposure factors frequently used in health risk assessment. Risk Analysis 14:533–553.
Gelman A, Carlin JB, Stern HS, Rubin DB. 1995. Bayesian Data Analysis. London: Chapman and Hall.
Grandy WT Jr, Schick LH. 1991. Maximum Entropy and Bayesian Methods. Dordrecht: Kluwer Academic Publishers.
Green EJ, MacFarlane DW, Valentine HT. 2000. Bayesian synthesis for quantifying uncertainty in predictions from process models. Tree Physiology 20:415–419.
Haimes YY, Barry T, Lambert JH. 1994. When and how can you specify a probability distribution when you don’t know much? Risk Analysis 14:661–706.
Hart GW. 1995. Multidimensional Analysis: Algebras and Systems for Science and Engineering. New York: Springer Verlag.
Isbell D, Hardin M, Underwood J. 1999. Mars Climate Orbiter team finds likely cause of loss. Mars Climate Orbiter/Mars Polar Lander News & Status Release 99-113 (dated September 30, 1999) http://mars.jpl.nasa.gov/msp98/news/mco990930.html.
Jaynes ET. 1957. Information theory and statistical mechanics. Physical Review 106:620–630.
Kassenborg H, Hedberg C, Hoekstra M, Evans MC, Chin AE, Marcus R, Vugia D, Smith K, Desai S, Slutsker L, Griffin P, and the FoodNet Working Group. 2001. Farm visits and undercooked hamburgers as major risk factors for sporadic Escherichia coli O157:H7 infections—Data from a case-control study in five FoodNet sites. Manuscript in preparation.
Lee RC, Wright WE. 1994. Development of human exposure-factor distributions using maximum-entropy inference. Journal of Exposure Analysis and Environmental Epidemiology 4:329–341.
Levine RD, Tribus M. 1976. The Maximum Entropy Formalism. Cambridge: MIT Press.
Marks HM, Coleman ME, Lin CTJ, Roberts T. 1998. Topics in risk assessment: Dynamic flow tree process. Risk Analysis 18:309–328.
Meyer MA, Booker JM. 1991. Eliciting and Analyzing Expert Judgment: A Practical Guide. New York: Academic Press.
Morgan MG, Henrion M. 1990. Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge: Cambridge University Press.
Nauta MJ, Evers EG, Takumi K, Havelaar AH. 2001. Risk assessment of Shiga-toxin producing Escherichia coli O157 in steak tartare in the Netherlands. Rijksinstituut voor Volksgezondheid en Milieu (RIVM). Report 257851 003.
Shinagawa K. 1997. Correspondence and problem for enterohemorrhagic E. coli O157 outbreak in Morioka city, Iwate. Koshu Eisei Kenkyu 46:104–112.
Small MJ, Fischbeck PS. 1999. False precision in Bayesian updating with incomplete models. Journal of Human and Ecological Risk Assessment 5(2):291–304.
Tiwari JL, Hobbie JE. 1976. Random differential equations as models of ecosystems. II. Initial condition and parameter specification in terms of maximum entropy distributions. Mathematical Biosciences 31:37–53.
Warren-Hicks WJ, Moore DRJ. 1998. Uncertainty Analysis in Ecological Risk Assessment. Pensacola, FL: SETAC Press. Chapter 3.