
4

Study Evaluation

The committee reviewed “Study Evaluation,” Chapter 6 of the ORD Staff Handbook for Developing IRIS Assessments (the handbook) (EPA, 2020a). The committee’s review of that chapter considered the adequacy of the study evaluation methods presented for individual human studies (epidemiology and controlled exposure), animal studies, mechanistic evidence (pilot testing approaches), and pharmacokinetic (PK) models. It also considered how those methods might be improved (Question 3 in Appendix B).

OVERVIEW OF THE HANDBOOK’S MATERIAL ON STUDY EVALUATION

Chapter 6 of the handbook describes approaches for evaluating individual studies for their validity and utility in assessing a health effect associated with specific exposure/endpoint relationships, with primary focus on study utility for hazard identification. The focus is on human epidemiological and experimental animal toxicology studies, with additional considerations for controlled human exposure studies, computational PK models and physiologically based pharmacokinetic (PBPK) models, and data relevant to mechanisms of toxicity. The stated goal of study evaluation is to evaluate “the extent to which the results are likely to represent a reliable, sensitive, and informative presentation of a true response” (EPA, 2020a, p. 6-2). While standard methods for systematic review would consider only risk of bias when evaluating individual studies (Higgins and Thomas, 2019), study evaluation in the handbook is intentionally broadened to include the additional study evaluation domains of “sensitivity” and “reporting quality.” It is not standard practice to include the concepts of sensitivity or reporting quality as part of the evaluation of individual studies included in systematic reviews of human research, although these domains are sometimes part of study evaluation for systematic reviews of animal studies (SYRCLE; Hooijmans et al., 2018). For the human studies, evaluation for risk of bias and sensitivity follows the Risk of Bias in Non-randomized Studies of Interventions (ROBINS-I) framework (Sterne et al., 2016), with a modification to include a domain for sensitivity and to accommodate the types of studies typically encountered in environmental and occupational epidemiology. For human studies, reporting quality is also considered as a contributing feature to the risk of bias and sensitivity domain judgments. Evaluation of animal toxicological studies assesses risk of bias, sensitivity, and reporting quality as separate study evaluation concerns. Individual evaluation domains differ slightly to accommodate common differences across human, animal, and controlled exposure studies, but in all cases domain-specific judgments are combined toward an “overall rating” of study confidence that summarizes the evaluation for a specific exposure/endpoint relationship from a given study as “high,” “medium,” “low,” or “uninformative.” This rating guides the relative weight that each study will receive in downstream hazard identification, with “low confidence” studies generally not used as primary sources of information and “uninformative” studies not considered further in evidence synthesis or integration. As mentioned previously, Chapter 6 of the handbook states on page 6-4 that study evaluations are “generally” conducted independently by at least two reviewers, which leaves open the undesirable possibility that only one reviewer would evaluate a study.

RESPONSIVENESS TO PREVIOUS NATIONAL ACADEMIES REPORTS

The committee finds that the handbook is responsive to many of the recommendations provided in the 2014 National Academies report Review of EPA’s Integrated Risk Information System (IRIS) Process relative to evidence evaluation, including that the U.S. Environmental Protection Agency (EPA) continue to adapt or develop tools for assessing risk of bias in different types of studies, that risk of bias assessments should be incorporated into the IRIS process and carried forward to the evidence integration step, and that EPA should publish its risk of bias assessments. The 2014 report noted that EPA could augment risk of bias considerations with additional “quality assessment items” that are relevant to a particular systematic review if they are empirically based, as may be the case in the handbook with consideration of sensitivity and reporting quality.

The handbook’s use of study evaluation ratings to exclude studies from further consideration is only partially concordant with the 2014 report, and recommended practice for using study evaluation to exclude studies prior to data extraction has evolved since the 2014 report. The 2014 report stated, “The risk-of-bias assessment can be used to exclude studies from a systematic review or can be incorporated qualitatively or quantitatively into the review results. The plan for incorporating the risk-of-bias assessment into a systematic review should be specified a priori in the review protocol” (NRC, 2014, p. 76). It was a clear expectation, however, that study exclusion be a rare and transparent process and that only studies with severe methodological shortcomings (“fatal flaws”) be excluded. Importantly, the 2014 report’s recommendations pertained to the exclusion of studies based on risk of bias assessment and not on additional quality assessment items such as sensitivity and reporting quality. The handbook’s reliance on these additional quality assessment items, and their possible use for study exclusion, represents a potential departure from the recommendations of the 2014 report. In addition, other National Academies reports that followed the 2014 report have recommended that EPA not use the risk of bias analysis to exclude studies. For example, the 2021 National Academies report The Use of Systematic Review in EPA’s Toxic Substances Control Act Risk Evaluations explicitly recommends that EPA “not exclude studies based on risk of bias, study quality, or reporting quality” (NASEM, 2021, p. 40).

Because financial ties of the investigators can be strongly associated with favorable outcomes for the sponsors, even when taking into account other risks of bias, NRC (2014) also recommended that funding sources be considered in the risk-of-bias assessment for individual studies included in systematic reviews that are part of an IRIS assessment. That recommendation was based largely on evidence obtained from clinical studies because less was known at that time about the extent of funding bias in animal research. Since the 2014 report was published, evidence for funding bias in both the human and preclinical animal literature has increased, as discussed in Chapter 2 of this report. This recommendation has not been addressed.

CRITIQUE OF METHODS FOR STUDY EVALUATION

The committee’s critique of study evaluation methods in the handbook focuses mainly on human epidemiological and experimental animal toxicology studies. Evaluations of PBPK and PK models and of data relevant to mechanisms of toxicity are discussed at the end of this critique section. The objectives of handbook Chapter 6 are clear and represent an important step in the IRIS assessment process. Furthermore, study evaluation is a critical step in systematic review methods. While much of the study evaluation process is clear and well defined, the committee identifies two major areas where a lack of clarity in the handbook is likely to inhibit operationalization, transparency, and reproducibility of the study evaluation process. The first area relates to the description of how studies evaluated to have certain confidence ratings will be used in future steps of the IRIS assessment. The second, more substantive, area relates to a lack of clear delineation among the separate concepts of risk of bias, reporting quality, study sensitivity, and other standard concepts in systematic review. The lack of clarity is particularly complicated in the context of IRIS assessments because of apparent differences in how study evaluation has historically occurred for human studies and for animal toxicological studies. While Thayer (2021) and EPA’s written responses to the committee’s questions provided important clarifications in both of these areas and left the committee confident that IRIS staff are evaluating studies appropriately, deficiencies and ambiguities in the text of the handbook threaten its ability to guide transparent, credible reviews of the evidence.

How Study Confidence Ratings Are Used Throughout the IRIS Assessment Process

The handbook explicitly states that study evaluation is performed in the context of a study’s utility for identification of individual hazards, rather than its usability for dose-response analysis. The committee notes confusion about how studies’ confidence ratings will be used toward this purpose. A major concern is that studies judged as “critically deficient” or “deficient” in one evaluation domain are typically rated as “uninformative” or “low confidence” and are generally not considered further in the IRIS assessment. The text anticipating this apparent exclusion of certain studies from future consideration based on their study evaluation determination (p. 6-6 of the handbook) does not represent a practicable, repeatable process. The text of the handbook addresses which confidence judgments correspond to more or less weight in later evidence synthesis and integration, acknowledging that such a determination may depend on what other studies on the exposure/endpoint are available and allowing for additional consideration on the basis of whether a confidence judgment results from sensitivity or other concerns. This handbook text gives the impression that certain studies are excluded from future review in the IRIS assessment process based solely on the individual study evaluation rating, when in practice such determinations are reached at other points in the process. During the presentation by Thayer (2021) and in written responses to the committee’s questions, EPA clarified that “deficient” and “critically deficient” judgments can be quite varied and that “low confidence” studies are not typically excluded. However, EPA provided data from recent IRIS assessments showing that the proportion of studies rated as “uninformative” and excluded from further consideration ranged from 0 to 50 percent for human studies and from 0 to 41.5 percent for animal studies. Thus, depending on the IRIS assessment, excluding studies at the study evaluation stage could lead to a substantial proportion of excluded studies due to a critically deficient rating in one domain. The importance of robust and transparent information, properly contextualized within the framework of the entire systematic review, suggests, as recommended in NASEM (2021), that study evaluation ratings should not be used to exclude studies. “Organizing the hazard review” and “evidence synthesis” are two possible process points for a decision about study exclusion, with the exclusion or practical allocation of zero or nonzero weight to any given study being determined in the context of information available across studies.

Risk of Bias and Other Quality Assessment Items (Sensitivity and Reporting Quality)

Standard procedures for systematic review, which have a longer history in human clinical studies than in animal toxicological studies, evaluate studies on the basis of risk of bias and internal validity (Higgins and Thomas, 2019). The handbook augments study evaluation to consider, in both human and animal studies, the additional quality assessment items of sensitivity and reporting quality. The inclusion of these additional quality assessment items was anticipated in NRC (2014) and has precedent in animal toxicological studies (Hooijmans et al., 2018; NTP, 2019). However, the distinctions among the concepts of risk of bias, reporting quality, and sensitivity are not always clearly delineated in the handbook, and some considerations clearly overlap with more established concepts in systematic review. This mixing of concepts is prone to inhibit transparency and reproducibility in documenting how each concept is incorporated into domain judgments and study confidence ratings. Operationally, these concepts are sometimes considered collectively when evaluating domains, complicating the ultimate evaluation because elements of one concept can be incorporated into judgments in multiple domains while simultaneously being cited as their own evaluation concerns. As such, it is difficult to see exactly how the procedure in the handbook would avoid “double counting” a single issue in multiple domains. Some concepts could conceivably be considered outside the study evaluation portion of the systematic review. Differences between the descriptions of these concepts for the human epidemiological studies and the animal toxicological studies suggest two distinct study evaluation processes, yet both use the same terminology, which furthers the confusion.

Risk of Bias

Of the three concepts, risk of bias and its attendant evaluation procedures show the most commonality and a high degree of concordance with other documented processes for systematic review. Reliance on the ROBINS framework for human studies has a clear precedent, although the committee notes there are documented difficulties in applying this framework in practice (Minozzi et al., 2019; Igelström et al., 2021). The handbook’s adaptations to address the types of studies typically encountered in environmental and occupational epidemiology are appropriate. IRIS program staff members are actively engaged in the development of a Risk of Bias in Non-randomized Studies of Exposures (ROBINS-E) (Morgan et al., 2017) counterpart to the existing ROBINS-I, as evidenced in the adaptations that appear in the handbook. Participation in this ongoing development renders IRIS staff well positioned to implement any eventual documented ROBINS-E framework, but the evolving nature of this framework is noteworthy, and limitations of the ROBINS-E framework have been noted (Bero et al., 2018).

Sensitivity

The handbook discusses the purportedly distinct concept of sensitivity as the ability of a study to detect a true association, with insensitive studies prone to produce “false negative” results. Low sensitivity in both the animal toxicological and human epidemiological studies is generally described as “bias toward the null,” which might arise due to certain types of exposure measurement error, but also in terms that relate directly to study precision, such as sample size, which represents an issue distinct from bias. The committee viewed this notion of sensitivity as a less established construct than others used in systematic review, noting that it actually touches on several more established concepts, including (but not limited to) precision, risk of bias, generalizability, and other study features that relate to design or choice of endpoints that might be evaluated relative to refined population, exposure, comparator, outcome (PECO) statements (Higgins and Thomas, 2019). The Cochrane Handbook notes, “Bias should not be confused with imprecision. Bias refers to systematic error, meaning that multiple replications of the same study would reach the wrong answer on average. Imprecision refers to random error, meaning that multiple replications of the same study will produce different effect estimates because of sampling variation, but would give the right answer on average” (Higgins and Thomas, 2019, p. 179). Precision is related to the sample size of the study and the confidence interval around the effect estimate, and a study that lacks precision may produce a “false negative” by virtue of random (sampling) variability. Thus, a null result due to lack of precision should not be described as “bias toward the null” as though it were a study feature (e.g., exposure measurement error) that might produce systematic error toward the null.
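A minimal simulation sketch can make this distinction concrete (illustrative only, with arbitrary parameter values; it is not drawn from the handbook or the Cochrane Handbook). It contrasts a large study with classical exposure measurement error, whose estimated slope is systematically attenuated toward the null, with a small but well-measured study, whose estimate is unbiased on average but imprecise.

```python
import numpy as np

rng = np.random.default_rng(0)
true_beta = 1.0   # true exposure-response slope (hypothetical)
n_reps = 2000     # replications of each hypothetical study

def fit_slope(n, error_sd):
    """Simulate one study: regress the outcome on a mismeasured exposure."""
    x = rng.normal(size=n)                           # true exposure
    y = true_beta * x + rng.normal(size=n)           # outcome with random noise
    x_obs = x + rng.normal(scale=error_sd, size=n)   # classical measurement error
    return np.polyfit(x_obs, y, 1)[0]                # estimated slope

# Systematic error: exposure measurement error attenuates the slope
# toward the null on average (bias), even in a large study.
biased = [fit_slope(n=5000, error_sd=1.0) for _ in range(n_reps)]

# Random error: a small but well-measured study is unbiased on average,
# yet individual estimates scatter widely (imprecision).
imprecise = [fit_slope(n=30, error_sd=0.0) for _ in range(n_reps)]

print(f"large study, mismeasured exposure: mean slope {np.mean(biased):.2f} (true 1.0)")
print(f"small study, exact exposure: mean slope {np.mean(imprecise):.2f}, SD {np.std(imprecise):.2f}")
```

In the first case the error is systematic (“bias toward the null”); in the second it is random, and any single null result reflects imprecision rather than bias.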

The Cochrane Handbook also notes, “Bias should not be confused with the external validity of a study, that is, the extent to which the results of a study can be generalized to other populations and settings” (Higgins and Thomas, 2019, p. 179). A study judged as insensitive due to the choice of population features (or exposure levels or follow-up periods) could produce what the handbook describes as a “false negative” by virtue of a lack of generalizability between the results of the study and the population (or exposure levels or follow-up periods) for which the risk assessment is of primary interest. That is, characteristics such as populations (or exposure levels or follow-up periods) dictate the precise exposure-response association that a particular study may address, and key questions about how that particular exposure-response association relates to the goal of the risk assessment should not be confused with “bias toward the null” or, more generally, with study quality. For example, EPA may posit a critical age at exposure, susceptible subgroup, exposure regimen of primary concern, window of exposure, and endpoints of interest. The systematic review may then identify a set of studies that address the agent of interest and outcome of interest, but classify a study as “low sensitivity” if it does not evaluate the ages, subgroups, exposure patterns, or temporal periods of effect that EPA has hypothesized are of most interest. Such a study may not be biased, and it may be adequately powered to detect the particular exposure-response association under study; yet EPA may still characterize it as “low sensitivity.” In such cases, EPA has implicitly changed the question being evaluated—from one that is general (“Does the agent affect the endpoint?”) to one that is narrower (“Does this study evaluate the effect of the agent on the endpoint in the period of follow-up when we posit the effect will be largest?”). Judging a study as “low sensitivity” due to these types of study features should not downgrade the study’s evaluation with regard to hazard identification; rather, the study should be judged as addressing the agent’s impact under a defined exposure scenario for a defined period of observation. A study designated as “low sensitivity” may provide useful evidence of effect heterogeneity (across different populations or exposure levels) or offer supporting evidence for other studies that consider more relevant exposure durations or follow-up periods (similar to the concept of a “negative control”). How exactly a particular exposure-response association being evaluated in a given study relates to the goals of the risk assessment could be captured through the study’s relationship with the PECO statement, provided that such a PECO statement had been refined from the original broad PECO statement posited in the problem formulation stage. Insofar as some of the human epidemiological or animal toxicological studies may be more or less relevant for the IRIS assessment, a general judgment of “whether there are factors in the design and conduct of the study that may reduce its ability to observe an effect, if present” (EPA, 2020a, Table 6-1, p. 6-2) may not fully capture the different ways in which studies that target varying components of an exposure-response association might contribute to hazard identification. The systematic review infrastructure offers the opportunity to consider such studies in relation to the broad and refined PECO statements when judging relevance for the risk assessment, rather than use their features to determine sensitivity or study quality.

For evaluation of human epidemiological studies, the handbook briefly notes the concept of defining “an ‘ideal’ design (i.e., a study design with no risk of bias and high sensitivity) for the review question” (EPA, 2020a, p. 6-9) to be used as a reference point in gauging domain-specific ratings. The committee acknowledges the potential controversy surrounding the use of the “ideal design” in observational epidemiology when a randomized controlled trial is designated as the ideal. However, the committee emphasizes that while this concept is most typically used to assess risk of bias, it also encompasses (as noted on p. 6-9 of the handbook) study features such as population definition, exposure window, and follow-up period. As such, the concept of an “ideal design” could potentially play a more explicit role in the handbook for clarifying how the particular exposure-response association being investigated in a given study relates to the “ideal” exposure-response association defined by a particular population of interest, exposure window, follow-up period, or other considerations.

Finally, some aspects of sensitivity discussed in the animal toxicological studies were viewed as overlapping with notions of reporting quality, possibly due to their inherently subjective nature. For example, considerations described as “outcome measures and results display” within the sensitivity domain include the prompting question “Does the level of detail allow for an informed interpretation of the results?” and the listed general consideration “use of unreliable methods,” both of which point toward issues that could instead be addressed in the reporting quality evaluation. Clear, concrete criteria and guidelines for each domain of study sensitivity would emphasize delineation from other study quality concepts and improve transparency and operationalization. Given that a single “critically deficient” element is enough for a study to be deemed “uninformative,” the criteria for what constitutes a “critical deficiency” need to be stated unambiguously.
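A toy encoding of this rating logic is sketched below. It is purely illustrative: the handbook does not express its judgments as an algorithm, and the domain names and the thresholds for “low” and “medium” here are invented for the example. It shows how, under such a rule, a single domain judgment can determine the overall rating.

```python
# Hypothetical sketch of the overall-rating rule; the handbook does not
# specify an algorithm, and the non-"critically deficient" thresholds
# below are invented for illustration only.
def overall_confidence(domain_judgments: dict) -> str:
    judgments = list(domain_judgments.values())
    if "critically deficient" in judgments:
        return "uninformative"   # one domain judgment decides the rating
    if judgments.count("deficient") >= 2:
        return "low"             # invented threshold
    if "deficient" in judgments:
        return "medium"          # invented threshold
    return "high"

rating = overall_confidence({
    "exposure measurement": "good",
    "outcome ascertainment": "adequate",
    "results presentation": "critically deficient",  # single critical deficiency
})
print(rating)  # -> "uninformative"; the study would drop out of synthesis
```

Making the criteria for “critically deficient” explicit matters precisely because, as the sketch shows, that one judgment can remove a study from further consideration.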

While many of the components of study sensitivity discussed in the handbook are important for study evaluation, the lack of alignment with other more established concepts in systematic review inhibits transparency of the study evaluation. While the handbook is clear that study features concerning sensitivity that may be evaluated in other bias domains should not be “double counted” in the sensitivity domain, a lack of guidance on how to allocate judgments about these features across multiple domains hinders operationalization and repeatability of the study evaluation process. A more focused description of the elements of sensitivity that are not overlapping with other parts of the systematic review process or other more established systematic review concepts would aid operationalization of the handbook. In addition to consideration of sensitivity relative to concepts such as precision and generalizability, concepts such as “directness and applicability” (NTP, 2019), judgments of a study’s relation to PECO, or more concrete descriptions of the notion of an “ideal design” in the ROBINS framework (for the human epidemiological studies) would provide a more transparent way to document how a study may or may not contribute to ultimate risk assessment.

Overall, the handbook’s discussion of sensitivity addresses many concepts that are important for study evaluation, but describing these distinct concepts with the term “sensitivity” is ambiguous and leads to a definition of sensitivity that potentially overlaps with more established concepts of study evaluation and systematic review, as well as the potential for some elements of study sensitivity to be evaluated relative to PECO statement(s). Consideration of these concepts is essential, but the ambiguity and overlap of concepts hinder operationalization, repeatability, and transparency of the study evaluation procedures. Issues relating to study precision and generalizability in particular might be appropriately separated from evaluation of study quality or bias. The committee acknowledges that there may be aspects of what is currently described as study sensitivity that are not naturally captured by other systematic review concepts. Examples from Table 6-9 (p. 6-32) of the handbook that might warrant isolation as quality assessment items separate from risk of bias, precision, or generalizability include chemical administration and characterization and results presentation. If quality assessment items require isolation from other systematic review concepts, these aspects warrant a precise definition alongside an explicit procedure for operationalizing their consideration as distinct elements of study evaluation. References to such concepts in the literature on environmental health and other disciplines would strengthen the rationale for their use as distinct concepts in the systematic review context. Should EPA maintain study evaluation concepts that are currently described as sensitivity considerations and cannot be captured with more established terminology for systematic review, the use of a word other than “sensitivity” to describe these necessary aspects would avoid confusion with the several other technical and colloquial meanings of that word.

Reporting Quality

The final quality assessment item considered in Chapter 6 of the handbook is reporting quality, which receives strikingly different treatment for the human epidemiological and animal toxicological studies. Inclusion of reporting quality as a quality assessment item is atypical in systematic review protocols but, as with sensitivity, has more recently emerged in systematic reviews of animal toxicological studies (SYRCLE; Hooijmans et al., 2018; NTP, 2019). In the human studies, reporting quality is to be considered alongside risk of bias in each domain, whereas in the animal studies reporting quality is explicitly identified as a separate domain. The detail around reporting quality in animal toxicological studies is not mirrored in the discussion of human epidemiology studies. The committee appreciates that a potentially broader set of factors may dictate reporting quality in epidemiological versus experimental toxicological studies, but it questions whether notions of reporting quality truly require such different treatment in the human epidemiological versus animal toxicological studies. If there are specific reasons why reporting quality requires such a different role in these different types of studies, they are not made explicit in the handbook. The handbook does not provide sufficient guidance as to how reporting quality should be used to inform judgments in the risk of bias domains in the human studies, and the apparent subjectivity of this assessment inhibits transparency and operationalization.

The handbook text on experimental animal toxicology studies indicates that the IRIS program considers reporting quality first, as a question of whether the study has reported sufficient details to conduct a risk of bias and sensitivity assessment, noting that studies that do not report basic information are typically rated as “uninformative.” The criteria for evaluating reporting quality are unambiguous, and what would constitute a “critically deficient” judgment in the reporting quality domain is clear, logical, and consistent with prior National Academies reports (Table 6-9 of the handbook, p. 6-32). As with some components of study sensitivity, the committee identifies opportunities for elements of reporting quality to be considered at other points in the systematic review, for example, in relation to the PECO statement, where an inability to discern (e.g., due to poor reporting quality) the animal species, sex, or exposure route or duration might lead to exclusion of a study for an inability to judge its fitness with the PECO statement, much less evaluate the study for risk assessment. In addition, and also in common with the sensitivity quality assessment, the committee identifies what could be considered overlap between some of the cited features of reporting quality and other considerations that are more commonly regarded as risk of bias domains. For example, the risk of bias assessment could conceivably be expanded to include domains for “outcome ascertainment” and “participant selection” (as is done in the human epidemiological studies), in which case issues related to reporting on the assays or procedures used to measure the endpoint or the species or sex of the animal could be naturally subsumed within a more standard risk of bias assessment.

Overall, the committee notes the usefulness of the specific reporting quality elements described as critical information for animal toxicological studies, but it could not identify why a similar level of detail was absent from the discussion of human studies or why some concepts, such as test animal and exposure methods, required consideration outside of traditional risk of bias domains or PECO considerations. A transparent and repeatable review process requires either explicit guidance for how to incorporate reporting quality into individual risk of bias domains or an explicit rationale for two separate processes for assessing reporting quality in human and animal study evaluation. In either case, an explicit rationale for isolating elements of reporting quality from established systematic review concepts would aid operationalization of the study evaluation process.

Co-exposures

The committee recognizes the difficulty and evolving landscape of methodology for handling co-exposures in environmental and occupational epidemiology and animal toxicology, which might include chemical mixtures or non-chemical stressors. The handbook confines this discussion to the confounding domain in human epidemiological studies, but broader considerations are evolving in the literature. Consequences of co-exposures for risk assessment could include (but not be limited to) (1) multicollinearity and confounding of the health effect associated with the primary exposure; (2) synergy or interaction, such that the health effect of one agent may be altered by the presence of a co-exposure; and (3) generalizability, where changes in co-exposures across populations hinder the transportability of results across populations. The near ubiquity of complex exposure mixtures warrants future consideration in refinement of study evaluation processes for IRIS assessments as this methodological area continues to evolve.
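As a concrete illustration of the first of these consequences (a hypothetical sketch with simulated data; nothing here comes from the handbook), the code below shows how strong correlation between two co-exposures inflates the uncertainty of each estimated effect even when neither estimate is biased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Two correlated co-exposures (hypothetical values for illustration).
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # correlation ~0.9
y = 1.0 * x1 + 0.0 * x2 + rng.normal(size=n)  # only x1 affects the outcome

# Ordinary least squares on both exposures: coefficients are unbiased,
# but their standard errors are inflated by the correlation.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - 3)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
vif = 1.0 / (1.0 - np.corrcoef(x1, x2)[0, 1] ** 2)

print(f"beta_x1 = {beta[1]:.2f} (SE {se[1]:.2f}), beta_x2 = {beta[2]:.2f} (SE {se[2]:.2f})")
print(f"variance inflation factor ~ {vif:.1f}")
```

With a correlation near 0.9, the variance inflation factor is roughly 5, so confidence intervals for each co-exposure's coefficient are noticeably wider than they would be with uncorrelated exposures, one reason a confounding-domain judgment alone may not capture the full impact of co-exposures.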

Study Evaluation Methods for PK Models, PBPK Models, and Mechanistic Evidence

Section 6.5 of the handbook indicates that a PK model or PBPK model must be evaluated before it can be accepted for use in an IRIS assessment. It appears that the evaluation would be carried out in parallel with, rather than as part of, a systematic review. Scientific and technical criteria for judging the suitability of a model include evaluating:

  • Representation of a chemical mode of action by the model’s structure and equations, based on available scientific information,
  • Availability of the computer code and apparent completeness of parameter listing and documentation,
  • Implementation of the conceptual model in the computational code,
  • Use of appropriate parameters in the model, and
  • Reproducibility of model results reported in scientific publications.

The committee considers the model evaluation approach described in the handbook to be adequate.

Mechanistic Evidence

The pilot testing approach for mechanistic data is described in Section 6.6 of the handbook. The committee considers that approach to be reasonable, as it is analogous to the approach for evaluation of animal toxicological studies (see handbook Table 6-10, beginning on p. 6-48). The issues identified with respect to sensitivity in animal studies are also relevant to the methods for evaluation of mechanistic evidence, and need to be considered accordingly.

FINDINGS AND RECOMMENDATIONS

Findings and Tier 1 Recommendations

Finding: The handbook describes circumstances under which a study may be excluded from the systematic review based on the outcome of the study evaluation. However, such exclusion is inconsistent with recent recommendations to incorporate study evaluation ratings within the context of evidence synthesis. For example, see the 2021 National Academies report The Use of Systematic Review in EPA’s Toxic Substances Control Act Risk Evaluations and the Cochrane Handbook for Systematic Reviews of Interventions.

Recommendation 4.1: The handbook should not use the results of study evaluation as eligibility criteria for the systematic review. [Tier 1]

Finding: The quality assessment item described as “sensitivity” covers important concepts but is ambiguous and under-operationalized: it spans aspects of internal validity, external validity, and statistical precision that overlap with other, more commonly accepted features of systematic review and that may be better assessed at other stages of the systematic review process.

Recommendation 4.2: EPA should evaluate whether aspects currently captured in the notion of “sensitivity” might be better described in the handbook with more established terminology (e.g., precision or generalizability) or better addressed at other points of the systematic review (e.g., risk of bias assessment or evaluation relative to PECO statement[s]). Otherwise, the handbook should provide a more concrete definition of “sensitivity” and a procedure for operationalizing its use in the study evaluation step. [Tier 1]

Finding: The use of reporting quality as a distinct quality assessment item for study evaluation is not standard for systematic reviews, and procedures for evaluating reporting quality are very different for the human epidemiological and animal toxicological studies. Reporting quality is included within other evaluation domains for human studies, with almost no specific guidance on how to incorporate reporting quality within these domains. This presents the possibility of downgrading a study quality rating without any specific evaluation criteria. For animal studies, reporting quality is described in detail as a separate evaluation domain but with some aspects that overlap with other systematic review concepts such as risk of bias or external validity. The handbook notes that reviewers should reach out to study authors to obtain missing information in animal toxicological studies, but it provides no such reference to obtaining information from authors of human epidemiological studies.

Recommendation 4.3: The handbook should address the apparent difference in assessing reporting quality between the human epidemiological and animal toxicological studies by either (1) assessing reporting quality similarly in both types of studies or (2) providing an explicit rationale for why the concepts require different assessment procedures in different types of studies. In either case, the handbook should provide an explicit rationale for isolating elements of reporting quality from established systematic review concepts and evaluate whether aspects currently described as reporting quality might be better addressed at other points of the systematic review process. [Tier 1]

Finding and Tier 2 Recommendation

Finding: How the steps for conducting the study evaluation are operationalized is opaque in the text of the handbook but was greatly clarified by EPA’s presentation of how many of the steps are implemented in the Health Assessment Workspace Collaborative (HAWC).

Recommendation 4.4: EPA should redraft the handbook to harmonize descriptive text with illustrations of how steps for conducting the study evaluation are operationalized in practice, using the presented material as needed. It is evident that all IRIS assessments will be performed using HAWC or similar software. Screenshots illustrating key steps would greatly improve conceptual understanding and operationalization (similar to Figures 6-2 and 6-3 of the handbook). Additional step-by-step instructions with screenshots could be included as supplementary material or as links to other content. [Tier 2]

Finding and Tier 3 Recommendation

Finding: Consideration of how multiple co-exposures or chemical mixtures may relate to assessing risk of bias or other quality assessment items is limited to a discussion of potential confounding, which may not fully capture the possible impact on systematic review.


Recommendation 4.5: EPA should monitor ongoing methodological development in assessing risk amid environmental co-exposures or mixtures in order to update explicit guidance on their potential roles in the evaluation of human epidemiological and animal toxicological studies. [Tier 3]
