The Integrated Risk Information System (IRIS) is a program within the US Environmental Protection Agency (EPA) that is responsible for developing toxicologic assessments of environmental contaminants. IRIS assessments contain hazard identifications and dose-response assessments of various chemicals that cover cancer and noncancer outcomes. Although the program was created to increase consistency among toxicologic assessments within the agency, other federal agencies, various state and international agencies, and other organizations have come to rely on IRIS assessments for setting regulatory standards, establishing exposure guidelines, and estimating risks to exposed populations. Over the last decade, the National Research Council (NRC) has been asked to review some of the more complex and challenging IRIS assessments, including those of formaldehyde, dioxin, and tetrachloroethylene. In 2011, an NRC committee released its review of the IRIS formaldehyde assessment. Like other NRC committees that had reviewed IRIS assessments, the formaldehyde committee identified deficiencies in the specific assessment and more broadly in some of EPA’s general approaches and specific methods. Although the committee focused on evaluating the IRIS formaldehyde assessment, it provided general suggestions for improving the IRIS process and a roadmap for its revision in case EPA decided to move forward with changes to the process.
After release of the formaldehyde report, Congress held several hearings to examine the IRIS program. The House Report (112-151) that accompanied the Consolidated Appropriations Act of 2012 (Public Law 112-74) stated that “EPA shall incorporate, as appropriate, based on chemical-specific datasets and biological effects, the recommendations…of the National Research Council’s Review of the Environmental Protection Agency’s Draft IRIS Assessment of Formaldehyde into the IRIS process.” To ensure that EPA adequately considers the recommendations, Congress requested that NRC assess the scientific, technical, and process changes being implemented or planned by EPA and recommend modifications or additional changes as appropriate to improve the scientific and technical performance of the IRIS program. This committee, the Committee to Review the IRIS Process, was convened by NRC as a result of that request. In addition to reviewing the changes in the IRIS program, the committee was asked to review current methods for evidence-based reviews and recommend approaches for weighing scientific evidence for chemical hazard and dose-response assessments. The present report provides the committee’s review and recommendations, which are organized around the general depiction of the IRIS assessment process shown in Figure S-1.
In 2011, the same year that the NRC formaldehyde report was released, the Institute of Medicine (IOM) released a report that recommended standards for systematic review.1 As defined by IOM, systematic review “is a scientific investigation that focuses on a specific question and uses explicit, prespecified scientific methods to identify, select, assess, and summarize the findings of similar but separate studies.” Although the IOM report was written in the context of comparative-effectiveness research, which aims to determine the most appropriate evidence-based course of action in the clinical setting, systematic-review methods have been used for decades in fields as varied as agriculture and education. The materials and examples provided by EPA to the present committee indicate that the agency is also incorporating systematic-review principles as it implements changes in the IRIS process. The committee agrees with EPA that the systematic-review standards provide an approach that would substantially strengthen the IRIS process, and the committee uses them as a reference point to evaluate the changes that EPA has made.
1IOM (Institute of Medicine). 2011. Finding What Works in Health Care: Standards for Systematic Reviews. Washington, DC: The National Academies Press.
In evaluating the literature, NRC reports, and EPA documents, the committee found that systematic review and weight-of-evidence analysis have historically been described in various ways, and the terms are sometimes used interchangeably; this vagueness in use of terminology results in some confusion as to what the terms mean in practice. In the context of IRIS, the committee has defined systematic review as including protocol development, evidence identification, evidence evaluation, and an analytic summary of the evidence (see Figure S-1). The committee views weight-of-evidence analysis as a judgment-based process for evaluating the strength of evidence to infer causation. However, it found that the phrase as used in practice has become too vague and is of little scientific use. An IRIS assessment must come to a judgment about whether a chemical is hazardous to human health and must do so by integrating a variety of lines of evidence. Therefore, the committee found the term evidence integration to be more useful and more descriptive of the process that occurs after completion of systematic reviews.
The NRC formaldehyde report made several general recommendations concerning the IRIS process, including improving the clarity of the assessments by rigorous editing to reduce redundancies, inconsistencies, and text volume; describing assessment methods more fully; enhancing quality-control processes for assessments; standardizing review and evaluation approaches; and ensuring appropriate expertise on the various chemical-assessment teams. In response to the recommendations, EPA has implemented a new document structure that streamlines the assessments, added a standard preamble to all assessments that describes the IRIS process and its underlying principles, drafted a handbook that provides a more detailed description of the IRIS process, formed chemical assessment support teams (CASTs) to oversee the assessment-development process and ensure consistency among assessments, established tracking procedures, and implemented several initiatives to increase stakeholder input.
FIGURE S-1 Systematic review in the context of the IRIS process. The committee views public input and peer review as integral parts of the IRIS process, although they are not specifically noted in the figure.
Overall, the changes that EPA has proposed and implemented to various degrees constitute substantial improvements in the IRIS process. If current trajectories are maintained, inconsistencies identified in the present report are addressed, and objectives still to be implemented are successfully completed, the IRIS process will become much more effective and efficient in achieving the program’s basic goal of developing assessments that provide an evidence-based foundation for ensuring that chemical hazards are assessed and managed optimally.
Specifically, the present committee finds that the new document structure improves the organization of and streamlines the assessments and reduces redundancies. EPA’s use of evidence tables and graphic displays has also reduced text volume and enhanced clarity and transparency. The new approaches bring IRIS assessments much more into line with the state of practice for systematic reviews. The preamble is a useful statement, which will presumably be updated as methods and procedures are modified and updated, but it does not substitute for an overview that indicates how the general principles in the preamble have been applied in any given assessment. The handbook is critical for providing consistency among the assessment teams and contributors, and the final version should be peer-reviewed to ensure that the document is on target and provides the needed guidance.
The committee is encouraged by the efforts to strengthen the overall scientific expertise in the assessment process through the addition of the CASTs and recommends that IRIS assessments clearly identify the members of all teams involved in the development of any given assessment. To strengthen the process further, experts from outside EPA and the government might be needed to fill gaps in expertise in specific areas. Experts should be engaged when needed to augment teams and to conduct peer review of the draft and final assessments.
Finally, the committee applauds EPA initiatives to involve stakeholders in the IRIS process earlier and more fully. Those initiatives are likely to improve assessment quality and to strengthen the program’s credibility. However, not all stakeholders who have an interest in the IRIS process have the same scientific or financial resources to provide timely comments, and expanded opportunities for stakeholder involvement might lead to a further imbalance of public input. Therefore, similar to other EPA technical-assistance programs, EPA should consider ways to provide technical assistance to under-resourced stakeholders to help them to develop and provide input into the IRIS process.
PROBLEM FORMULATION AND PROTOCOL DEVELOPMENT
As noted, EPA is incorporating principles of systematic review as it revises the IRIS process. Critical elements of conducting a systematic review include formulating the specific question that will be addressed (problem formulation) and developing the protocol that specifies the methods that will be used to address the question (protocol development). Although the NRC formaldehyde report did not provide any specific recommendations regarding those elements, the present committee found that some discussion of them is warranted.
A major challenge for EPA in the problem-formulation step is to determine what adverse outcomes should be evaluated in a specific IRIS assessment. The committee suggests a three-step process for conducting problem formulation. First, with the support of an information specialist who is trained in conducting systematic reviews, EPA should perform a broad literature search designed to identify possible health outcomes associated with the chemical under investigation. The broad search should not be confused with the comprehensive literature search that is conducted for evidence identification in a systematic review (see Figure S-1); some EPA materials do not sufficiently distinguish between the two. Second, a table should be constructed to guide the formulation of specific questions that would be the subjects of specific systematic reviews. The table could be organized by the lines of evidence typically available to EPA (human, animal, and mechanistic studies) and the various health outcomes to investigate. Third, the table should be examined to determine which outcomes warrant a systematic review and how to define the systematic-review question, such as “Does exposure to chemical X result in neurotoxic effects?” Decisions as to which outcomes should be further evaluated by systematic reviews require careful consideration of numerous factors, and the decision process should be documented and reviewed by relevant experts.
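The evidence-mapping table described in the second step can be sketched minimally in code. The outcomes, lines of evidence, and search records below are hypothetical and illustrate only the format: rows for candidate health outcomes, columns for the lines of evidence tallied from the broad search.

```python
# Hypothetical evidence map for problem formulation; the records stand
# in for hits returned by the broad literature search.
from collections import defaultdict

# Each record: (health outcome, line of evidence)
search_hits = [
    ("neurotoxicity", "human"), ("neurotoxicity", "animal"),
    ("neurotoxicity", "animal"), ("neurotoxicity", "mechanistic"),
    ("hepatotoxicity", "animal"), ("hepatotoxicity", "mechanistic"),
]

lines = ["human", "animal", "mechanistic"]
table = defaultdict(lambda: {line: 0 for line in lines})
for outcome, line in search_hits:
    table[outcome][line] += 1

print(f"{'outcome':<16}" + "".join(f"{l:>12}" for l in lines))
for outcome, counts in table.items():
    print(f"{outcome:<16}" + "".join(f"{counts[l]:>12}" for l in lines))
```

A table in this shape makes the third step concrete: outcomes whose rows show substantial evidence across lines are candidates for their own systematic-review questions.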
After the systematic-review questions are specified, protocols for conducting the systematic reviews to address the questions should be developed. A protocol makes the methods and the process of the review transparent, can provide the opportunity for peer review of the methods, and stands as a record of the review. It also minimizes bias in evidence identification by ensuring that inclusion of studies in the review does not depend on the studies’ findings. Any changes made after the protocol is in place should be transparent, and the rationale for each should be stated. EPA should include protocols for all systematic reviews conducted for a specific IRIS assessment as appendixes to the assessment.
EVIDENCE IDENTIFICATION
The NRC formaldehyde report provided several suggestions aimed at improving EPA’s approach to evidence identification, including establishing standard protocols, developing a template to describe the search approach, and using a database to capture study information and relevant quantitative data. Overall, the present committee finds that EPA has been responsive to those suggestions and has substantially improved its approach to evidence identification. Although the agency could not have been expected to incorporate the 2011 IOM standards for systematic review, the preamble, draft handbook, and recent IRIS assessments demonstrate that EPA is well on the way to adopting a more rigorous approach to evidence identification that, when fully implemented, is anticipated to meet standards for systematic review. A few specific findings and recommendations to strengthen the evidence-identification process are highlighted here.
First, searching for and identifying evidence are arguably critical steps in a systematic review, and using a standardized search strategy and reporting format is essential for evidence identification. Protocols for IRIS assessments should include a line-by-line description of the search strategy for each systematic-review question addressed in the assessment that is written in collaboration with information specialists trained in systematic-review methodology. The protocol should also explicitly state the inclusion and exclusion criteria for studies and provide the date of the search, the publication dates searched, and the roles of the various team members.
Second, replicability and quality control are critical for data management. Thus, EPA should have an information specialist trained in systematic-review methodology who reviews the proposed evidence-identification section of the protocol. The committee also encourages the use of at least two reviewers who work independently to screen and select studies, pending an evaluation of validity and reliability that might indicate whether multiple reviewers are needed. The multiple independent reviewers would need to use standardized procedures and forms.
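The “evaluation of validity and reliability” mentioned above could rest on a simple agreement statistic. The sketch below computes Cohen’s kappa for two hypothetical screeners’ include/exclude decisions; the decisions themselves are invented for illustration.

```python
# Inter-rater reliability check for dual independent screening.
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters beyond chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical screening decisions on six candidate studies.
reviewer_1 = ["include", "include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Consistently high kappa across a pilot set might support relaxing the two-reviewer requirement; low kappa would argue for keeping it and tightening the standardized screening forms.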
Third, although the basic principles underlying the 2011 IOM standards are most likely relevant to IRIS assessments, EPA is encouraged to perform or support research that examines the applicability of the standards to the hazard and dose-response assessments underlying IRIS assessments.
EVIDENCE EVALUATION
The NRC formaldehyde report provided several recommendations regarding evidence evaluation. Briefly, the recommendations focused on standardizing the presentation of studies and evidence and on evaluating the studies with standardized approaches. In response, EPA now provides checklists in the preamble that indicate how the agency will assess the quality of epidemiologic and experimental studies. Additional details are provided in the draft handbook. EPA correctly identifies important study attributes that can be used to judge study quality but does not describe how it will assess risk of bias in the identified studies. The committee notes that assessing the quality of the study is not equivalent to assessing the risk of bias in the study. An assessment of study quality evaluates the extent to which the researchers conducted their research to the highest possible standards and how a study is reported. Risk of bias is related to the internal validity of a study and reflects study-design characteristics that can introduce a systematic error (or deviation from the true effect) that might affect the magnitude and even the direction of the apparent effect. An assessment of risk of bias is a key element in systematic-review standards; potential biases must be assessed to determine how confidently conclusions can be drawn from the data.
The committee emphasizes the importance of assessing risk of bias for all study types. Although several approaches are described in the present report, the committee is not recommending the adoption of any specific approach. For a scientifically defensible method, however, EPA should select assessment tools for which empirical evidence links an assessment item with an associated risk of bias. Standardized methods might need to be developed, and EPA might need to conduct or support research on the development and evaluation of empirically based instruments for assessing bias in human, animal, and mechanistic studies relevant to chemical-hazard identification. It might want to consider pooling data across IRIS assessments to determine whether, among various contexts, candidate risk-of-bias items are associated with overestimates or underestimates of effect.
Incorporating risk-of-bias assessments into the IRIS assessment process might take some time, and approaches will depend on the complexity and extent of data on a chemical and the resources available to EPA. An important limitation of all existing tools for assessing study methods is that research reports might not include sufficient details to enable assessment. Consequently, EPA might be hampered by differences in reporting standards for some scientific literature, although the committee expects reporting of toxicology research to improve as risk-of-bias assessments are incorporated into the IRIS process. However, a coordinated effort by government agencies, researchers, publishers, and professional societies will be required to improve the completeness and accuracy of reporting toxicology studies in the near future. Regardless, a risk-of-bias assessment should be conducted on studies that are used by EPA as primary data sources for the hazard identification and dose-response assessment. Whatever approach is adopted, the assessment approach and the results should be fully described and reported in the IRIS assessment.
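One possible shape for such an instrument is a set of per-domain judgments rolled up to an overall rating. The domains, judgments, and roll-up rule below are a hypothetical sketch, not a validated tool of the empirically grounded kind the committee recommends developing.

```python
# Hypothetical domain-based risk-of-bias instrument for experimental
# animal studies; domains echo common systematic-review practice.
DOMAINS = [
    "sequence generation",            # randomization to dose groups
    "allocation concealment",
    "blinding of outcome assessment",
    "incomplete outcome data",        # attrition and exclusions
    "selective outcome reporting",
]

def summarize(judgments):
    """Roll per-domain ratings up to an overall rating: any 'high'
    domain makes the study high risk of bias; otherwise any
    'unclear' makes it unclear; else low."""
    ratings = [judgments[d] for d in DOMAINS]
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"

study_a = dict.fromkeys(DOMAINS, "low")
study_b = dict.fromkeys(DOMAINS, "low")
study_b["blinding of outcome assessment"] = "unclear"

print("study A:", summarize(study_a))
print("study B:", summarize(study_b))
```

Recording judgments per domain, rather than a single score, is what lets the pooled-data analysis the committee suggests test whether particular items are empirically associated with over- or underestimates of effect.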
EVIDENCE INTEGRATION FOR HAZARD IDENTIFICATION
The NRC formaldehyde committee provided several recommendations regarding evidence integration, including reviewing the use of weight-of-evidence guidelines, standardizing an approach to using them, developing uniform language to describe the strength of evidence on noncancer effects, and providing more integrative and transparent discussions of weight of evidence. As in other recommendations, there is an emphasis on transparency and standardization of approach. In response, EPA has provided guidelines in the preamble for what considerations ought to inform the experts who are charged with integrating human, animal, and mechanistic evidence, and it gives extensive guidance on the qualitative categorization that the experts should use, but it articulates no systematic process by which the experts are to come to a conclusion. In the handbook, EPA provides extensive guidelines for synthesizing evidence within each category but no guidelines for integrating evidence among categories. The guidelines and the classification schemes offered for epidemiologic and other studies are reasonable, and similar ones have been used by other organizations with similar aims.
The committee appreciates that EPA’s improvements for evidence integration are still being developed but offers some options for moving forward. Several qualitative and quantitative options are available for overall evidence integration. Qualitative options include guided expert judgment, such as the approach used by the International Agency for Research on Cancer (IARC) in which working groups are used to arrive at overall judgments regarding a chemical’s carcinogenicity, and a structured process in which explicit guidelines are developed for qualitative categorization and the process is made as algorithmic as is possible, such as one being developed by the National Toxicology Program (NTP) that is based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system. Quantitative options include meta-analysis, probabilistic bias analysis, and Bayesian analysis. Although meta-analysis and probabilistic bias analysis provide quantitative estimates of effect size, the key question in both cases would be whether the effect size can reasonably be inferred to exclude zero (or to exclude being negligible). If so, there is evidence that a hazard exists. If not, there is not adequate evidence to conclude that a hazard exists, although the evidence might suggest a hazard. Bayesian analysis can be used to derive a quantitative judgment, such as “there is at least a 60% chance that chemical X is a carcinogen.” Such quantitative judgments could be easily converted into qualitative categorical judgments on the basis of a scale of probabilistic certainty. Quantitative models for evidence integration are powerful tools that can address a wide array of scientific questions, but their clear downside is that model misspecification at any stage can result in incorrect inferences. Nevertheless, they create a rigorous approach that forces analysts to make their assumptions explicit in ways that less formal methods do not. Qualitative and quantitative options are described further in Chapter 6 of this report.
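The meta-analytic option can be illustrated with a minimal fixed-effect, inverse-variance pooling on the log-odds-ratio scale. The study estimates and standard errors below are hypothetical; the hazard question then reduces to whether the pooled confidence interval excludes zero (a null effect).

```python
# Fixed-effect inverse-variance meta-analysis on hypothetical data.
import math

# (log odds ratio, standard error) for each hypothetical study
studies = [(0.40, 0.20), (0.25, 0.15), (0.55, 0.30)]

weights = [1 / se**2 for _, se in studies]          # precision weights
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se

excludes_null = ci_low > 0 or ci_high < 0
print(f"pooled log OR = {pooled:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
print("evidence of a hazard" if excludes_null else "evidence inadequate")
```

A real IRIS application would need to address heterogeneity (random-effects models) and the risk-of-bias adjustments that probabilistic bias analysis formalizes; this fixed-effect sketch shows only the core inferential step.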
The committee is not recommending a particular approach but suggests that EPA consider which approach among the suggested options best fits its plans for the IRIS program. EPA, however, should continue to improve its evidence-integration process incrementally and enhance the transparency of its process. Thus, it should either maintain its current guided-expert-judgment process but make its application more transparent or adopt a structured (or GRADE-like) process for evaluating evidence and rating recommendations along the lines that NTP has taken. If EPA does move to a structured evidence-integration process, it should combine resources with NTP to leverage the intellectual resources and scientific experience in both organizations. Adopting a structured process would have the benefit of transparency. The committee emphasizes that quantitative approaches to integrating evidence will be increasingly needed and useful to EPA, and the agency should seriously consider expanding its ability to perform quantitative modeling for evidence integration.
Regardless of the approach, EPA should develop templates for structured narrative justifications of the evidence-integration process and the conclusion reached. Evidence integration is fundamental in determining whether a chemical poses a hazard. Consequently, the premises and structure of the decision-making process should be as explicit as possible, and the basis for the determination needs to be connected explicitly to the evidence tables produced in the IRIS process.
CALCULATION OF TOXICITY VALUES
In addition to hazard identification, IRIS assessments typically derive toxicity values—reference concentrations, reference doses, and unit risks—that can be used with exposure assessments to derive quantitative risk estimates (see Figure S-1). The NRC formaldehyde committee provided several suggestions regarding this part of the IRIS process, including establishing clear guidelines for study selection, describing and justifying assumptions and models used to determine appropriate points of departure, explaining risk-estimation modeling processes that are used to develop unit-risk estimates, assessing the sensitivity of derived estimates, and adequately documenting the conclusions and estimation of all toxicity values. In response, the preamble provides considerations for selecting studies for deriving toxicity values and describes the process for deriving them. In the draft handbook, EPA has expanded on the study-selection criteria, provided considerations for combining data in dose-response modeling, and discussed data management and quality control for dose-response modeling. More detailed guidance on conducting dose-response modeling, developing candidate toxicity values, and characterizing confidence and uncertainty in toxicity values has yet to be developed for the draft handbook.
The committee is encouraged by the improvements that EPA has made in the IRIS process for deriving toxicity values, particularly the shift away from choosing one study as the “best” study for deriving a toxicity value and toward deriving and graphically presenting multiple candidate toxicity values. As the program evolves, EPA will need to make the best use of the totality of evidence with increased attention to distinguishing the quality and relevance of studies for assessing human dose-response relationships. That will require EPA to develop clear criteria for judging the relative merits of individual mechanistic, animal, and epidemiologic studies for estimating human dose-response relationships. Although subjective judgment remains an inherent feature of deriving toxicity values, EPA should develop formal methods for combining the results of multiple studies and selecting the final IRIS values with an emphasis on achieving a transparent and replicable process. EPA could also improve documentation of dose-response information by clearly presenting two dose-response values: a central estimate (such as a maximum likelihood estimate or a posterior mean) and a lower-bound estimate for a point of departure from which a final toxicity value is derived.2 Reporting both values provides information on statistical uncertainty, such as sampling variation, and makes available to the risk assessor the full range of information. Finally, EPA should develop guidelines for uncertainty analysis and communication in the context of IRIS to support the consistent and transparent treatment of uncertainties.
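Reporting both a central estimate and a lower bound can be sketched with a benchmark-dose calculation under a one-hit model, where extra risk is ER(d) = 1 − exp(−bd). The fitted slope, its standard error, and the Wald-style bound below are hypothetical simplifications; in practice a profile-likelihood or bootstrap bound would typically be preferred.

```python
# Hypothetical point-of-departure calculation: central estimate (BMD)
# and one-sided 95% lower bound (BMDL) under a one-hit model.
import math

b_mle, se_b = 0.012, 0.003   # hypothetical MLE of slope (per mg/kg-day) and SE
bmr = 0.10                   # benchmark response: 10% extra risk

# Central estimate: dose giving 10% extra risk at the MLE slope.
bmd = -math.log(1 - bmr) / b_mle

# Lower bound on the BMD from the upper bound on the slope
# (a steeper slope implies a lower benchmark dose).
b_upper = b_mle + 1.645 * se_b
bmdl = -math.log(1 - bmr) / b_upper

print(f"BMD  (central) = {bmd:.2f} mg/kg-day")
print(f"BMDL (lower)   = {bmdl:.2f} mg/kg-day")
```

Presenting the pair, rather than the bound alone, conveys the statistical uncertainty in the point of departure; as the footnote notes, the direction of the bound reverses for a cancer slope factor.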
The committee commends EPA for the improvements that it has made in the IRIS assessment-development process and expects the revisions when completed to result in a transformation of the IRIS program. To ensure that the IRIS program provides the best assessments possible, the committee identified three broad areas on which EPA should focus attention. First, the assessment methodology will need to be updated in a continuing, strategic fashion, and EPA should develop a plan for doing so. Specifically, the agency will need to consider how methods relevant to all elements of the process will evolve and how such progress can be tracked and incorporated into the IRIS assessment-development approach. Second, EPA staff, the CASTs, and the Chemical Assessment Advisory Committee should be encouraged to identify inefficiencies in the IRIS process, which should then be addressed systematically by the IRIS program leadership. EPA should continue to pursue development of firm stopping rules for key points throughout the process to guard against delay and should consider working with other agencies to avoid duplication of effort. Third, EPA management needs to evaluate the human and technologic resources that are needed to conduct IRIS assessments and support methodologic research and the implementation of new approaches. If sufficient financial and staff resources are not available to EPA, it will not be able to continue to improve the IRIS program and keep pace with scientific advancements.
Overall, the committee finds that substantial improvements in the IRIS process have been made, and it is clear that EPA has embraced and is acting on the recommendations in the NRC formaldehyde report. The NRC formaldehyde committee recognized that its suggested changes would take several years and an extensive effort by EPA staff to implement. Substantial progress, however, has been made in a short time, and the present committee’s recommendations should be seen as building on the progress that EPA has already made.
2The lower bound becomes an upper bound for a cancer slope factor but remains a lower bound for a reference value.