To safeguard public health, the US Environmental Protection Agency (EPA) must keep abreast of new scientific information and emerging technologies so that it can apply them to regulatory decision making. For decades the agency has dealt with questions about what animal-testing data to use to make predictions about human health hazards, how to perform dose-response extrapolations, how to identify and protect susceptible subpopulations, and how to address uncertainties. As alternatives to traditional toxicity testing have emerged, the agency has been faced with additional questions about how to incorporate data from such tests into its chemical assessments and whether such tests can replace some traditional testing methods. In addition, evidence has emerged suggesting that some chemicals have effects at doses lower than those used in traditional toxicity testing, raising concerns that traditional toxicity-testing protocols might be inadequate to identify all potential hazards to human health. In particular, endocrine active chemicals (EACs), or endocrine disruptors, have been a focal point for these questions because they have the ability to modulate normal hormone function, and small alterations in hormone concentrations, particularly during sensitive life stages, can have lasting and significant effects.
To address concerns about potential human health effects from EACs at low doses, EPA requested that the National Academies of Sciences, Engineering, and Medicine develop a strategy to evaluate the evidence for such low-dose effects (see Box S-1). The National Academies convened an ad hoc committee of experts to address this task. The task specified that the committee should perform systematic reviews of animal and human studies on at least two chemicals and demonstrate how the results can be integrated and considered with other relevant data to draw conclusions about causal associations. This report describes the strategy developed by the committee and highlights the role systematic review methods play in this overall strategy. The results of the systematic reviews and the lessons learned in performing them are also presented in the report.
STRATEGY FOR EVALUATING LOW-DOSE EFFECTS
The committee developed a generic strategy for evaluating evidence of low-dose¹ effects that includes three broad phases: surveillance for signals that a chemical may cause a health effect or that a health effect may be missed by traditional toxicity-testing methods, investigation and analysis of the evidence, and acting on the evidence (see Figure S-1). The first two phases involve identifying issues or questions to address, determining the best methods for evaluating them, and then conducting appropriate investigation and analyses to support the type of decision to be made. In its deliberations, the committee considered and demonstrated how these two phases apply to the evaluation of EACs. The last phase of the strategy involves policy and other management decisions that fall outside of the committee’s task.
Surveillance
In the strategy, surveillance refers to a process for detecting signals that raise questions about the potential low-dose toxicity of a particular chemical or about the ability to detect low-dose toxicity more generally. For example, signals might include an indication that an adverse outcome in a human population could be related to an EAC exposure, or evidence that a particular low-dose effect might not be detectable with traditional toxicity testing. To conduct the surveillance necessary to identify such signals, the committee identified three broad categories of data that should be monitored on a regular basis. These include data on specific chemicals, information that could have implications for toxicity-testing methods and best practices for EACs, and information on endocrine-related disease in animals and humans. Such information could be obtained by conducting regular surveys of the scientific literature, gathering input from stakeholders, and collecting information about human exposure, for example, through biomonitoring data, external exposure measurements, and computational models that link external and internal exposure.
1 Low dose is defined in the report as external or internal exposure that falls within the range estimated to occur in humans. Human exposure estimates may be based on environmental or biomarker measurements and/or computational models. If no human exposure data are available, low dose is defined on a case-by-case basis relative to an explicitly defined exposure in a particular context.
Once signals are identified, scoping exercises can be used to prioritize areas for investigation and analysis. Scoping involves using the scientific literature and other information to determine the extent, range, and nature of the information on the topic, to identify data gaps, and to consider what additional analyses are needed. Additional factors that could influence the decision to pursue a topic are the size of the population at risk, the public health significance of the issue, and available resources. The scoping exercise also considers the potential actions that might be taken in response to the signal, which will help determine the level of scientific depth and rigor that might be required to inform any such actions.
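The factors named above can be combined into a simple ranking to make a prioritization step explicit and repeatable. The sketch below is purely illustrative: the 0-1 scores, the weights, and the cost penalty are hypothetical assumptions, not values proposed by the committee, and in practice expert judgment would set or override them.

```python
from dataclasses import dataclass

# Hypothetical weights: the report names these factors but does not
# prescribe any numerical weighting scheme.
BENEFIT_WEIGHTS = {
    "population_at_risk": 0.35,
    "public_health_significance": 0.35,
    "evidence_gap": 0.30,
}
COST_PENALTY = 0.25  # hypothetical penalty for resource-intensive topics


@dataclass
class CandidateTopic:
    name: str
    population_at_risk: float          # 0-1: relative size of the exposed population
    public_health_significance: float  # 0-1: importance of the health outcome
    evidence_gap: float                # 0-1: how unresolved scoping left the question
    resource_cost: float               # 0-1: relative cost of further investigation


def priority_score(topic: CandidateTopic) -> float:
    """Weighted benefit minus a resource-cost penalty (illustrative only)."""
    benefit = sum(
        weight * getattr(topic, factor) for factor, weight in BENEFIT_WEIGHTS.items()
    )
    return benefit - COST_PENALTY * topic.resource_cost


def rank_topics(topics: list[CandidateTopic]) -> list[CandidateTopic]:
    """Return topics ordered from highest to lowest priority score."""
    return sorted(topics, key=priority_score, reverse=True)


if __name__ == "__main__":
    candidates = [
        CandidateTopic("chemical A / end point X", 0.9, 0.8, 0.6, 0.4),
        CandidateTopic("chemical B / end point Y", 0.3, 0.5, 0.9, 0.2),
    ]
    for topic in rank_topics(candidates):
        print(f"{topic.name}: {priority_score(topic):.3f}")
```

Keeping the weights in one visible table documents how the ranking was produced, which fits the strategy's emphasis on transparency.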
When a signal is prioritized for further investigation, the next step is to formulate key questions to frame the issues. Once key questions are identified, it is possible to design an approach to answer those questions using appropriate tools for investigation and analysis.
Investigation and Analysis
The committee outlines four main options for investigation and analysis aimed at understanding the potential human health effects from exposure to EACs at low doses: targeted analysis of existing data, generation of new data or models, systematic review of evidence, and integration of evidence. The approaches used should be selected on a case-by-case basis; in some cases one approach will be sufficient while in other cases several investigative approaches might be needed to adequately answer the questions.
Targeted Analysis of Existing Data
Targeted analysis is the focused analysis (or reanalysis) of existing data, and it can make results easier to compare across studies. For example, when outcomes are measured differently between studies (e.g., as continuous or dichotomous variables), it might be possible to convert the data to a common scale. Statistical and other computational approaches to characterizing dose-response relationships may also provide evidence of low-dose effects. Qualitative analyses can be used to make judgments about seemingly discordant data. For example, if effects are seen at different doses in two or more studies, it is useful to evaluate the studies for factors that could explain the differences.
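As one example of putting continuous and dichotomous outcomes on a common scale, a widely used approximation converts a log odds ratio into a standardized mean difference by dividing by the standard deviation of the standard logistic distribution (pi/sqrt(3)). The sketch below assumes that approximation is appropriate, which requires that the dichotomous outcome arose from cutting an underlying, roughly logistic, continuous variable; the study values in the usage block are hypothetical.

```python
import math

# SD of the standard logistic distribution is pi / sqrt(3); dividing a log
# odds ratio by it gives an approximate standardized mean difference (SMD).
LOGIT_TO_SMD = math.sqrt(3) / math.pi


def log_or_to_smd(log_or: float, se_log_or: float) -> tuple[float, float]:
    """Convert a log odds ratio and its SE to an approximate SMD and SE."""
    return log_or * LOGIT_TO_SMD, se_log_or * LOGIT_TO_SMD


def smd_to_log_or(smd: float, se_smd: float) -> tuple[float, float]:
    """Inverse conversion: approximate SMD (and SE) back to a log odds ratio."""
    return smd / LOGIT_TO_SMD, se_smd / LOGIT_TO_SMD


if __name__ == "__main__":
    # Hypothetical study reporting OR = 2.0 with a 95% CI of 1.2 to 3.3
    log_or = math.log(2.0)
    se = (math.log(3.3) - math.log(1.2)) / (2 * 1.96)  # SE recovered from the CI width
    smd, smd_se = log_or_to_smd(log_or, se)
    print(f"SMD = {smd:.3f} (SE {smd_se:.3f})")
```

The converted values can then be pooled with SMDs from studies that reported the outcome as a continuous variable.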
Generation of New Data or Models
Some questions can be addressed only through generation of new data or the development of new methods to analyze data. In cases where new data are needed, the investigation and analysis phase would focus on determining the type of data needed to characterize the human health effects and the best methods for obtaining the data. These efforts could include experimental studies to fill data gaps or the development of new computational models (e.g., physiologically based pharmacokinetic models to address questions about dosimetry or species differences).
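Physiologically based pharmacokinetic models describe many tissue compartments, but the underlying idea can be illustrated with a much simpler one-compartment model. The sketch below is that simplification, not a PBPK model: it assumes instantaneous absorption, first-order elimination, and hypothetical parameter values, and superimposes repeated daily doses to show how internal concentrations accumulate toward a steady state.

```python
import math


def concentration(t_h: float, dose_mg: float, interval_h: float, n_doses: int,
                  volume_l: float, half_life_h: float,
                  bioavailability: float = 1.0) -> float:
    """Concentration (mg/L) at time t_h in a one-compartment model.

    Each dose is treated as instantly absorbed and eliminated by first-order
    kinetics; contributions from all doses given so far are summed.
    """
    k = math.log(2) / half_life_h  # first-order elimination rate constant (1/h)
    total = 0.0
    for i in range(n_doses):
        t_dose = i * interval_h
        if t_h >= t_dose:
            total += bioavailability * dose_mg / volume_l * math.exp(-k * (t_h - t_dose))
    return total


def accumulation_ratio(interval_h: float, half_life_h: float) -> float:
    """Steady-state peak divided by single-dose peak: 1 / (1 - exp(-k * tau))."""
    k = math.log(2) / half_life_h
    return 1.0 / (1.0 - math.exp(-k * interval_h))


if __name__ == "__main__":
    # Hypothetical parameters: 10 mg every 24 h, V = 50 L, half-life = 12 h
    first_peak = concentration(0.0, 10, 24, 30, 50, 12)
    late_peak = concentration(29 * 24, 10, 24, 30, 50, 12)
    print(first_peak, late_peak, late_peak / first_peak, accumulation_ratio(24, 12))
```

Species differences enter through parameters such as half-life and volume of distribution, which is why such models can help with the dosimetry and extrapolation questions mentioned above.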
Systematic Review
For cases where a rigorous assessment is needed to address a question, a systematic review can help focus the evaluation and maximize transparency in both how the assessment was conducted and how it was used to draw conclusions. To help ensure that the evidence is selected and evaluated in an objective and consistent manner, a systematic review requires carefully crafting the research question and planning in advance what methods will be used to screen and analyze the scientific literature to answer the question.
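Planning the methods in advance can be made concrete by recording the review question and eligibility criteria as fixed data and applying them mechanically during screening, with a recorded reason for every exclusion. The sketch below uses a PECO (population, exposure, comparator, outcome) framing; the class names, criteria, and study records are hypothetical illustrations, not the committee's actual protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PECO:
    """Population, Exposure, Comparator, Outcome framing of the review question."""
    population: str
    exposure: str
    comparator: str
    outcome: str


@dataclass(frozen=True)
class Protocol:
    """Eligibility criteria fixed before screening (frozen, so not editable later)."""
    question: PECO
    eligible_species: frozenset[str]
    eligible_windows: frozenset[str]
    eligible_outcomes: frozenset[str]
    planned_analyses: tuple[str, ...] = ()


@dataclass(frozen=True)
class StudyRecord:
    study_id: str
    species: str
    exposure_window: str
    outcomes: frozenset[str]


def screen(record: StudyRecord, protocol: Protocol) -> tuple[bool, str]:
    """Apply the pre-specified criteria; return (include?, recorded reason)."""
    if record.species not in protocol.eligible_species:
        return False, "excluded: species not eligible"
    if record.exposure_window not in protocol.eligible_windows:
        return False, "excluded: exposure window not eligible"
    if not record.outcomes & protocol.eligible_outcomes:
        return False, "excluded: no eligible outcome"
    return True, "included"


if __name__ == "__main__":
    protocol = Protocol(
        question=PECO("humans or experimental mammals", "in utero exposure to chemical X",
                      "unexposed or lower exposure", "male anogenital distance"),
        eligible_species=frozenset({"human", "rat", "mouse"}),
        eligible_windows=frozenset({"gestational"}),
        eligible_outcomes=frozenset({"anogenital distance"}),
        planned_analyses=("random-effects meta-analysis",),
    )
    for rec in (StudyRecord("S1", "rat", "gestational", frozenset({"anogenital distance"})),
                StudyRecord("S2", "rat", "adult", frozenset({"anogenital distance"}))):
        print(rec.study_id, *screen(rec, protocol))
```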
Integration of Evidence
The fourth type of analysis is integration of available evidence to draw conclusions. Evidence integration largely focuses on hazard identification. Drawing conclusions about hazards related to low-dose effects of EACs will typically also require evaluating evidence specifically on the nature of the dose-response relationship at low doses. Environmental exposure data, such as biomonitoring data, can be useful in defining what subset of data can be considered relevant to low-dose exposures. Evidence integration to address questions about low-dose effects might also need to consider in vitro and mechanistic evidence, modeled dose-response relationships, and co-exposures.
Data integration can also be used to address broader questions such as whether a “new” end point or “new” exposure or assessment window is relevant to determining low-dose toxicity. For example, some end points have been added to regulatory testing protocols in response to growing evidence that they are indicators of toxicity, and the duration of some tests has been extended to capture effects that might occur later in life. Signals identified during the surveillance phase that have these types of implications about toxicity testing should be evaluated by integrating the available evidence.
Actions
Once the investigation and analysis phase has been completed, the next step is to select the type of action (or actions) warranted. As shown in Figure S-1, several options for action could be appropriate, including updating chemical assessments, continuing to monitor for new data, updating toxicity-testing designs and practices, or requiring new data or models to reduce uncertainties. The type of action that EPA takes could be influenced by additional factors, including existing policies and regulations, the size of the population at risk, the public health significance of the human health effects, and available resources. Making recommendations about what actions to take was outside the scope of this committee’s activities.
Findings and Recommendations
To ensure adequate understanding of hazards and to inform its decisions about its regulatory toxicity-testing practices, EPA needs a general strategy for ongoing evaluation of evidence of low-dose effects from exposure to EACs. The committee proposes a strategy involving three phases: surveillance, investigation and analysis, and actions. EPA is already conducting many activities consistent with the proposed strategy, though not necessarily in the specific context of assessing low-dose exposure to EACs.
Recommendation: EPA should develop an active surveillance program focused specifically on low-dose exposures to EACs. This program could include regularly monitoring published research and other information sources, gathering input from stakeholders, and considering human exposure information. It might also involve data collection in collaboration with other agencies and outside parties. The surveillance program should periodically identify, scope, and prioritize potential areas of focus related to low-dose effects, such as particular chemicals and end points. Some approaches will require methods and tool development, such as automated methods for monitoring the literature.
Recommendation: After a topic is selected for further evaluation, the agency should plan its investigation by identifying key questions to be addressed and determining the types of data and analyses needed to answer the questions and to support future agency actions. The specific approaches and tools used to implement the strategy to address issues related to low-dose endocrine effects will need to be considered on a case-by-case basis and should be guided by the questions under study.
The four main options for investigation and analysis include targeted analysis of existing data, generation of new data or models, systematic review, and integration of evidence. Different approaches will be appropriate for different circumstances. The types of analyses used to investigate the questions are not mutually exclusive, and in some cases several approaches might be needed to address the questions adequately. Integration of evidence for low-dose adverse human effects of EACs involves consideration of both hazard identification and dose response.
Recommendation: Human environmental exposure or biomonitoring data should be used, if available, to define what subset of the data should be considered as reflective of low-dose exposure.
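A minimal sketch of this recommendation: take the upper end of the human exposure range from biomonitoring data and keep only the studies whose internal dose falls at or below it. Two assumptions are labeled in the code: the upper end is taken as the 95th percentile (the report says only "the range estimated to occur in humans"), and each study's internal dose is expressed in the same units and biological matrix as the biomonitoring data. Studies without an internal-dose measure are kept separate rather than silently dropped.

```python
import statistics


def human_upper_bound(biomonitoring_values: list[float], percentile: int = 95) -> float:
    """Upper end of the human exposure range (assumed here to be a high percentile).

    statistics.quantiles(n=100) returns the 1st..99th percentile cut points,
    so index percentile - 1 selects the requested one.
    """
    cuts = statistics.quantiles(biomonitoring_values, n=100, method="inclusive")
    return cuts[percentile - 1]


def split_by_low_dose(studies: list[dict], upper_bound: float):
    """Partition study IDs into low-dose, higher-dose, and unclassifiable groups.

    Assumes each study's 'internal_dose' uses the same units and matrix as the
    biomonitoring data; None means the study did not measure internal dose.
    """
    low, high, unknown = [], [], []
    for study in studies:
        dose = study.get("internal_dose")
        if dose is None:
            unknown.append(study["id"])
        elif dose <= upper_bound:
            low.append(study["id"])
        else:
            high.append(study["id"])
    return low, high, unknown
```

The unclassifiable group is a reminder of the data gap noted later in this summary: many animal studies report only administered doses.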
The proposed strategy will facilitate a greater emphasis on regular consideration of the adequacy of toxicity testing for assessing low-dose exposures to EACs. However, the agency will also be faced with questions about the amount and quality of evidence needed in order to justify updating test methods, and these questions might be more appropriately addressed through policy decisions.
IMPLEMENTING THE STRATEGY: EXAMPLE REVIEWS
In its charge to the committee, EPA requested that the committee perform systematic reviews of animal and human studies on at least two chemicals and show how the results from the animal and human evidence can be integrated to draw conclusions. Systematic reviews and integration of evidence are two of the four options for further investigating and analyzing topics of interest in phase two of the committee’s proposed strategy. The committee undertook these example reviews to demonstrate how these approaches could be used in a strategy to evaluate low-dose toxicity of EACs and to identify lessons learned that could help EPA employ these approaches successfully. However, systematic reviews and integration of evidence will not be appropriate or required in all circumstances.
To select EACs for its example reviews, the committee conducted several exercises to illustrate how phase one of the strategy (Surveillance) might be performed for a question (“Is there evidence of low-dose adverse human effects that act through an endocrine-mediated pathway?”). These surveillance exercises included garnering stakeholder input through a public workshop, surveying the scientific literature, and collecting information about human exposure. As part of phase one, the committee prioritized candidate chemicals using criteria aimed at addressing the elements set forth in the statement of task. For example, because the committee was tasked with demonstrating how different evidence streams can be integrated, it purposely selected EACs for which there appeared to be an adequate number of animal and human studies to allow for comparisons and integration. The two EACs chosen were phthalates and polybrominated diphenyl ethers (PBDEs). Before undertaking its reviews, the committee refined key questions about the effects of the selected chemicals.
At the start of phase two (Investigation and Analysis), the committee developed protocols for using animal and human studies to answer the questions posed in phase one. Protocols were based on the methods developed by the National Toxicology Program’s Office of Health Assessment and Translation (OHAT) and were peer reviewed before the systematic reviews were undertaken. The protocols identified methods of analysis (e.g., meta-analysis) the committee would use. The systematic review method included a framework for drawing conclusions about the “level of evidence” for a health effect as being inadequate, low, moderate, or high. Level-of-evidence ratings were subsequently used in the evidence integration step to classify the hazard associated with a given chemical as not classifiable, suspected, presumed, or known (see Figure S-2). Mechanistic evidence was also considered in determining the hazard conclusion.
The first set of systematic reviews focused on the question of how phthalates might affect male reproductive-tract development. Phthalates² are ubiquitous environmental contaminants, and human exposure to them has been well documented. Phthalates are known to affect the androgen hormone system, which plays a critical role in the development of the male reproductive tract. The committee focused its investigation on three end points considered to be indicative of changes in androgen levels: anogenital distance (AGD), fetal testosterone levels, and hypospadias. The committee conducted separate reviews of the animal and human evidence on phthalates and then integrated the evidence to draw conclusions about potential hazards, low-dose effects, and the adequacy of toxicity-testing methods for evaluating those hazards.
2 The phthalates include benzylbutyl phthalate (BzBP), dibutyl phthalate (DBP), diethylhexyl phthalate (DEHP), diethyl phthalate (DEP), diisobutyl phthalate (DIBP), diisodecyl phthalate (DIDP), diisononyl phthalate (DINP), diisooctyl phthalate (DIOP), dimethyl phthalate (DMP), di-n-octyl phthalate (DOP), and dipentyl phthalate (DPP).
Effects of Phthalates on Male Reproductive-Tract Development
The landscape of data on phthalates is complex. Different amounts of data are available for different phthalates, and a further complication is that human studies often involve exposure to mixtures of phthalates. Although the committee analyzed the evidence related to multiple phthalates and multiple end points, for the purposes of this summary the committee chose to highlight, as an illustrative example, its analysis of the association between diethylhexyl phthalate (DEHP) and AGD.³
The committee’s systematic review of the relationship between DEHP and changes in AGD included 19 studies in animals and five studies in humans. Both animal and human studies showed reductions in male AGD after in utero exposure to DEHP. A meta-analysis of the animal studies found consistent evidence of a decrease in AGD in association with DEHP treatment, with a dose-response gradient. A meta-analysis of the human studies similarly found consistent evidence that increased maternal urinary concentrations of DEHP metabolites were associated with decreased AGD in male children. Overall, this evidence of an association between DEHP and decreased AGD supports the conclusion that in utero exposure to DEHP is presumed to be a reproductive hazard to humans.⁴
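Pooling study results in a meta-analysis can be sketched with the DerSimonian-Laird random-effects estimator, one common choice when studies are expected to differ in design. This summary does not state which model the committee used, so the estimator here is an assumption, and any effect sizes passed in (for example, mean differences in AGD with their variances) would be hypothetical. The function also returns Cochran's Q and I², which quantify the consistency across studies that the text refers to.

```python
import math


def random_effects_meta(effects: list[float], variances: list[float]) -> dict:
    """DerSimonian-Laird random-effects pooling of study effect sizes."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance estimate
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # I2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": se,
            "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
            "tau2": tau2, "Q": q, "I2": i2}


if __name__ == "__main__":
    # Hypothetical mean differences in AGD (mm) and their variances
    print(random_effects_meta([-0.8, -1.1, -0.5, -0.9], [0.04, 0.09, 0.06, 0.05]))
```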
The committee considered whether pharmacokinetic or mechanistic data would influence this hazard conclusion. The committee found that mechanistic data from in vitro studies and animal models provide biological plausibility that exposure to DEHP is associated with a reduction in AGD in humans, based on decreased fetal testosterone as an intermediate effect. Moreover, androgen-dependent development of the male reproductive tract and the androgen dependence of AGD appear to be well conserved across mammalian species (including humans). Thus, the committee concluded that the pharmacokinetic and mechanistic data support the hazard conclusion that in utero exposure to DEHP is presumed to be associated with decreased AGD in humans.
Drawing conclusions about dose response is more challenging. It is difficult to directly compare the effects of different levels of DEHP exposure in animals and humans because animal studies typically report administered doses whereas studies in humans rely on the measurement of DEHP metabolites in urine or other body fluids. Some investigators have used pharmacokinetic models to estimate human daily intakes of DEHP based on concentrations of phthalate metabolites in urine; these models have suggested that human intake is markedly lower than the doses used in animal studies. However, the differences between animals and humans in internal measures of exposure (concentrations in urine or amniotic fluid) were much smaller. Thus, the issue of phthalate effects on male reproductive-tract development represents an example where current toxicity-testing methods can identify a hazard that is presumed to be of concern to humans, but current methods might not be able to accurately predict exposures at which humans are affected. This finding also provides additional support for EPA’s decision to include AGD measurements in regulatory toxicity testing.
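A simple form of this back-calculation estimates daily intake from a urinary metabolite concentration, a daily urine volume, and the molar fraction of the parent dose excreted as that metabolite (FUE), then converts from metabolite mass to parent-compound mass. The sketch below shows that arithmetic only; the urine volume, body weight, concentration, and FUE in the usage block are illustrative assumptions, and published estimates often sum several metabolites, each with its own FUE.

```python
# Approximate molecular weights (g/mol) from the molecular formulas
# DEHP C24H38O4 and its monoester metabolite MEHP C16H22O4.
MW_DEHP = 390.56
MW_MEHP = 278.34


def daily_intake_ug_per_kg(urine_conc_ug_per_l: float, urine_volume_l_per_day: float,
                           fue: float, body_weight_kg: float,
                           mw_parent: float, mw_metabolite: float) -> float:
    """Back-calculate parent-compound intake (ug/kg-day) from one urinary metabolite.

    fue is the molar fraction of the parent dose excreted in urine as this
    metabolite; the molecular-weight ratio converts metabolite mass to parent mass.
    """
    if not 0 < fue <= 1:
        raise ValueError("fue must be a fraction in (0, 1]")
    excreted_ug_per_day = urine_conc_ug_per_l * urine_volume_l_per_day
    return excreted_ug_per_day / (fue * body_weight_kg) * (mw_parent / mw_metabolite)


if __name__ == "__main__":
    # Illustrative inputs only: 5 ug/L MEHP, 1.6 L urine/day, FUE 0.06, 65 kg adult
    print(daily_intake_ug_per_kg(5.0, 1.6, 0.06, 65.0, MW_DEHP, MW_MEHP))
```

Because the result depends on several uncertain inputs, intake estimates of this kind can diverge from comparisons based directly on internal measures, which is the contrast the paragraph above describes.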
The second set of systematic reviews focused on the question of how PBDEs might affect neurobehavioral function. PBDEs are ubiquitous in the environment, and human exposure to them has been well documented. The committee conducted its own review of available animal studies and updated a recent systematic review of human studies conducted by Lam et al., which was shared with the committee in draft form.⁵ The review of the human studies evaluated effects on intelligence, attention deficit/hyperactivity disorder (ADHD), and attention-related behavioral conditions. For its review of animal studies, the committee focused on findings related to learning, memory, and attention, which the committee considered to have the closest parallels to the intelligence and attention-related outcomes measured in the human studies. The review of animal studies included any type of PBDE, but the review of human studies was restricted to the types of PBDEs most commonly reported in human biological samples: BDE-47, -99, -100, and -153.
3 A full analysis of other phthalates and end points is presented in Chapter 3. The hazard conclusions reached on the other phthalates and other end points were either equivalent to or weaker than the one reached for DEHP and AGD.
4 Committee conclusions concerning DEHP effects on fetal testosterone and hypospadias rested on animal evidence since insufficient human evidence was available to assess whether exposure to DEHP is associated with these outcomes.
5 The review was subsequently updated by the authors and is in press for publication in Environmental Health Perspectives.
Effects of PBDEs on Neurobehavioral Function
Although the committee analyzed the evidence related to multiple brominated diphenyl ethers (BDEs), for the purposes of this summary the committee chose to focus on its analysis of the potential effects of BDE-47 exposure on learning and intelligence.⁶
The animal data on PBDEs and learning, memory, and attention were diverse and complex, with studies using varying designs, outcomes, and types of PBDEs. Six studies of BDE-47 and learning were available, and five found some indication of an effect on at least one measure of learning. The committee also conducted a meta-analysis by combining data from studies on PBDEs reporting latency in the Morris water maze, a test commonly used in studies of learning. The meta-analysis found consistent evidence of an increase in latency in the last trial of the Morris water maze in PBDE-exposed animals (meaning that the exposed animals took longer to locate the escape platform than nonexposed animals), and this effect was robust to multiple sensitivity analyses. The analysis also showed some evidence of a dose-response gradient, but these findings were inconsistent across studies. Differences among studies with regard to dose response might be due to variations in study design, such as the use of different PBDEs, differences in the duration of exposure, differences in internal dose, and differences in strain and species, or to other factors, such as potency of the congeners or pharmacokinetics.
To assess the human evidence, the committee critically evaluated the methods of a recent systematic review conducted by Lam et al. (in press) using an evaluation tool called ROBIS. Judging that this existing review fulfilled the requirements of a systematic review (it followed the Navigation Guide method for performing systematic reviews, which is similar to the OHAT method) and that there was no evidence of risk of bias in the assessment, the committee used the Lam et al. review as a basis for its own assessment. The authors of the review identified nine studies that measured IQ in relation to developmental PBDE exposure. In a meta-analysis of a subset of the studies, the authors found evidence of an association between PBDE exposure and a decrease in IQ. The committee conducted an updated literature search based on this review; finding no studies with substantively new findings, the committee determined that the conclusions of the Lam et al. systematic review would form a sufficient basis for the committee’s work of integrating the available evidence.
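The Morris water maze meta-analysis described above combined escape-latency data across studies, and one common sensitivity analysis is to recompute the pooled effect with each study left out in turn. The sketch below computes a bias-corrected standardized mean difference (Hedges' g) per study and a leave-one-out check. It is an illustration under stated assumptions, not the committee's analysis: it uses simple fixed-effect pooling, and it assumes each group size counts independent units (for example, litters rather than pups).

```python
import math
from dataclasses import dataclass


@dataclass
class LatencyStudy:
    study_id: str
    mean_exposed: float  # mean escape latency in the exposed group (s)
    sd_exposed: float
    n_exposed: int       # assumed to count independent units (e.g., litters)
    mean_control: float
    sd_control: float
    n_control: int


def hedges_g(s: LatencyStudy) -> tuple[float, float]:
    """Bias-corrected SMD (exposed minus control) and its variance.

    Positive g means exposed animals took longer to find the platform.
    """
    n1, n2 = s.n_exposed, s.n_control
    pooled_sd = math.sqrt(((n1 - 1) * s.sd_exposed ** 2 + (n2 - 1) * s.sd_control ** 2)
                          / (n1 + n2 - 2))
    d = (s.mean_exposed - s.mean_control) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool_fixed(gs: list[float], variances: list[float]) -> float:
    """Inverse-variance (fixed-effect) pooled estimate."""
    weights = [1 / v for v in variances]
    return sum(w * g for w, g in zip(weights, gs)) / sum(weights)


def leave_one_out(studies: list[LatencyStudy]) -> dict[str, float]:
    """Pooled g recomputed with each study omitted in turn."""
    effects = [hedges_g(s) for s in studies]
    results = {}
    for i, s in enumerate(studies):
        rest = effects[:i] + effects[i + 1:]
        results[s.study_id] = pool_fixed([g for g, _ in rest], [v for _, v in rest])
    return results
```

If every leave-one-out estimate keeps the same sign and similar size, no single study drives the pooled result, which is one meaning of "robust to sensitivity analyses."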
6 A full analysis of other BDEs and end points is presented in Chapter 4. The hazard conclusions reached on the other congeners and end points were either equivalent to or weaker than the one reached for BDE-47 and learning in animals and intelligence in humans.
Reviewing mechanistic data on PBDEs in relation to developmental neurotoxicity, the committee identified some biological plausibility of the associations observed between PBDE exposure during the perinatal period and later neurobehavioral outcomes. However, the complexity and multifactorial nature of how PBDEs might affect neurodevelopmental processes hindered the committee’s attempt to define an adverse outcome pathway.
As with phthalates, it is difficult to directly compare PBDE exposure in animal studies to that occurring in humans because the majority of animal studies report only administered doses whereas human studies rely on the measurement of PBDEs in serum or other body fluids. Estimates of human daily intakes based on measurements of PBDEs in food and dust suggest that human exposure is several orders of magnitude lower than that achieved with benchmark doses estimated from the data or the meta-analysis of the animal studies on PBDEs. Studies of internal doses of BDE-47 also show large disparities in the level of exposure between humans and animals, though these disparities are less pronounced than those suggested by the intake data. The available data on these measures were scant and uncertain, though, limiting the ability to use animal studies to predict exposure levels at which effects occur in humans. Thus, this is another situation in which current toxicity-testing methods can identify a hazard that is presumed to be of concern to humans, but current methods might not be able to accurately predict exposures at which humans are affected.
Findings and Recommendations
The following findings and recommendations stem from lessons learned in the committee’s process of performing the systematic reviews and integrating evidence for the selected EACs. Additional findings and recommendations that are more specific to the selected EACs are provided in Chapters 3 and 4.
Findings Related to Conducting Systematic Reviews
Consistency and Transparency: The committee found that the systematic review process was valuable because it provided a framework for identifying, selecting, and evaluating evidence in a consistent and explicit manner; maximized transparency in how the assessments were performed; and facilitated the clear presentation of the basis for scientific judgments.
Chemical Mixtures: The two examples that the committee selected for its systematic reviews involved chemical classes rather than individual chemicals. In retrospect, this aspect added complexity to the reviews. In its evaluation of the phthalates, the committee evaluated individual phthalates separately, demonstrating how systematic reviews can be performed on single chemicals. In its evaluation of PBDEs, the committee considered different PBDEs both separately and in combination, demonstrating one way systematic reviews can be applied to chemical mixtures.
The Use of Meta-Analyses: The committee found that meta-analyses were valuable in summarizing data from the systematic reviews and in comparing the animal and human evidence in a robust and consistent manner. Meta-analyses can be used to inform confidence ratings for bodies of evidence and to support benchmark dose modeling.
Recommendation: Systematic reviews should include meta-analysis of the animal and human evidence, if appropriate. The results of meta-analyses should be used to examine quantitative relationships between EACs and end points of interest to inform the confidence ratings of the bodies of evidence, and, if possible, to estimate benchmark doses.
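Estimating a benchmark dose means fitting a dose-response model and solving for the dose that produces a chosen benchmark response. The sketch below assumes an exponential model fitted to dose-group means by unweighted log-linear least squares and a benchmark response of a 5% decrease from control; both choices are illustrative assumptions. Dedicated tools such as EPA's Benchmark Dose Software fit models by maximum likelihood and also report the lower confidence limit (BMDL), which this sketch does not compute.

```python
import math


def fit_exponential(doses: list[float], means: list[float]) -> tuple[float, float]:
    """Fit log(mean) = log(a) + b * dose by least squares; return (a, b).

    Unweighted fit to group means: a simplification of maximum-likelihood
    fitting to individual or summary data.
    """
    if any(m <= 0 for m in means):
        raise ValueError("exponential model needs positive responses")
    logs = [math.log(m) for m in means]
    n = len(doses)
    mean_x = sum(doses) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def benchmark_dose(b: float, bmr: float = 0.05) -> float:
    """Dose at which the modeled response falls by `bmr` (relative) from control.

    Solves a * exp(b * BMD) = a * (1 - bmr), i.e., BMD = ln(1 - bmr) / b, for b < 0.
    """
    if b >= 0:
        raise ValueError("model does not predict a decrease")
    return math.log(1 - bmr) / b


if __name__ == "__main__":
    # Hypothetical dose groups (mg/kg-day) and mean responses
    a, b = fit_exponential([0, 10, 100, 500], [4.0, 3.95, 3.6, 2.9])
    print(a, b, benchmark_dose(b))
```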
Evaluating Risk of Bias: The committee found that information important to evaluating the quality of individual animal studies was often not reported, including whether the study controlled for litter effects, whether animals were randomly allocated to study groups, and whether research personnel were blinded to the study groups during the outcome assessment. Because a lack of adequate reporting could not be distinguished from failure to adhere to practices that minimize bias, failure to report such practices often led to higher risk-of-bias ratings for individual studies, downgrading the overall level of confidence in the body of evidence. These types of problems could be remedied if journals required better reporting of the methods used in animal studies, especially reporting pertaining to issues that might introduce bias into the research. These requirements could build on reporting standards that have been developed by various organizations to improve transparency (e.g., the ARRIVE guidelines). For example, studies should be required to report whether animals were assigned to study groups using random allocation and whether researchers were blinded to the study groups during outcome assessment.
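The reasoning above can be expressed as a rating rule: because "not reported" cannot be told apart from "not done," an unreported practice is rated as probably high risk rather than low. The sketch below uses a four-level scale (definitely low, probably low, probably high, definitely high) of the kind used in OHAT-style reviews; the reporting categories, the mapping, and the domain list are hypothetical simplifications for illustration.

```python
from enum import Enum


class RiskOfBias(Enum):
    DEFINITELY_LOW = 1
    PROBABLY_LOW = 2
    PROBABLY_HIGH = 3
    DEFINITELY_HIGH = 4


# Hypothetical mapping from what a paper reports to a domain rating.
# "not_reported" is treated like a probable failure, not as low risk.
REPORTING_TO_RATING = {
    "done_with_method_described": RiskOfBias.DEFINITELY_LOW,
    "stated_done_without_method": RiskOfBias.PROBABLY_LOW,
    "not_reported": RiskOfBias.PROBABLY_HIGH,
    "reported_not_done": RiskOfBias.DEFINITELY_HIGH,
}

# Domains taken from the practices named in the text.
DOMAINS = ("random_allocation", "blinded_outcome_assessment", "litter_effects_controlled")


def rate_study(reporting: dict[str, str]) -> dict[str, RiskOfBias]:
    """Rate each domain; domains absent from `reporting` count as not reported."""
    return {domain: REPORTING_TO_RATING[reporting.get(domain, "not_reported")]
            for domain in DOMAINS}


def worst_rating(ratings: dict[str, RiskOfBias]) -> RiskOfBias:
    """Most concerning rating across domains."""
    return max(ratings.values(), key=lambda r: r.value)


if __name__ == "__main__":
    ratings = rate_study({"random_allocation": "stated_done_without_method"})
    for domain, rating in ratings.items():
        print(domain, rating.name)
    print("worst:", worst_rating(ratings).name)
```

Under this rule, better reporting alone can move a study from probably high to probably or definitely low risk, which is why the text emphasizes journal reporting standards.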
The Use of an Existing Systematic Review: In the PBDE assessment, the committee found it was time saving to use a recent systematic review of the effect of developmental exposure to PBDEs on IQ and ADHD as a basis for its own assessment.
Recommendation: EPA should develop policies and procedures to allow the agency to use and update existing systematic reviews. It is important that the existing systematic review’s study question directly addresses EPA’s topic of interest and that the methods are critically evaluated before the systematic review is used and updated.
Expertise Required: The committee found that conducting a systematic review and integrating evidence requires a multidisciplinary approach tailored to the specific review question. In particular, it is essential to have expertise in the conduct of meta-analyses and benchmark dose modeling.
Findings Related to Integrating Evidence
The committee found comparing evidence on dose-response relationships between animal and human studies to be challenging and imprecise because animal studies often report external administered doses (usually without measures of internal dose) whereas human studies measure biomarkers of internal dose (with estimates of the external administered dose being uncertain). Toxicology studies that measure internal dose metrics, including metrics that are similar to those used in human biomonitoring, could help address this data gap.
Recommendation: To support animal-to-human extrapolations, pharmacokinetic data should be generated and used to develop pharmacokinetic models that make it possible to infer human internal doses (not just intake) from biomonitoring data and animal internal doses from administered doses.
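Once pharmacokinetic data are available, animal and human exposures can be compared on an internal-dose scale rather than an intake scale. The sketch below estimates an average steady-state concentration from an administered dose (bioavailability times dose rate divided by clearance) and expresses the animal-to-human ratio in orders of magnitude. It assumes linear kinetics, known clearance and bioavailability, and that animal and human concentrations refer to the same matrix; all numbers in the usage block are hypothetical.

```python
import math


def steady_state_conc_mg_per_l(dose_mg_per_kg_day: float,
                               clearance_l_per_kg_day: float,
                               bioavailability: float = 1.0) -> float:
    """Average steady-state concentration (mg/L): F * dose rate / clearance."""
    return bioavailability * dose_mg_per_kg_day / clearance_l_per_kg_day


def internal_dose_margin(animal_conc: float, human_conc: float) -> tuple[float, float]:
    """Animal-to-human internal concentration ratio and its log10 (orders of magnitude)."""
    if animal_conc <= 0 or human_conc <= 0:
        raise ValueError("concentrations must be positive")
    ratio = animal_conc / human_conc
    return ratio, math.log10(ratio)


if __name__ == "__main__":
    # Hypothetical: 10 mg/kg-day in rats, clearance 2 L/kg-day, F = 0.5,
    # versus a human serum concentration of 0.0025 mg/L from biomonitoring
    animal = steady_state_conc_mg_per_l(10.0, 2.0, 0.5)
    print(internal_dose_margin(animal, 0.0025))
```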
In the case of PBDEs, integration of human data with animal data was challenging because intelligence and attention measures in humans do not have directly corresponding measurements in rodent models. Furthermore, the animal studies used different tests of learning and memory and, even when the same type of behavioral test was used, testing methods and data analysis often differed between studies. The committee found it helpful to focus its quantitative analysis on a specific measure of learning that was most consistently reported in the animal studies.
Pharmacokinetic and mechanistic data provided biological plausibility that the effects observed in animal studies may reflect similar hazards in humans. The committee found that mechanistic data were useful during the scoping and problem formulation phase of planning the systematic review to help determine what outcomes to focus on, as well as to determine how the animal and human evidence could be integrated.
The phthalate and the PBDE evaluations are both cases in which current toxicity-testing paradigms identify a hazard that is presumed to be of concern to humans but might not accurately predict exposures at which humans are affected. The development of pharmacokinetic data and models for extrapolation of data from animal studies or human biomonitoring data could facilitate the evaluation of an EAC’s potential to cause health effects in humans at low doses.