The National Toxicology Program (NTP) based its conclusion in the monograph primarily on the human evidence. It considered the human evidence to be “relatively robust” and evaluated the association between fluoride exposure and neurodevelopmental and cognitive effects in 82 publications. It stratified the studies into two categories: lower risk of bias (20 publications¹) and higher risk of bias (62 publications). Although it evaluated all the studies, the confidence in its conclusion is based primarily on the lower risk-of-bias studies; it concluded that the higher risk-of-bias studies did not affect its confidence in its hazard conclusion. This chapter provides the committee’s assessment of NTP’s evaluation of the human evidence in the monograph.
¹ Two of the 20 publications investigated adults, and the other 18 publications investigated children.
In the monograph, NTP clearly displayed the results of the literature search and screening process in a PRISMA flow diagram, a widely accepted framework for reporting a screening process and the ultimate number of included studies. The committee, however, had some concerns regarding NTP’s literature-search strategy. One of the sources used to identify articles for the systematic review was the Fluoride Action Network (FAN). The committee acknowledges FAN’s efforts in providing several studies that appear to be relevant for the review. However, the process by which FAN identified and selected studies is not clear. FAN identified a number of studies published in Chinese-language journals, some of which are not in PubMed or other commonly used databases, and translated them into English. That process might have led to a biased selection of studies and raises the question of whether there are other articles in the Chinese literature that FAN did not translate and of which NTP is unaware. NTP should evaluate the potential for any bias that it might have introduced into the literature search process. Possible ways of doing so could include conducting its own searches of the Chinese or other non–English-language literature and conducting subgroup analyses of study quality and results based on the resource used to identify the study (for example, PubMed vs non-PubMed articles). As an initial step in such evaluations, NTP should consider providing empirical information on the pathway by which each of the references was identified. That information would also improve understanding of the sources that NTP used for evidence integration and the conclusions drawn in the monograph. The committee emphasizes that its comments regarding FAN are aimed only at evaluating bias; they are not intended to discourage stakeholder input into the systematic-review process, and the committee acknowledges and encourages the important contributions of FAN and other stakeholder organizations in this process.
The unit of analysis in a systematic review is a study, not a report or a publication. The protocol and the monograph do not appear to pay sufficient attention to whether multiple publications are based on a single epidemiologic study. That is important because NTP enumerates in the monograph, and describes in both the text and table summaries, specific publications that apparently contribute to the “extent of evidence,” one of the criteria on which hazard characterization is based. In at least some cases, a given study is described in separate publications. For example, two publications by Xiang and colleagues (2003, 2011) regarding intelligence (IQ) and fluoride exposures in China were based on the same population, outcome data, and covariates; they are distinguished only by the exposure metric: the 2011 paper reports serum fluoride concentrations, which it states were collected at the same time as the urine fluoride concentrations reported in the 2003 paper. Because the monograph does not make clear that such publications come from a single study, some studies might be double-counted, and NTP’s characterization of the extent of the evidence might be exaggerated.
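Counting the evidence by study rather than by publication only requires that each publication be linked to the study it reports on. The sketch below assumes the reviewer has already assigned such a link; the `study_id` labels ("study_A", "study_B") and the helper name are hypothetical and are not part of NTP's data or tools.

```python
from collections import defaultdict


def group_by_study(records):
    """Map each study identifier to the list of publications derived from it.

    records: iterable of (publication_label, study_id) pairs, where study_id
    is a reviewer-assigned (hypothetical) identifier for the underlying study.
    """
    studies = defaultdict(list)
    for publication, study_id in records:
        studies[study_id].append(publication)
    return dict(studies)


records = [
    ("Xiang et al. 2003", "study_A"),  # same population, outcome data,
    ("Xiang et al. 2011", "study_A"),  # and covariates -> one study
    ("Ding et al. 2011", "study_B"),
]

studies = group_by_study(records)
print(len(records), "publications,", len(studies), "studies")
```

The "extent of evidence" would then be tallied over the keys of `studies`, not over the individual publications.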
Consistency of the Protocol and the Approach
The risk of bias in individual studies was assessed by using criteria provided in the protocol. Three key domains were emphasized: exposure characterization, outcome assessment, and analysis of potential confounding variables. The committee agrees with the comprehensive approach described but is concerned that the approach presented in the protocol differs from that in the monograph. For example, the protocol and the Office of Health Assessment and Translation (OHAT) handbook refer to “Tier 3” studies that are rated as having a high risk of bias in the three key domains (exposure, outcome, and confounding). However, the monograph does not present the studies in tiers but rather categorizes them only as having “higher” or “lower” risk of bias. The approach to assessment of confounding also appears to be somewhat inconsistent. The protocol states that “key covariates” include iodine sufficiency and co-exposure to such neurotoxic compounds as arsenic and lead, and it states that “failure to consider the distribution of the key covariates across the exposure groups will result in a ‘probably high [risk of bias]’ or ‘definitely high [risk of bias]’” (NTP 2017, p. 9). Those statements in the protocol seem to suggest that the key covariates need to be addressed in every study. However, the monograph states that studies were not required to address every potential confounder but rather only co-exposures or confounders that were considered important for a specific study’s population and outcome (NTP 2019, p. 29). Thus, the monograph seems to suggest that the key covariates do not have to be addressed in every study. An example of how that might be a problem is the evaluation of Bashash et al. (2017). NTP does not appear to have considered iodine sufficiency for that study, and arsenic was considered only superficially. However, the study was still rated as “probably low risk of bias” for confounding. That rating might be consistent with the approach described in the monograph, but it seems inconsistent with the protocol. Overall, the protocol and the monograph should be clear and consistent about whether key covariates need to be addressed in every study. If not, the process for deciding which covariates should be addressed in which studies should be clearly described. A final example of inconsistencies between the protocol and the monograph is related to exposure characterization. According to the protocol, “studies that measure or estimate individual exposures, biomarker levels (such as urinary fluoride), or fluoride intake will generally be assigned probably or definitely low [risk of bias] with regard to exposure assessment” (NTP 2017, p. 9). In the monograph, however, Broadbent et al. (2015, p. 73), which used individual “history of use of 0.5-milligram fluoride tablets…and use of fluoridated toothpaste,” was rated as having a high risk of bias with respect to exposure assessment.
Thoroughness of the Evaluation
NTP developed a reasonable list of the factors most likely to cause confounding in the literature as a whole (NTP 2019, p. 29); in several cases, it provided thoughtful discussions of the likelihood of confounding by some of the factors. For example, NTP identified arsenic as a potential confounder and noted in several cases that studies did not take place in areas known to have high arsenic exposures. The committee, however, identified many cases in which NTP’s evaluations or analyses of confounding were insufficient, difficult to understand, or applied inconsistently from one study to another. As noted above, NTP should explain why some sources of potential confounding are considered to be more important in some studies than in others and should address what is known about the magnitude and direction of the associations between the potential confounders and both fluoride exposure and neurodevelopment. For example, in its analysis of the Russ et al. (2019) study of dementia in the Health Assessment Workspace Collaborative (HAWC), NTP states that “the main confounder missing for evaluating dementia is smoking status.” However, the relative risks are low in many studies of smoking and dementia. Because the bias that an unmeasured confounder can produce is limited by the strength of its association with the outcome, smoking is unlikely to have contributed substantially to the hazard ratios of 2.65 (men) and 2.32 (women) for high fluoride exposure and dementia reported in Russ et al. (2019). There are various methods for addressing the potential magnitude of confounding, and NTP should consider some of them (see, for example, Axelson 1978; Rudolph and Stuart 2018).
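One simple way to bound the potential magnitude of confounding by a binary factor such as smoking is the classic indirect-adjustment bias factor, of the kind associated with Axelson (1978). The sketch below assumes a binary confounder, no effect modification, and a confounder–disease risk ratio that is the same in exposed and unexposed groups; it treats the reported hazard ratio as an approximate risk ratio. The smoking prevalences and the smoking–dementia risk ratio are hypothetical placeholders, not values from Russ et al. (2019) or any other study.

```python
def confounding_bias_factor(p_exposed, p_unexposed, rr_confounder_disease):
    """Ratio of the confounded to the unconfounded risk ratio.

    p_exposed / p_unexposed: prevalence of the binary confounder in the
    exposed and unexposed groups. rr_confounder_disease: risk ratio linking
    the confounder to the outcome (assumed equal in both exposure groups).
    """
    numerator = p_exposed * rr_confounder_disease + (1 - p_exposed)
    denominator = p_unexposed * rr_confounder_disease + (1 - p_unexposed)
    return numerator / denominator


observed_hr = 2.65  # reported hazard ratio for men (treated as a risk ratio)
# Hypothetical inputs: smoking prevalence 50% vs 30%, smoking-dementia RR 1.5.
bias = confounding_bias_factor(0.5, 0.3, 1.5)
adjusted = observed_hr / bias
print(f"bias factor {bias:.3f}; externally adjusted ratio {adjusted:.2f}")
```

Even with a sizeable difference in hypothetical smoking prevalence, a weak confounder–outcome association moves the estimate only modestly; the bias factor cannot exceed the confounder–disease risk ratio itself.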
The potential for confounding might also depend on the source of fluoride. Specifically, the potential confounders that are important in studies where high exposures are due primarily to naturally occurring fluoride in drinking water might differ from those in studies that involve intentionally fluoridated water. For example, arsenic and fluoride may co-occur in some areas that have naturally occurring fluoride, but the co-occurrence might be less common in areas where fluoride exposures come only from intentionally fluoridated water. Overall, the method for assessing which confounders are likely to be important in which studies should include fluoride source.
Exposure misclassification can bias effect estimates in either direction (Jurek et al. 2005). NTP noted in several cases the possibility of bias from exposure misclassification but did not discuss its likely magnitude and direction or discuss it in the context of whether a study reported an association between fluoride exposure and neurodevelopmental and cognitive effects. In many of the studies of childhood neurodevelopment reviewed by NTP, the researchers apparently assessed exposure by using the same methods for all participants regardless of their outcome status. Given that approach, most errors in exposure assessment would likely be nondifferential and would most likely bias results toward the null. In studies that found no association, that bias could be the major reason for the absence of an association. In studies that identified an association, however, the potential for that bias would be less important in the context of hazard identification because the association would likely be even stronger if one were able to correct for it (Rothman and Greenland 1998).
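The expected direction of bias described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not drawn from any of the reviewed studies: a single continuous exposure, a linear effect on an IQ-like score, and classical measurement error that is independent of the outcome (nondifferential). The effect size, error variance, and sample size are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(1)
n = 20_000
true_slope = -1.0  # hypothetical IQ points per unit of (standardized) exposure
true_exposure = [random.gauss(0, 1) for _ in range(n)]
outcome = [100 + true_slope * e + random.gauss(0, 5) for e in true_exposure]
# Nondifferential error: the same error distribution for every participant,
# unrelated to the outcome.
measured = [e + random.gauss(0, 1) for e in true_exposure]

slope_true_exposure = ols_slope(true_exposure, outcome)
slope_measured_exposure = ols_slope(measured, outcome)
print(f"true exposure: {slope_true_exposure:.2f}; "
      f"measured exposure: {slope_measured_exposure:.2f}")
```

With equal variances for true exposure and measurement error, the expected slope using the measured exposure is about half the true slope. Attenuation is an expectation under this particular error structure, not a guarantee; other structures (for example, misclassification across more than two exposure categories or errors correlated with other variables) can behave differently, consistent with the point that misclassification can bias estimates in either direction.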
An example can be seen in the Bashash et al. (2017) study of prenatal exposure. Some women had urine samples from all three trimesters, but most did not; this makes the study susceptible to biases resulting from variable completeness of exposure data among participants. The consequences depend on the pattern of “missingness” in relation to true exposure levels (which are unknown) and IQ, so they cannot be predicted with certainty. Nonetheless, there are reasons to think that the missing data would more likely bias results toward the null than toward the association identified in the study. First, there is no indication in the study, or other reason to conclude, that the number of samples collected from each woman varied strongly by child IQ; not having samples from all women in all three trimesters would therefore most likely have a nondifferential effect and bias results toward the null. Second, the risks might vary by trimester of exposure, and the most susceptible trimester might have been missed in some women. Having exposure data from only a less susceptible trimester in some women would also likely move the results toward the null, not toward the positive findings identified.
Another issue related to exposure misclassification can be seen in Broadbent et al. (2015). Here, drinking-water exposures (and thus the differences in exposure) are fairly low. Causal effects are generally more difficult to identify convincingly in studies in which differences in exposure are small. In analyses of fluoride-toothpaste use, participants who reported “always” were compared with those who reported “sometimes.” However, if fluoride-toothpaste use is actually relatively high in many people in the “sometimes” category (for example, if they used fluoridated toothpaste 80–99% of the time), the contrast in exposure between the “sometimes” and “always” groups would be small and true effects would be difficult to identify. The same would apply to the comparison of “never” with “ever” use of fluoride pills in the study if many of the participants in the “ever” group used the pills only rarely or if they came from the nonfluoridated parts of the city, which seems likely.
The committee notes that the issues discussed above would probably not change NTP’s final risk-of-bias decisions in some cases but might in others. Regardless, failure to address those issues thoroughly and consistently raises the question of whether NTP’s evaluations were sufficient and thus whether its final conclusion is based on a fair, transparent, and complete evaluation of the literature.
In assessing cognition or other neurobehavioral outcomes in human studies, it is imperative to shield examiners from information about exposure that could bias their administration and interpretation of assessments. Many neurobehavioral or cognitive assessments require direct interaction with children and interpretation of their responses to test items, so preconceived assumptions about the effects of a specific exposure can result in a biased interpretation in which children assumed to be members of a high-exposure group are classified as more deficient in the outcome. Many of the cross-sectional and case–control studies reviewed by NTP include children from different areas of residence that have different magnitudes of exposure. In those studies, if outcome assessments are conducted in schools or clinics in specific residential areas rather than in a centralized location, children will be identified as belonging to high- or low-exposure groups simply by presenting at those testing locations. Although several studies reviewed by NTP included information on examiner blinding, at least 10 studies did not specify whether outcome assessors were blinded to exposure. NTP assumed blinding because urine or drinking-water samples were used to estimate exposure. That assumption can be unfounded, especially when participants from high- and low-exposure communities were assessed in local schools and clinics, where assessors could infer a general sense of each child’s exposure and that knowledge could bias the outcome assessment. Because failure to blind examiners can contribute to a high risk of bias in study results and conclusions, this aspect should be considered more carefully in assessing risk of bias in the human studies.
NTP based its conclusions about the effect of fluoride exposure on cognitive and neurodevelopmental outcomes of children on 18 studies that it determined had lower risk of bias. The studies used a variety of neurodevelopmental and cognitive outcome measures that specifically assessed cognitive development, IQ, attention-deficit hyperactivity disorder (ADHD), visuospatial organization, and memory. Nine of the studies used some form of Raven’s Matrices, a test that assesses inductive reasoning by using visual problem-solving tasks. Because Raven’s Matrices do not require verbal responses, they are often considered the best alternative to English-language standardized intelligence tests for assessing cognition in studies of non-English speakers. Use of Raven’s Matrices does not increase the risk of bias, but the test assesses a narrow aspect of cognition; it is not equivalent to a full intelligence-test battery that assesses a broad array of cognitive domains. Three of the 18 studies that NTP classified as having low risk of bias used traditional English-language standardized intelligence tests and were accurately classified as having low risk of bias on the basis of the outcome criterion.
In some cases, NTP classified studies as having low risk of bias when the measure of the neurodevelopmental and cognitive outcome was seriously flawed. Given that the outcome determines whether fluoride is hazardous, its proper measurement should be given more weight. One specific example is the study by Barberio (2017) in which the neurodevelopmental outcome is based on parent- or child-reported diagnosis of learning disability or ADHD. That outcome measure is highly problematic because it does not include an objective measure of neurodevelopment or cognition or a confirmation of diagnosis based on review of medical records or objective professional diagnosis. Although NTP recognized that study weakness and judged the outcome as having “probably high risk of bias,” the poor quality of the outcome measure warrants a determination of definitely high risk of bias. Furthermore, that weakness should increase the overall risk of bias for the study.
Overall, because of the weaknesses in the tests used in many studies, the committee finds that NTP’s assertion (NTP 2019, p. 49) that “it is unlikely that evaluation of additional neurodevelopmental effects would change the hazard conclusion” requires further justification.
The committee is concerned that the studies included in the systematic review did not undergo a rigorous statistical review. When asked about the role of statisticians in the review process during the committee’s public meeting, NTP stated that statisticians were consulted only when the research team was not familiar with the analytic methods used in a study. The committee finds that approach insufficient inasmuch as some of the studies identified as having low risk of bias did not adequately account for the hierarchical structure of their data, and this compromised their internal validity. For example, Ding et al. (2011) sampled children in four elementary schools in China and measured exposure by using urine samples from the children. As demonstrated in Table 1 of that study, water fluoride concentrations differed widely among the communities, and so urine fluoride concentrations were likely highly correlated within the communities. The study authors, however, failed to account for those relationships, which could have resulted in overly precise interval estimates of the exposure effects and inflated type I error in their statistical tests; this is similar to the effect of ignoring cluster-level treatment assignment in a cluster randomized trial (Cornfield 1978). Similarly, Xiang et al. (2003, 2011) appeared to ignore relationships in exposure between persons from the same village. Unlike the Ding et al. (2011) study, however, proper control for clustering was not possible because there were only two villages. Thus, without control for village effects and given the large differences in fluoride concentrations and IQ between villages, the apparent dose–response relationship could be due to a village effect rather than a fluoride effect. As another example, Green et al. (2019) accounted for community-level effects by adjusting for city in their analysis, but it was unclear how this was done. If they treated city as a random effect, their analytic methods were appropriate. However, if they treated city as a fixed effect, their exposure–effect estimates might be biased. When exposure levels are determined at the group (such as city) level, fixed-effect models do not properly separate exposure effects from group effects, and this results in biased estimates and inflated type I errors (Zucker 1990). Although Green et al. (2019) used individual-level exposure rather than city-level exposure, the fixed-effect model could still produce biased estimates if the exposure levels within a city are highly correlated; this might be expected given that some cities were fully on fluoridated water and others were not. Those analytic issues could have been identified by NTP if statisticians had played a more active role in developing the risk-of-bias instructions or in conducting the risk-of-bias assessment.
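The consequence of ignoring clustering can be quantified with the standard design effect for equal-sized clusters, 1 + (m - 1) x ICC, where m is the number of children per cluster and ICC is the intraclass correlation within clusters. The sketch below assumes that exposure is effectively constant within each cluster (as when urine fluoride closely tracks community water fluoride). The cluster count, cluster size, and ICC are hypothetical and are not taken from Ding et al. (2011).

```python
import math


def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical values: 4 schools, 100 children each, ICC of 0.05.
n_clusters, cluster_size, icc = 4, 100, 0.05
deff = design_effect(cluster_size, icc)
n_total = n_clusters * cluster_size
effective_n = n_total / deff      # information-equivalent independent sample
se_inflation = math.sqrt(deff)    # factor by which naive SEs are too small
print(f"design effect {deff:.2f}; effective n {effective_n:.1f}; "
      f"naive SE should be multiplied by {se_inflation:.2f}")
```

With these hypothetical values, 400 children carry roughly the information of 67 independent children, and standard errors computed as if observations were independent are too small by a factor of about 2.4. No such correction helps when there are only two clusters, as in Xiang et al. (2003, 2011), because village and exposure effects then cannot be separated.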
The committee also identified errors in summary statistics that negated the internal validity of some studies that were rated as having low risk of bias. For example, Valdez Jimenez et al. (2017) had multiple errors and internal-validity issues in its small cohort of 65 participants. Specifically, the offspring were markedly imbalanced by sex (20 males, 45 females); apparently incorrect probabilities were reported for age differences between participants and nonparticipants; rates of cesarean delivery and premature birth among participants were high (the degree of overlap between the two was not reported); and observed prematurity rates were incorrectly compared with national expected rates.
NTP states that all 13 studies of childhood IQ that NTP rated as having a low risk of bias identified at least some evidence of an association of fluoride exposure with neurodevelopmental and cognitive effects (NTP 2019, p. 35). Presented in that way, the numbers suggest a remarkable level of consistency. However, the consistency might be exaggerated if only positive results were selected from studies that reported both positive and negative results (see, for example, Bashash et al. 2017 or Green et al. 2019). At the very least, NTP should acknowledge this issue and provide more context when describing the numbers of positive and negative studies. Alternatively, NTP could develop a series of algorithms a priori that could be used to abstract fully comparable results from the studies and could then consider the pattern generated by juxtaposing like findings. That approach would avoid selective reporting inasmuch as all studies that generated comparable results would be included. The analysis could be followed by a consideration of the magnitude and consistency of evidence of an association. An example of this type of algorithm can be found in the supplementary material (Web Figure 1) of Carlos-Wallace et al. (2016).
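An a priori abstraction rule can be expressed as a fixed preference order that selects one result per study without reference to its value or statistical significance. The sketch below is hypothetical throughout: the outcome, exposure, and subgroup categories, their priority order, and the study results are illustrative only and do not reproduce the algorithm of Carlos-Wallace et al. (2016).

```python
from dataclasses import dataclass


@dataclass
class Result:
    study: str
    outcome: str     # e.g. "full_scale_iq", "performance_iq", "verbal_iq"
    exposure: str    # e.g. "urinary_f", "water_f"
    subgroup: str    # "all", "boys", "girls"
    estimate: float  # IQ change per unit exposure; negative = adverse


# Priority orders fixed before any estimate is examined (hypothetical rules).
OUTCOME_PRIORITY = ["full_scale_iq", "performance_iq", "verbal_iq"]
EXPOSURE_PRIORITY = ["urinary_f", "water_f"]
SUBGROUP_PRIORITY = ["all", "boys", "girls"]


def rank(result):
    # Lower tuple = more preferred. list.index raises ValueError for a
    # category the rules did not anticipate, forcing an explicit decision.
    return (OUTCOME_PRIORITY.index(result.outcome),
            EXPOSURE_PRIORITY.index(result.exposure),
            SUBGROUP_PRIORITY.index(result.subgroup))


def abstract_one_per_study(results):
    """Keep exactly one result per study, chosen without looking at its value."""
    chosen = {}
    for r in results:
        if r.study not in chosen or rank(r) < rank(chosen[r.study]):
            chosen[r.study] = r
    return chosen


results = [  # hypothetical studies and estimates
    Result("S1", "verbal_iq", "urinary_f", "all", -0.5),
    Result("S1", "full_scale_iq", "urinary_f", "all", 0.3),
    Result("S1", "full_scale_iq", "urinary_f", "boys", -2.0),
    Result("S2", "full_scale_iq", "water_f", "all", -1.0),
]
chosen = abstract_one_per_study(results)
adverse = sum(1 for r in chosen.values() if r.estimate < 0)
print(f"{adverse} of {len(chosen)} studies show an adverse direction")
```

In this illustration, study S1 contributes its full-sample, full-scale IQ result even though a subgroup result points in the adverse direction; counting the most adverse result per study would instead report both studies as "positive." Direction is only a first pass; magnitude and precision would be considered afterward.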
Greater clarity is needed on how the final confidence rating was determined; in some cases, it is not clear whether NTP followed its own procedures. For example, in the monograph (NTP 2019, p. 13) and the protocol (NTP 2017, p. 15), NTP mentions the potential for increasing its confidence in the body of evidence if some criteria, including dose–response relationships and consistency, are present. On the basis of information provided in Figures D1-11 and in several descriptions throughout the monograph, it appears that dose–response patterns were seen in several studies (for example, Xiang et al. 2003; Das and Mondal 2016; Saxena et al. 2012). Furthermore, in a number of sections throughout the monograph, NTP notes the consistency of the evidence. For example, the monograph notes that “all lower risk-of-bias studies in children reported that higher fluoride exposure is associated with at least one measure of decreased IQ or other cognitive effect” (NTP 2019, p. 29). Later in the monograph, NTP states that “the human body of evidence provides a consistent pattern of findings that higher fluoride exposure is associated with decreased IQ in children” (NTP 2019, p. 52). However, despite those statements regarding consistency and the presence of dose–response relationships, NTP does not appear to have increased its confidence ratings for any category of studies (NTP 2019, Table 7). NTP should explain why the confidence ratings did not change for any of the study categories.
NTP’s conclusion is based, at least partially, on several cross-sectional studies. Such studies are often criticized because of their potential for reverse causality and exposure misclassification. Reverse causality could be a concern for study subjects in some studies and should be fully evaluated by NTP. However, the committee does not find that reverse causality is likely to be a major concern in most of the cross-sectional studies of fluoride and neurodevelopment identified by NTP because it seems unlikely that diminished neurodevelopmental status would be a widespread and strong determinant of high fluoride exposure in children. Exposure misclassification because of migration into and out of high-fluoride areas could be a concern in some cross-sectional studies but would likely (albeit perhaps not in all cases) bias results toward the null, not toward the positive associations identified in many studies. In addition, as noted by NTP, several cross-sectional studies minimized such misclassification by including only long-term residents or children who had been living in the same area since birth (see, for example, Xiang et al. 2003, 2011). Overall, the committee concluded that well-conducted cross-sectional studies can potentially provide valid and useful information for evaluating the effects of fluoride on neurodevelopment and thus agrees with NTP that these studies should not necessarily be given a final rating of “lower confidence.” As an aside, the committee did not agree with NTP’s use of the term functionally prospective to describe some cross-sectional studies. NTP did not define or explain that term, and it is not used in the epidemiology literature; therefore, the committee discourages its use.
NTP seems to be relying on the results of the funnel plot from the meta-analysis of Choi et al. (2012) for its analysis of publication bias. Although the lack of asymmetry in the plot provides some evidence against major publication bias, NTP should acknowledge the weaknesses of the approach; for example, factors other than publication bias can affect the symmetry of funnel plots, and funnel plots rely on subjective interpretation. In addition, the monograph includes a number of studies that were not included in the Choi et al. (2012) meta-analysis. Thus, NTP should do its own analyses of publication bias and use the analyses to evaluate the likelihood that publication bias could have had major effects on the body of evidence that it has identified.
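A formal complement to visual inspection of a funnel plot is Egger's regression test, which regresses each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error); an intercept far from zero suggests small-study asymmetry. The function below is an illustrative sketch, not a validated implementation. It shares the weaknesses noted above: it has little power with few studies, and asymmetry can arise from heterogeneity rather than publication bias.

```python
import math


def egger_test(effects, standard_errors):
    """Egger regression of y/se on 1/se.

    Returns (intercept, se_of_intercept). Compare intercept / se_of_intercept
    with a t distribution on k - 2 degrees of freedom.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / se for se in standard_errors]                    # precision
    y = [e / se for e, se in zip(effects, standard_errors)]   # standardized effect
    mx = sum(x) / k
    my = sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = residual_ss / (k - 2)
    se_intercept = math.sqrt(s2 * (1 / k + mx ** 2 / sxx))
    return intercept, se_intercept
```

Applied to an updated set of studies (with effects on a common scale, such as mean IQ differences), the intercept and its standard error would give NTP a quantitative check that does not depend on a subjective reading of the plot.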
Definitions of Consistency and Positive
A key conclusion of the monograph is that the results of the epidemiologic studies consistently show a positive association. Although the desire to provide a simple summary of a complex array of evidence is understandable, such claims imply that the studies provide an array of clearly comparable results and that all suggest an adverse effect of fluoride on neurodevelopment or cognition. In fact, many of the studies provide results that are based on multiple indicators of fluoride exposure, assess multiple measures of cognition and neurodevelopment at different ages, use multiple statistical approaches to characterize the relationship between fluoride exposure and health outcomes, and address markedly different magnitudes of fluoride exposure. The committee recognizes that drawing conclusions always requires aggregating or summarizing data that have some degree of heterogeneity, but the data should be examined as subsets along one or more of the axes suggested above. For example, what do the studies of urinary fluoride resulting from naturally occurring fluoride exposure indicate for IQ below the age of 5 years? Accordingly, the monograph should juxtapose results of broadly comparable studies and use the resulting information to provide a text summary of the patterns observed. If comparing “like to like” results yielded consistent results for all measures, ages, exposure sources, statistical approaches, and exposure ranges, taking random error into account, that would indeed warrant a statement that results consistently show adverse effects. The monograph, however, does not provide the evidence in a manner that leads to that conclusion.
The text that is used to justify the assertions of consistently positive results is purely anecdotal and cites isolated findings from specific studies without explaining why those findings, and not others, were highlighted. Selective reporting of the literature in that way is almost certain to generate a false impression of consistency. Although it might be true that every study has at least some indication of adverse outcomes associated with higher fluoride exposure, that does not provide a clear or necessarily useful assessment of the body of evidence. Furthermore, it is inappropriate to rely on statistical significance as the single indicator of whether a study is called “positive,” given that studies with low power can nonetheless generate an indication of a positive association and that those with isolated statistically significant findings might not provide an overall pattern indicative of a positive association. The information provided in the monograph does not allow readers to follow the steps from assembling and presenting available data in an objective and informative manner to making observations about the pattern of the results to drawing conclusions from the patterns that were observed.
Methodical Presentation of Results
A full understanding of the data calls for their detailed examination in relation to the methodologic features of the studies. Study results can be arrayed in multiple ways that would be informative. For example, studies can be categorized on the basis of such risk-of-bias criteria as blinding or such factors as the major source of fluoride in each study (naturally occurring vs intentional addition), magnitudes of exposure, or the ages at which an exposure or outcome was assessed. Informative evaluations can then be made by comparing study results within the categories. For example, if the methodologically strongest studies tend to show clearer associations than the methodologically weaker ones, the evidence could be interpreted as providing greater support for a possible adverse effect than if the reverse were found. Consistency among studies of varied methodologic quality might also help to provide evidence that some issues do not present major concerns. For example, if studies in which the researchers were blinded yield results similar to those of a comparable set of studies in which researchers were not blinded, this finding might provide evidence that failure to blind was not a major source of bias. Similarly, categorizing studies on the basis of exposure might help to identify dose–response relationships; that is, if a true association exists, studies that have the highest exposures and the widest range between the “low” and “high” exposure groups would be expected to report greater effect sizes than studies that involve low exposures and a narrow exposure range between groups. Overall, by categorizing studies on the basis of a variety of methodologic factors and comparing groups of studies within the different categories, NTP should be able to provide a more detailed and comprehensive evaluation of the literature.
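Comparing categories of studies (for example, blinded vs not blinded, or naturally occurring vs intentionally added fluoride) can be made quantitative by pooling each category and testing whether the category estimates differ. The sketch below uses inverse-variance (fixed-effect) pooling within each category and Cochran's Q for between-category differences, compared with a chi-square distribution on (number of categories - 1) degrees of freedom. The fixed-effect simplification and the category labels are assumptions for illustration; random-effects versions of the same comparison also exist.

```python
def pool_fixed(effects, standard_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its total weight."""
    weights = [1 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    return pooled, total


def subgroup_q(groups):
    """Cochran's Q for differences between category-level pooled estimates.

    groups: dict mapping a category label (e.g. "blinded") to a pair
    (effects, standard_errors). Returns (estimates, q_between, df).
    """
    summaries = {label: pool_fixed(e, s) for label, (e, s) in groups.items()}
    total_w = sum(w for _, w in summaries.values())
    overall = sum(est * w for est, w in summaries.values()) / total_w
    q_between = sum(w * (est - overall) ** 2 for est, w in summaries.values())
    estimates = {label: est for label, (est, _) in summaries.items()}
    return estimates, q_between, len(groups) - 1
```

A small Q relative to its degrees of freedom would be consistent with, though not proof of, the category (such as failure to blind) not being a major source of bias.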
The committee notes that the presentation of study results in Table 6 and Figures D1-12 made it difficult to assess the variability of exposure–effect estimates across studies. It recommends that NTP present forest plots—similar to Figure 2 in Choi et al. (2012)—for subgroups of studies that have the same effect estimate (such as relative risk), that have similar dose and outcome measurements, and that adjust for the same set of confounders. The committee also notes that the numbers of studies in particular categories, such as those grouped by study design, are inconsistently stated throughout the monograph; these inconsistencies should be corrected.
Finally, subject to the concerns about NTP’s risk-of-bias evaluations discussed above, the committee agrees with NTP’s decision to base its conclusions primarily on studies that have a lower risk of bias. However, its focus on the lower risk-of-bias studies of childhood neurodevelopment outcomes and what seem to be the highly consistent findings across all these studies might give the impression that NTP has artificially increased the confidence in its conclusion regarding this outcome. Presenting the higher risk-of-bias studies (NTP 2019, Figure A3-1) and the lower risk-of-bias studies (NTP 2019, Figure A3-3) in separate figures might be one source of that concern. NTP should consider creating one figure that includes the risk-of-bias ratings for all the studies, stratified to separate the higher risk-of-bias studies from the lower risk-of-bias studies. That approach would better represent how NTP considered the body of literature: assessing all studies but focusing its conclusions on the lower risk-of-bias studies.
Rationale for Not Performing a Meta-Analysis
The committee strongly recommends that NTP reconsider its decision not to perform a meta-analysis and, if it still decides not to do one, that it provide a more thorough and convincing justification for that decision. In the monograph, NTP states that a meta-analysis was not performed because of “heterogeneity in dose among the available human evidence, and because a hazard conclusion could be reached without conducting a meta-analysis” (NTP 2019, p. 13). A properly conducted meta-analysis can account for heterogeneity in exposure measurements and other aspects of study design, so it is not clear why heterogeneity was listed as a reason for not performing one. It would be difficult to perform a single meta-analysis that includes both relative risk estimates and mean differences (or standardized mean differences), but the two types of estimates could be analyzed in separate meta-analyses. Meta-analyses restricted to studies deemed sufficiently similar in their exposure and outcome metrics could also be performed and could address NTP’s concern about heterogeneity. However, because NTP did not present the studies in a way that would suggest such groupings, it is unclear to the committee how feasible such analyses would be. The committee also recommends that NTP explain why it did not update the Choi et al. (2012) meta-analysis. NTP uses the funnel plot in Choi et al. (2012) as evidence of minimal publication bias in its systematic review. However, Choi et al. (2012) considered only a subset of the studies included in the systematic review, so NTP’s claim of minimal publication bias would be strengthened by adding recent papers to the meta-analysis and constructing a new funnel plot.
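As a rough sketch of what these two steps could involve, the Python below pools one type of estimate (mean differences; relative risks would be pooled in a separate analysis on the log scale) with a DerSimonian-Laird random-effects model. Its between-study variance term, tau squared, is one standard way a meta-analysis absorbs heterogeneity. The sketch then applies Egger's regression test as a formal complement to visual inspection of a funnel plot. Both methods, the function names, and the numbers are assumptions chosen for illustration; nothing here reproduces an analysis by NTP or by Choi et al. (2012).

```python
import math

# HYPOTHETICAL illustration only: mean differences in IQ points (exposed minus
# reference) and standard errors. Invented values, not results from any study.
EFFECTS = [-3.1, -1.8, -4.5, -0.9, -2.6, -5.2]
SES = [1.2, 0.9, 2.0, 0.7, 1.1, 2.4]


def dersimonian_laird(y, se):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2 estimator.

    Returns (pooled estimate, its standard error, tau^2, I^2).
    """
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation from heterogeneity
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2, i2


def egger_test(y, se):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (y / se) on precision (1 / se) by ordinary
    least squares and returns the intercept and its t statistic (n - 2 df). An
    intercept far from zero suggests small-study effects such as publication bias.
    """
    z = [yi / si for yi, si in zip(y, se)]
    x = [1.0 / si for si in se]
    n = len(y)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(rss / (n - 2) * (1.0 / n + mx ** 2 / sxx))
    return intercept, intercept / se_intercept


if __name__ == "__main__":
    est, se_est, tau2, i2 = dersimonian_laird(EFFECTS, SES)
    print(f"pooled {est:.2f} (95% CI {est - 1.96 * se_est:.2f} to {est + 1.96 * se_est:.2f}), "
          f"tau^2 {tau2:.2f}, I^2 {100 * i2:.0f}%")
    b0, t = egger_test(EFFECTS, SES)
    print(f"Egger intercept {b0:.2f}, t = {t:.2f} on {len(EFFECTS) - 2} df")
```

With only a handful of studies, Egger's test has little power, so a nonsignificant intercept should not by itself be read as evidence that publication bias is absent.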
Communication Regarding Lower Exposures
The discussion section of the monograph provides an informal assessment of the evidence with regard to exposure range and declares that the positive results are based largely on exposures greater than those used for fluoridation. The basis of that inference is not apparent, and it seems to contradict the earlier assertion that nearly all the studies are positive, including ones that assessed lower exposures. More important, as discussed in Chapter 2, this discussion gives a false impression that NTP conducted a formal dose–response assessment. NTP should make it clear that the monograph cannot be used to assess what concentrations of fluoride are safe.
The monograph “concludes that fluoride is presumed to be a cognitive neurodevelopmental hazard to humans. This conclusion is based on a consistent pattern of findings in human studies across several different populations showing that higher fluoride exposure is associated with decreased IQ or other cognitive impairments in children” (NTP 2019, p. 59). The committee was tasked with assessing whether NTP satisfactorily supports its conclusion. In light of the issues raised by the committee regarding the analysis of various aspects of some studies and the analysis, summary, and presentation of the data in the monograph, the committee finds that NTP has not adequately supported its conclusion. The committee’s finding does not mean that the conclusion is incorrect; rather, further analysis or reanalysis as suggested in the present report is needed to support the conclusion in the monograph.
References
Axelson, O. 1978. Aspects on confounding in occupational health epidemiology. Scand. J. Work Environ. Health 4:85-89.
Barberio, A.M., C. Quinonez, F.S. Hosein, and L. McLaren. 2017. Fluoride exposure and reported learning disability among Canadian children: Implications for community water fluoridation. Can. J. Public Health 108:229-239.
Bashash, M., D. Thomas, H. Hu, E.A. Martinez-Mier, B.N. Sanchez, N. Basu, K.E. Peterson, A.S. Ettinger, R. Wright, Z. Zhang, Y. Liu, L. Schnaas, A. Mercado-Garcia, M.M. Tellez-Rojo, and M. Hernandez-Avila. 2017. Prenatal fluoride exposure and cognitive outcomes in children at 4 and 6-12 years of age in Mexico. Environ. Health Perspect. 125(9):1-12.
Broadbent, J.M., W.M. Thomson, T.E. Moffitt, and R. Poulton. 2015. Community water fluoridation and intelligence response. Am. J. Public Health 105:3-4.
Carlos-Wallace, F.M., L. Zhang, M.T. Smith, G. Rader, and C. Steinmaus. 2016. Parental, in utero, and early-life exposure to benzene and the risk of childhood leukemia: A meta-analysis. Am. J. Epidemiol. 183(1):1-14.
Choi, A.L., G. Sun, Y. Zhang, and P. Grandjean. 2012. Developmental fluoride neurotoxicity: A systematic review and meta-analysis. Environ. Health Perspect. 120:1362-1368.
Cornfield, J. 1978. Randomization by group: A formal analysis. Am. J. Epidemiol. 108(2):100-102.
Das, K., and N.K. Mondal. 2016. Dental fluorosis and urinary fluoride concentration as a reflection of fluoride exposure and its impact on IQ level and BMI of children of Laxmisagar, Simlapal Block of Bankura District, W.B., India. Environ. Monit. Assess. 188:218.
Ding, Y., H. Sun, H. Han, W. Wang, X. Ji, X. Liu, and D. Sun. 2011. The relationships between low levels of urine fluoride on children’s intelligence, dental fluorosis in endemic fluorosis areas in Hulunbuir, Inner Mongolia, China. J. Hazard. Mater. 186:1942-1946.
Green, R., B. Lanphear, R. Hornung, D. Flora, E.A. Martinez-Mier, R. Neufeld, P. Ayotte, G. Muckle, and C. Till. 2019. Association between maternal fluoride exposure during pregnancy and IQ scores in offspring in Canada. JAMA Pediatr. E1-E9.
Jurek, A.M., S. Greenland, G. Maldonado, and T.R. Church. 2005. Proper interpretation of non-differential misclassification effects: Expectations vs observations. Int. J. Epidemiol. 34:680-687.
NTP (National Toxicology Program). 2017. Protocol for Systematic Review of Effects of Fluoride Exposure on Neurodevelopment. Office of Health Assessment and Translation, Division of the National Toxicology Program, National Institute of Environmental Health Sciences.
NTP. 2019. Draft NTP Monograph on the Systematic Review of Fluoride Exposure and Neurodevelopmental and Cognitive Health Effects. Office of Health Assessment and Translation, Division of the NTP, National Institute of Environmental Health Sciences, National Institutes of Health, US Department of Health and Human Services.
Rothman, K.J., and S. Greenland. 1998. Modern Epidemiology, 2nd Ed. Philadelphia: Lippincott-Raven.
Rudolph, K.E., and E.A. Stuart. 2018. Using sensitivity analyses for unobserved confounding to address covariate measurement error in propensity score methods. Am. J. Epidemiol. 187(3):604-613.
Russ, T.C., L.O.J. Killin, J. Hannah, G.D. Batty, I.J. Deary, and J.M. Starr. 2019. Aluminum and fluoride in drinking water in relation to later dementia risk. Brit. J. Psychol. 14:1-6.
Saxena, S., A. Sahay, and P. Goel. 2012. Effect of fluoride exposure on the intelligence of school children in Madhya Pradesh, India. J. Neurosci. Rural Pract. 3:144-149.
Valdez Jimenez, L., O.D. Lopez Guzman, M. Cervantes Flores, R. Costilla-Salazar, J. Calderon Hernandez, Y. Alcaraz Contreras, and D.O. Rocha-Amador. 2017. In utero exposure to fluoride and cognitive development delay in infants. Neurotoxicology 59:65-70.
Xiang, Q., Y. Liang, L. Chen, C. Wang, B. Chen, X. Chen, and M. Zhou. 2003. Effect of fluoride in drinking water on children’s intelligence. Fluoride 36:84-94.
Xiang, Q., Y. Liang, B. Chen, and L. Chen. 2011. Analysis of children’s serum fluoride levels in relation to intelligence scores in a high and low fluoride water village in China. Fluoride 44:191-194.
Zucker, D.M. 1990. An analysis of variance pitfall: The fixed effects analysis in a nested design. Educ. Psychol. Meas. 50(4):731-738.