Epidemiology is the study of health and disease in populations. Standard definitions of epidemiology emphasize a descriptive component that captures patterns of disease by person, place, and time and an etiological component that identifies causes of disease (Gordis 2013). The descriptive element of epidemiology comprises tracking of health and disease indicators and population risk factors (surveillance). The etiological activities—searching for the causes and determinants of disease—involve primarily case-control and cohort studies. The span of epidemiological research also includes intervention studies, both randomized and nonrandomized in the assignment of preventive measures, such as vaccinations, or other interventions.
This chapter addresses the evolving approaches used by epidemiologists to investigate the associations between environmental factors and human disease and the role of epidemiology in the context of the committee’s charge regarding 21st century science related to risk-based decision-making. It does not give an overall introduction to the science of epidemiology; such material is readily available in textbooks and elsewhere. It briefly discusses, however, the role of epidemiology in risk assessment, the evolution of epidemiology, data opportunities now available, and types of biases to consider given the use of Tox21 and ES21 tools and methods. The chapter then focuses on the use of -omics technologies in epidemiology and concludes with some challenges and recommendations.
The role of epidemiological evidence has long been established within the risk-assessment paradigm originally described in the report Risk Assessment in the Federal Government: Managing the Process (NRC 1983) and in various later reports (Samet et al. 1998). Identification of risk factors for disease and inference of causal associations from epidemiological studies provide important information for the hazard-identification component. Evidence on hazard obtained from epidemiological studies is given precedence in evidence-evaluation guidelines, including those of the US Environmental Protection Agency and the International Agency for Research on Cancer (IARC). Convincing epidemiological evidence that indicates a hazard is considered sufficient to establish causation, for example, in the IARC carcinogen classification scheme. However, human data are available on only a relatively small number of agents, particularly in comparison with the large number of environmental agents to which people are potentially exposed. In the absence of natural experiments, observational epidemiological studies are the only scientific approach available and ethically acceptable for studying possible effects of potentially harmful agents directly in human populations.
In addition to providing evidence for hazard identification, epidemiological studies can provide understanding of the exposure–response relationship. For some agents, the effects of exposure have been investigated primarily in particular groups of workers, such as asbestos workers, at exposure magnitudes typically much higher than those of the general population, and exposure–response relationships are extrapolated downward, introducing uncertainty. If the needed exposure data on a general population are available, epidemiological studies can provide key information on risk at exposure concentrations relevant to the population at large. For example, air-pollution exposures of participants in large cohort studies, including the American Cancer Society’s Cancer Prevention Study 2 and the multiple studies involved in the European Study of Cohorts for Air Pollution Effects (ESCAPE 2014), have been estimated. Although some exposure misclassification is inherent in the case of most environmental and occupational exposures, there are numerous examples of successful incorporation of epidemiologically based exposure–response relationships into risk assessments: ionizing radiation and cancer, particulate-matter air pollution and mortality, arsenic exposure and cancer, and childhood lead exposure and neuropsychological development. Methods of addressing or correcting for measurement error have been developed; such corrections generally lead to exposure–response curves with steeper slopes (Hart et al. 2015).
Epidemiological studies can also contribute to understanding the exposure–response relationship by identifying determinants of susceptibility if information on characteristics of study participants (such as their age, sex, and now genomes) is available. Data collected for epidemiological research or for population surveillance can be useful for describing exposure distributions on the basis of questionnaires, monitoring, models, and analyses of biological specimens.
Epidemiological research might also provide information on overall population risk that fits into the risk-characterization component of risk assessment. The population attributable risk statistic, originally developed to estimate the burden of lung cancer caused by smoking, provides an estimate of the burden of disease resulting from a causal factor (Levin 1953). Thus, data on human populations can contribute to all four components of the risk-assessment paradigm described in Chapter 1.
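As a worked illustration of the population attributable risk statistic, the sketch below applies Levin's formula in its standard textbook form, p(RR - 1) / [1 + p(RR - 1)], where p is the prevalence of exposure in the population and RR is the relative risk. The function name and the input values are hypothetical and chosen only for illustration; they are not estimates from any study cited in this chapter.

```python
def population_attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)).

    prevalence    -- proportion of the population exposed (0 to 1)
    relative_risk -- risk in the exposed divided by risk in the unexposed
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: 25% of the population exposed, relative risk of 5.
paf = population_attributable_fraction(prevalence=0.25, relative_risk=5.0)
print(f"Population attributable fraction: {paf:.3f}")
```

With these hypothetical inputs, half of the disease burden would be attributed to the exposure. The calculation is meaningful only if the relative risk reflects a causal effect and is not distorted by confounding.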
The Evolution of Epidemiology
The methods of epidemiological research have not been static. Initially, epidemiological research on the etiology of noncommunicable diseases—primarily cancer, cardiovascular diseases, pulmonary diseases, and metabolic diseases—focused on particular risk factors; exposure assessment was accomplished largely by using self-report questionnaires, measurement and estimation methods in the case of occupational studies, and relatively crude indicators in the case of environmental exposures. Some studies incorporated measurements from biological samples, such as lead or cadmium concentrations, and some estimated exposures with models that used extensive data. For example, in the study of survivors of the Hiroshima and Nagasaki atomic bombings, radiation dose was estimated with an elaborate algorithm that incorporated such information as location and body position at the time of the blast. Epidemiological studies of noncommunicable disease, carried out beginning in the 1950s, focused on risk factors at the individual level; some later studies began to incorporate risk determinants at higher levels of social or organizational structure, including the family, the places of residence and work, and the state and country. Efforts were made to build the studies around conceptual frameworks that reflected understanding of structural, sociological, and cultural factors driving health status and disease risk, and recent decades have seen increasing emphasis on life-course approaches that acknowledge the importance of early life exposures, even in utero and transgenerational, for disease risk. Furthermore, many later studies of the environment and health have been designed to reflect the variation in environmental exposures among and within communities.
Most recently, epidemiological research has been greatly affected by advances in other fields. The start of the 21st century was characterized by rapid advances in technology, medical sciences, biology, and genetics pertinent to epidemiology (Hiatt et al. 2013). Enhanced computing and data-storage capacity have been critical. The advent of genomics and genome-wide association studies (GWASs), for example, has played an important role in promoting the transformation of the practice of epidemiology.
The need to achieve samples large enough to provide studies that have adequate statistical power and the need to replicate novel findings in independent study populations facilitated the evolution of large epidemiological research teams, multicenter studies and consortia, meta-analytical tool development, and data-sharing etiquette. Recent decades have seen an evolution from single investigative teams that have proprietary control of individual datasets and specimens to the establishment of research consortia that have adopted a team-based science and a reproducibility culture through greater sharing of data, protocols, and analytical approaches (Guttmacher et al. 2009; Tenopir et al. 2011). Indeed, some funding agencies have sought to catalyze the transformation further by supporting the development and dissemination of validated state-of-the-science protocols designed to ascertain a broad array of phenotypic measures so that individual research teams (when designing new studies) might be positioned better to share and harmonize data among multiple studies (PhenX Toolkit NHGRI).
Case-control and cohort studies, the traditional workhorses of epidemiology, will continue to make strong contributions. Case-control studies, in particular, will continue to contribute to timely in-depth examination of people who have specific rare outcomes, such as rare cancers or reproductive outcomes, including specific birth defects. Cohort studies will continue to play an important role in aiding in the delineation of early antecedents of disease and the identification of preclinical biomarkers and risk factors and contribute to the foundation for translational research and precision medicine. Cohort studies, if started early enough, can be informative on the importance of early life exposures and their influence throughout the life course. The committee anticipates an increasing number of cohort studies that integrate treatment and health-outcome information from multiple sources, including information from health-care delivery systems. Studies that incorporate analysis of samples from companion biobanks will become key resources for connecting mechanisms identified in -omics and other assessments to pathogenesis in humans. Availability of more extensive geographical location information would allow incorporation of new and emerging data streams that document physical and social environments of populations on small scales into existing and new studies.
In summary, the factors reshaping the field of epidemiology in the 21st century include expansion of the interdisciplinary nature of the discipline; the increasing complexity of scientific inquiry that involves multilevel analyses and consideration of disease etiology and progression throughout the life course; emergence of new sources and technologies for data generation, such as new medical and environmental data sources and -omics technologies; advances in exposure characterization; and increasing demands to integrate new knowledge from basic, clinical, and population sciences (Lam et al. 2013). There is also a movement to register past and present datasets so that on particular issues data can be identified and combined. There are already models for data aggregation across studies (for example, National Cancer Institute Cohort Consortium and Agricultural Health cohorts), and researchers recognize the need for harmonizing data collection to facilitate future dataset aggregation (PhenX Toolkit NHGRI; Fortier et al. 2010). They are also considering how to create global biobanks (Harris et al. 2012).
New Data Opportunities
Epidemiology has always been a discipline that uses large quantities of information with the goal of identifying risk factors that can be targeted in individuals or populations ultimately to reduce disease morbidity and mortality. Today, modern technologies, including genomic, proteomic, metabolomic, epigenomic, and transcriptomic platforms and sophisticated sensor and modeling techniques, facilitate the generation and collection of new types of data. The data can be used to generate hypotheses, but they can also be used to supplement data from legacy studies to strengthen their findings (see Box 4-1). New data opportunities have arisen from changes in how medicine is practiced, how health care is delivered, and how systems store and monitor health-care data (AACR 2015). Biobanks are being constructed by a variety of institutions that provide clinical care and potentially constitute new data sources.¹ They typically include collections of biological specimens (blood, urine, and surgical and biopsy specimens), clinical patient information that provides demographic and lifestyle information, perhaps a questionnaire on lifestyle and environmental and occupational exposures, and ascertainment of health outcomes from clinical records. Thus, human data and biosamples potentially available for application of various -omics and other technologies might come from opportunistic studies that rely on data sources that might have been collected and stored for nonresearch purposes. However, evidence from studies that use human tissue and medical data gained through convenience sampling from special populations might not be readily generalized. Furthermore, such studies carry the same potential for bias as other nonexperimental research data, but there is no opportunity with these studies to address some biases via a well-thought-out study design, data collection, and protocols for obtaining biospecimens.
Thus, new data streams and technologies, although promising, raise important methodological concerns and challenges and are driving the need to develop new study designs and analytical methods to account for technology-specific peculiarities (Khoury et al. 2013). Investigators have cautioned about the increasing possibility of false leads and dead ends with each new assay and have called for careful evaluation of analytical performance, reproducibility, concept
¹ The committee notes that biobanks are not a new creation. For example, the National Health and Nutrition Examination Survey, which is conducted for surveillance purposes, collects and analyzes specimens, and the data generated have proved invaluable for exposure assessment. Many other population-based biobanks have been created, usually by enrolling healthy subjects; the largest ones include the European Prospective Investigation into Cancer and Nutrition (IARC 2016) and the UK Biobank (2016).
The tsunami of data spanning the spectrum of genomic, molecular, clinical, epidemiological, environmental, and digital information is already a reality of 21st century epidemiology (Khoury et al. 2013). There are challenges in using current methods to process, analyze, and interpret the data systematically and efficiently or to find relevant signals in potential oceans of noise. To address those issues, the US government in 2012 announced the “Big Data” Initiative and committed funds to support research in data science in multiple agencies (Mervis 2012). Epidemiologists are poised to play a central role in shaping the directions and investment in building infrastructures for the storage and robust analysis of massive and complex datasets. Given experience with multidisciplinary teams, epidemiologists are also equipped to direct the interpretation of the data in collaboration with experts in clinical and basic health sciences, biomedical informatics, computational biology, mathematics and biostatistics, and exposure sciences. Adaptation of technological advances, such as cloud computing, and strategic formation of new academic–industry partnerships to facilitate the integration of state-of-the-art computing into biomedical research and health care (Pechette 2012) are only some of the initial challenges that must be confronted before new data opportunities can be properly and effectively integrated into future epidemiological studies.
Types of Biases and Challenges Related to External Validity
As noted, contemporary epidemiology is faced with an unprecedented proliferation of clinical and health-care administrative data, -omics data, and social and environmental data. The biases that generally affect epidemiological evidence can be grouped into three broad categories: information bias that arises from error in measurements of exposure or outcome variables and covariates, selection bias that arises from the ways in which participants are chosen to take part in epidemiological studies, and confounding that arises from the mingled effects of exposures of interest and other exposures. External validity refers to the generalizability of findings and is a key consideration in risk assessment. Understanding the selection processes, measurement accuracy, and interpretation of analyses is critical for using epidemiological data in risk assessment, including the new and perhaps large cohorts that will be created from health-care databases and combined with exposure estimates.
The multiplicity, diversity, and size of data sources have generated widespread enthusiasm in researchers about the new possibilities (Roger et al. 2015a,b). There will, however, be some challenges in using the data. For example, reliance on electronic medical records as a sole basis for assembling cohorts might accentuate sample-selection biases because of health-care–seeking behaviors of patients; promote misclassification or incomplete documentation of phenotypes, clinical diagnoses, and procedures because of vagaries in clinical coding incentives and practices; and lead to confounding because key factors needed to evaluate confounding are not routinely collected in medical records, particularly those associated with environmental exposures. Although electronic record systems might support the generation of large cohorts for investigations, having a large sample size does not mitigate the potential for biases, and it increases the likelihood of statistically significant false-positive findings. Furthermore, electronic medical records typically contain little information on occupational and environmental exposures, linkage to exposure databases might be problematic, and information on important potential confounders, such as tobacco use, might be sparse and not collected in the standardized fashion needed for research.
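The point that a large sample does not remove bias, and can make a biased estimate look statistically convincing, can be illustrated with a simple calculation. The sketch below assumes a hypothetical situation in which the true effect is zero but selection, misclassification, or confounding shifts the estimate by a fixed amount; the standard error is taken as sd / sqrt(n), a deliberate simplification. The function name and all numbers are illustrative assumptions.

```python
import math


def z_and_p_for_fixed_bias(bias: float, sd: float, n: int) -> tuple[float, float]:
    """Return the z statistic and two-sided p-value produced by bias alone.

    The true effect is assumed to be zero, so any apparent signal is spurious.
    """
    standard_error = sd / math.sqrt(n)          # shrinks as n grows
    z = bias / standard_error                   # the bias itself does not shrink
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p-value
    return z, p


# Hypothetical bias of 0.02 standard-deviation units at three cohort sizes.
for n in (100, 10_000, 1_000_000):
    z, p = z_and_p_for_fixed_bias(bias=0.02, sd=1.0, n=n)
    print(f"n={n:>9,}  z={z:6.2f}  two-sided p={p:.3g}")
```

Under these assumptions the same small bias is undetectable in a study of 100 people but crosses conventional significance thresholds once the cohort reaches the size that electronic record systems can supply.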
In evaluating risks posed by environmental agents, epidemiologists and exposure scientists typically work together to enhance exposure estimates used in epidemiological studies by broadening the variety of exposures considered, increasing precision of exposure measures, and providing insights into errors that inevitably affect exposure estimates. The full array of advances in exposure science that are described in the ES21 report (NRC 2012) and in Chapter 2 of the present report has application in epidemiological studies. When exposure methods are appropriately incorporated into the study design, they facilitate exploration of measurement error in exposure variables and covariates. Such error has long been considered a serious limitation of epidemiological evidence in risk-assessment contexts; nonrandom errors can bias apparent effects upward or downward, and random error generally obscures associations and dose–response relationships. Measurement-error corrections can be made by using data from validation studies and statistical models that have been developed over the last 2 decades and applied, for example, to studies on diet and disease risk, radiation and cancer, and air pollution and health (Li et al. 2006; Freedman et al. 2015; Hart et al. 2015).
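One simple form of measurement-error correction that uses validation data is regression calibration. The sketch below assumes classical, nondifferential error in a single exposure and a linear exposure–response model; it is not the specific method of any study cited above. The validation data, the naive slope, and the function names are hypothetical.

```python
def calibration_slope(measured: list[float], reference: list[float]) -> float:
    """Least-squares slope from regressing reference exposure on measured exposure.

    Under classical error this slope is the attenuation factor, which lies
    between 0 and 1.
    """
    n = len(measured)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_m = sum(measured) / n
    mean_r = sum(reference) / n
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(measured, reference))
    var = sum((m - mean_m) ** 2 for m in measured)
    return cov / var


def corrected_slope(naive_slope: float, attenuation: float) -> float:
    """Divide the naive exposure-response slope by the attenuation factor."""
    return naive_slope / attenuation


# Hypothetical validation substudy: routine measurements vs. a reference method.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = [1.4, 2.2, 3.0, 3.8, 4.6]

attenuation = calibration_slope(measured, reference)
naive = 0.08  # hypothetical slope estimated in the main study
print(f"attenuation factor: {attenuation:.2f}")
print(f"corrected slope:    {corrected_slope(naive, attenuation):.3f}")
```

Because the attenuation factor is below 1 in this setting, the corrected slope is steeper than the naive one, which is the direction of change expected when random exposure error is accounted for.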
Historically, epidemiological research has incorporated emerging technologies into new and current studies. The need to incorporate new science, however, accelerated several decades ago with the introduction of the paradigm of molecular epidemiology. The new paradigm emerged as a replacement of “black box” epidemiology, an approach that examined associations of risk factors with disease while not addressing the intervening mechanisms. The molecular-epidemiology paradigm opens the black boxes through the incorporation of biomarkers of exposure, susceptibility, and disease. It stresses the importance of pathways and their perturbation, which is highly relevant to the opportunities provided by 21st century science and specifically -omics technologies. The approach also strengthens the evidence base for one of Bradford Hill’s guidelines for causality: understanding of biological plausibility (see Chapter 7). For example, carcinogenesis is thought to be a multifactorial process in which mutations and selective microenvironments play critical roles, and key steps of the process can be explored with biomarkers. The molecular-epidemiology paradigm is a general one and conceptually accommodates emerging methods for generating biomarker data.
As indicated, molecular-epidemiology research is focused on underlying biology (exposure and disease pathogenesis) rather than on empirical observation. Thus, as -omics technologies have emerged, they have been integrated into current studies and have affected study design, particularly specimen collection and management. The incorporation of -omics approaches dates back about 2 decades, beginning with the genomic revolution. In some of the current cohort studies, blood samples that had been appropriately stored were analyzed for single-nucleotide polymorphisms (SNPs) and other markers to search for genes associated with disease risk, including those modifying risk associated with environmental agents.
The utility of bringing -omics technologies into epidemiological research is already clear as exemplified by many studies that have incorporated genomics. One well-known starting point for exploring the genetic basis of disease has been GWAS, which involves the comparison of genomic markers in people who have and people who do not have a disease or condition of interest. The list of -omics approaches applied in epidemiological research has now expanded beyond genomics to include epigenomics, proteomics, transcriptomics, and metabolomics (see Box 1-1). Table 4-1 lists advantages and disadvantages of their use. Examples of their use in a specific context are provided in Appendix B, which describes the meaning and limitations of -omic approaches in the context of epidemiological research on air pollution. Although the new methods have the potential to bring new insights from epidemiological research, there are many challenges in applying them. Some new studies are being designed with the intent of prospectively storing samples that can be used for existing and future -omics technologies, for example, in the case of the EU-funded projects Helix and EXPOsOMICS described in Chapter 1. Obtaining data from human population studies that are parallel to data that can be obtained from in vitro and in vivo toxicity assessments is already possible and offers the possibility of harmonizing comparisons of exposure and dose.
In principle, the -omics approaches now support nontargeted explorations of genes with genomics, mRNA with transcriptomics, proteins with proteomics, and metabolites with metabolomics. With the exception of genomics, the measurements usually reflect changes within cells at one or a few points in time only, and the tissues that are used in humans are primarily surrogates, such as blood, urine, and saliva. Combining different -omics tools, however, increases the possibility for a better understanding of how different external exposures interact with internal molecules, for example, by inducing mutations (genomics), causing epigenetic changes (epigenomics), or modifying the internal cell environment in more complex ways. The latter changes might be monitored with proteomics, transcriptomics, or metabolomics.

Table 4-1. Advantages and limitations of -omics technologies.

| Category | Description |
|---|---|
| Advantages | Use in large, hypothesis-free investigations of the whole complement of relevant biological molecules. |
| | Better understanding of phenotype–genotype relations. |
| | Might provide insights into the effects of interactions between environmental conditions and genotypes and mechanistic insights into disease etiology. |
| Limitations | There are limitations arising from cost of assays, quality of biological material available (such as instability of RNAs), and the amount of labor needed. |
| | Techniques are still in their discovery state, and new leads need to be carefully investigated and compared with existing biological information from in vivo and in vitro tests. |
| | New leads in the discovery of novel intermediate markers need to be confirmed in other independent studies, preferably with different platforms. |
| | Moving from promising techniques to successful application of biomarkers in occupational and environmental medicine requires not only standardizing and validating techniques but also appropriate study designs and sophisticated statistical analyses for interpreting study results, especially for untargeted approaches (the issue of multiple comparisons and false positives). |

Source: Adapted from Vineis et al. 2009.
One informative strategy for the integration of -omics technologies into epidemiological research is the meet-in-the-middle approach (Vineis et al. 2013). The approach provides insights into biological plausibility that can bolster causal inference. In the context of a population study, the approach generally involves a prospective search for intermediate biomarkers that are linked to the underlying disease and are increased in those who eventually develop disease, and a retrospective search that links the intermediate biomarkers to past exposures of the environmental agent of concern. As illustrated in Figure 4-1, the approach can be considered as three steps: an investigation into the association between exposure and disease, an assessment of the relationship between exposure and biomarkers of exposure and early effects, and an assessment of the relationship between the disease outcome and intermediate biomarkers. Inference of a causal relationship between exposure and disease is strengthened if associations are documented for each of the three key relationships in Figure 4-1, corresponding to A, B, and C.
A recent study of epigenetics and lung cancer (Fasanelli et al. 2015) is illustrative. The biomarkers are methylation status of the AHRR gene and the F2RL3 gene, which are hypomethylated in smokers (exposure in Figure 4-1) (Vineis et al. 2013; Guida et al. 2015). Hypomethylation of the genes is also associated with lung cancer (disease in Figure 4-1C). The question is, Are those biomarkers on the causal pathway for lung cancer caused by smoking? Fasanelli et al. (2015) showed by using the statistical technique of mediation analysis that 37% of lung cancers could be explained by the methylation status of the two genes. Thus, the two genes are biomarkers that are likely to be on the causal pathway and illustrate the “meeting in the middle” of the exposure and the disease, the middle being the biomarker. The committee notes, however, that fully assessing causality requires additional steps beyond statistical analysis.
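To make the idea of mediation analysis concrete, the sketch below computes a proportion mediated with the product-of-coefficients approach. It assumes linear models, no exposure–mediator interaction, and no unmeasured confounding, and it is not presented as the procedure used by Fasanelli et al. (2015). The coefficient values and the function name are hypothetical.

```python
def proportion_mediated(a: float, b: float, direct: float) -> float:
    """Share of the total effect that passes through the intermediate biomarker.

    a      -- effect of exposure on the biomarker
    b      -- effect of the biomarker on disease, adjusted for exposure
    direct -- effect of exposure on disease, adjusted for the biomarker
    """
    indirect = a * b                # pathway exposure -> biomarker -> disease
    total = indirect + direct       # total effect under the stated assumptions
    if total == 0.0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    return indirect / total


# Hypothetical coefficients, chosen only to show the arithmetic.
share = proportion_mediated(a=0.5, b=0.4, direct=0.3)
print(f"proportion mediated: {share:.2f}")
```

The three inputs correspond loosely to the three relationships of the meet-in-the-middle approach: exposure with biomarker, biomarker with disease, and exposure with disease.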
Exposome-Wide Association Studies
As defined in Chapter 1, exposome refers to the totality of exposures from conception to death. Some have questioned whether the exposome as defined defies practical measurement and is therefore not amenable to scientific methods (Miller and Jones 2014). In an attempt to define the exposome as a measurable entity, Rappaport and Smith (2010) proposed to consider first the body’s internal chemical environment and how the body responds to these chemical exposures.² They referred to the exposures as the internal exposome and distinguished it from the external exposome (exposures external to the body) and suggested that the internal and external exposomes are complementary. For example, internal assessment might identify environmental health associations (that is, generate new hypotheses on disease etiology), but external exposure assessments are needed to identify sources, consider exposure routes, and address spatial and temporal variability of exposures (Turner et al. in press). Consequently, an external-exposome assessment can take place after hypotheses have been generated, and the environmental sources of internal changes can be sought. The two study designs, one that looks for internal changes starting from external measurements (external-exposome assessment) and one that looks for external sources on the basis of internal signals (internal-exposome assessment), are complementary and have been defined as “bottom-up” and “top-down” approaches, respectively.
² The inclusion of biological response in the concept helps to expand beyond external chemical exposures to many types of exposures (including psychological or physical stress, infections, and gut flora) that produce endogenous chemicals, such as oxidative molecules, and disease-producing responses, such as inflammation, oxidative stress, and lipid peroxidation.
The -omics tools that can be used to capture the internal exposome make nontargeted analyses that parallel GWASs in concept and approach possible. Studies of that design have been referred to as exposome-wide association studies (EWASs).³ Specifically, the EWAS approach involves the investigation of associations of a large number of small molecules, proteins, or lipids with disease or intermediate phenotypes to identify biomarkers of exposure or disease. One general EWAS approach to generate new hypotheses on disease causation has been described by Rappaport and Smith (2010). Figure 4-2 shows a study design that can lead to the generation of new hypotheses about chemical hazards in the context of a case–control study. Targeted and nontargeted metabolomics approaches are used to compare exposures of people who have a specific disease (cases) with exposures of those who do not (controls). After the initial discovery phase, the experimental design can be improved by a testing (replication) phase with a prospective context (a case-control study that is nested in a prospective cohort). That approach takes temporality into account by using biological samples collected before disease manifestation to avoid or to reduce the potential for reverse causation. Unidentified features that are significantly associated with the outcomes of interest would next be chemically identified by using methods described in Chapter 2, for example, by using NMR, IMS-MS/MS, or cheminformatics or by synthesizing and evaluating chemical standards for candidate chemicals. In the next step, validation of the association and a final causal assessment would be attempted through replication in more than one cohort, and biological plausibility would be evaluated.
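Because an EWAS tests many features at once, the discovery phase needs some control of false positives before features are carried forward to replication. The sketch below implements the Benjamini–Hochberg step-up procedure as one common option; the chapter does not prescribe a particular method, and the p-values and function name here are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag features that pass the Benjamini-Hochberg false-discovery-rate rule.

    Returns one boolean per input p-value, in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
        if p_values[idx] <= rank / m * alpha:
            largest_passing_rank = rank
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            flagged[idx] = True
    return flagged


# Hypothetical p-values for six metabolomic features compared in cases and controls.
p_values = [0.041, 0.001, 0.216, 0.008, 0.039, 0.06]
nominal_hits = sum(p < 0.05 for p in p_values)
flagged = benjamini_hochberg(p_values, alpha=0.05)
print(f"nominally significant: {nominal_hits}, flagged after correction: {sum(flagged)}")
```

In this hypothetical set, four features are nominally significant at 0.05 but only two survive the correction, which shows how the list of candidates for chemical identification and replication can shrink once multiplicity is considered.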
Biological plausibility could be evaluated with a targeted analysis of available human tissues by using proteomics, metabolomics, or other methods to search for biological responses related to the disease. Alternatively, novel animal models or high-throughput in vitro assays described in Chapter 3 could be used to test candidate chemicals and generate biological-response data that could be compared with responses related to the EWAS-identified association with disease. Evaluation of biological plausibility would ideally also include refinement of exposure, if necessary, and a systematic comparison of human exposures to exposures in test systems that are used to produce the supporting biological-response data. If similar toxicity data and models are used, responses to exposures in cohort members could be directly compared with those in test systems; the comparison would provide additional evidence on the likelihood of biological plausibility, which would be greater if responses to exposure were similar, and smaller if they were not. An example of the approach described was used to investigate colon cancer. The research began with three cross-sectional case–control studies and found an association between an unidentified metabolomic feature (analyte) and colon cancer (Ritchie et al. 2013). The association was later confirmed prospectively in the European Prospective Investigation into Cancer and Nutrition cohort, and the metabolic feature was identified as belonging to a group of ultra–long-chain fatty acids (Perttula et al. 2016).
³ The committee notes that the acronym EWAS was originally proposed by Patel et al. (2010) to refer to environment-wide association studies, but others, such as Rappaport (2012), have used EWAS to refer more specifically to exposome-wide association studies, as used here by the committee.
The EWAS approach offers exciting opportunities, but there are challenges that need to be addressed. The challenges in using tools that produce “big data” are similar to those encountered in all multiexposure studies. The study design and analysis have to be chosen carefully and assessed in terms of all classic biases to establish causality, that is, using principles that apply to targeted designs that focus on a single exposure and outcome. The EWAS approach adds the challenge of determining which exposures among many correlated ones have a causal role and which reflect a biological perturbation caused by other agents. The temporal dynamics of the exposures need to be addressed together with the stability of their concentrations in the sampled media. An additional premise of the EWAS approach is that useful, biologically informative biomarkers can be identified, that is, that the chemicals in question are not too short-lived and exposure not too sporadic to be captured by only one or a few biospecimens obtained in a cross-sectional survey or cohort study.
The committee notes that use of a retrospective case–control design for EWAS makes it impossible to be certain whether the associations observed reflect a causal relationship between exposures and the outcome investigated or whether they are a consequence of the disease or its treatment. As summarized by Thomas et al. (2012), the technique of Mendelian randomization (Davey Smith et al. 2004) is one way to address reverse causation and uncontrolled confounding; a gene is used as an instrumental variable (Greenland 2000) to evaluate the causal effect of a biomarker on disease risk. In an approach that parallels the meet-in-the-middle approach, a novel two-step extension of this idea has been proposed for methylation studies that uses two genes as instrumental variables: one estimates the exposure–methylation association, and the other the methylation–disease association (Cortessis et al. 2012; Relton and Davey Smith 2012). There is an inherent assumption in that approach that the instrumental variable is indeed an appropriate instrument for exposure.
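The instrumental-variable logic behind Mendelian randomization can be illustrated with a small simulation. The sketch below is not taken from the cited papers; it assumes a single genetic variant, a continuous biomarker and outcome, linear effects, and an unmeasured confounder, and all function names, effect sizes, and the allele frequency are hypothetical. It compares the naive regression slope of outcome on biomarker with the Wald ratio estimate (gene–outcome covariance divided by gene–biomarker covariance).

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, true_effect=0.3, seed=1):
    """Simulate genotype, biomarker, and outcome with an unmeasured confounder.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    genotype, biomarker, outcome = [], [], []
    for _ in range(n):
        # Allele count (0, 1, or 2) with an assumed allele frequency of 0.3.
        g = sum(rng.random() < 0.3 for _ in range(2))
        u = rng.gauss(0, 1)  # unmeasured confounder of biomarker and outcome
        x = 1.0 * g + u + rng.gauss(0, 1)          # biomarker
        y = true_effect * x + u + rng.gauss(0, 1)  # outcome
        genotype.append(g)
        biomarker.append(x)
        outcome.append(y)
    return genotype, biomarker, outcome


g, x, y = simulate()

# Naive slope of outcome on biomarker: biased upward by the confounder.
naive = covariance(x, y) / covariance(x, x)

# Wald ratio: gene-outcome association divided by gene-biomarker association.
wald = covariance(g, y) / covariance(g, x)

print(f"true effect 0.30, naive estimate {naive:.2f}, Wald ratio {wald:.2f}")
```

In this simulated setting the Wald ratio recovers the assumed effect only because the variant affects the outcome solely through the biomarker and is independent of the confounder, which is the instrument assumption noted above.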
New Analytical Challenges
There are formidable challenges in integrating the -omics technologies and data into epidemiological research, and robust high-dimensional analytical techniques will be required to integrate and analyze all the data. For example, statistical analyses that consider many exposure variables simultaneously without strong priors, such as in EWASs, greatly increase the risk of observing random associations (false positives) because of multiple testing. The need for statistical tools to analyze multiple exposures has therefore motivated investigators to draw on important lessons learned from the analysis of GWAS data (Shi and Weinberg 2011; Thomas et al. 2012); some are described below. In general, statistical techniques for high-dimensional data (such as those noted and others, including machine learning, dimension reduction, and variable-selection techniques) must be adapted to the longitudinal-data-accrual context to account for such issues as time-varying exposure and delayed effects (Buck Louis and Sundaram 2012).
Multistep analytical approaches have been used to estimate health risks associated with different types or combinations of exposures. For example, estimates from EWAS analytical approaches with no a priori information might be quantified by using classical regression models while controlling for false discovery rate, as is done in GWASs (Patel et al. 2010, 2013; Vrijheid et al. 2014). Furthermore, flexible and smoothing modeling techniques (Slama and Werwatz 2005) might be used to identify and characterize possible thresholds or exposure–response relationships.
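As an illustration of controlling the false-discovery rate across many exposure–outcome tests, the following minimal sketch applies the Benjamini-Hochberg step-up procedure to a set of p values. The procedure is a standard choice for this purpose, but the cited studies may have used other variants; the exposure names and p values are hypothetical and stand in for the results of per-exposure regression models.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at false-discovery rate q."""
    m = len(p_values)
    # Sort hypothesis indices by ascending p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:max_rank])


# Hypothetical p values from per-exposure regression models (illustrative only).
exposure_p = {
    "exposure_A": 0.001,
    "exposure_B": 0.008,
    "exposure_C": 0.039,
    "exposure_D": 0.041,
    "exposure_E": 0.27,
    "exposure_F": 0.60,
}

names = list(exposure_p)
rejected = benjamini_hochberg([exposure_p[n] for n in names], q=0.05)
flagged = [names[i] for i in rejected]
print(flagged)  # ['exposure_A', 'exposure_B']
```

Exposures C and D would pass an unadjusted 0.05 threshold but are not flagged here, which shows how the adjustment guards against chance findings when many exposures are screened at once.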
Pathway analytical approaches are increasingly used for integrating and interpreting high-dimensional data generated by multiple -omics techniques; these approaches have enabled analyses of relationships between multiple exposures and multiple health outcomes. It is noteworthy that pathway analytical approaches have been used to identify molecular signatures associated with environmental agents through exploratory analyses of metabolites, proteins, transcripts, and DNA methylation in biological samples (Jennen et al. 2011; Vrijheid et al. 2014). As summarized by Vrijheid et al. (2014), once biomarkers have been identified, available libraries of biological pathways, such as Gene Ontology (Ashburner et al. 2000), Kyoto Encyclopedia of Genes and Genomes (Kanehisa and Goto 2000), Reactome (Fabregat et al. 2016), and Comparative Toxicogenomics Database (Davis et al. 2015), can be searched and used to identify relevant biological pathways affected by exposures whether alone or in combination. Furthermore, biological pathways can be grouped and described using available software, such as Ingenuity Pathway Analysis (Krämer et al. 2014), Cytoscape (Saito et al. 2012), and IMPaLA (Kamburov et al. 2011). For example, those analytical approaches have been applied to several types of -omics data from systems that respond to 2,3,7,8-tetrachlorodibenzo-p-dioxin and to a broader set of environmental and pharmacological agents (Jennen et al. 2011; Kamburov et al. 2011).
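One common way to ask whether exposure-associated biomarkers concentrate in a given pathway is an over-representation test based on the hypergeometric distribution. The sketch below is a generic illustration of that idea, not the algorithm of any of the software packages named above; the metabolite identifiers and the pathway membership are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, n_selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of pathway members among n_selected features drawn
    without replacement from a universe of universe_size measured features.
    """
    max_overlap = min(pathway_size, n_selected)
    total = comb(universe_size, n_selected)
    tail = sum(
        comb(pathway_size, k)
        * comb(universe_size - pathway_size, n_selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical data: 20 measured metabolites, one pathway with 5 members,
# and 4 metabolites associated with the exposure of interest.
universe = {f"m{i}" for i in range(1, 21)}
pathway = {"m1", "m2", "m3", "m4", "m5"}
associated = {"m1", "m2", "m3", "m10"}

p = enrichment_p_value(
    universe_size=len(universe),
    pathway_size=len(pathway),
    n_selected=len(associated),
    overlap=len(pathway & associated),
)
print(f"{p:.4f}")  # 0.0320
```

When many pathways are screened, the resulting p values would themselves need adjustment for multiple testing.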
Other methods are also available to address the new analytical challenges. First, analysis-of-covariance techniques have been used to integrate individual exposures (obtained, for example, from personal wearable devices) and outdoor exposures (obtained, for example, from environmental monitoring) by exploring the variance components of key exposures arising from multiple sources before creating exposure groups or clusters. Second, factor analysis and latent class analysis have proved useful for creating reduced sets of exposure indexes on the basis of commonly occurring exposures while allowing people who share similar exposure profiles to be grouped. Third, to address the high-dimensional nature of epigenetic data, cluster-analysis techniques developed by Siegmund et al. (2006) can be applied to exposome-wide association-genomic studies; these techniques treat the cluster rather than individual epigenetic marks as a latent risk factor for disease (Cortessis et al. 2012). Fourth, structural equation modeling approaches might be used to define combined exposure variables on the basis of knowledge summarized by directed acyclic graphs (Budtz-Jørgensen et al. 2010).
Bayesian profile regression models might be used to identify groups of people who have a similar exposome but show marked differences in the health-outcome variable of interest (Molitor et al. 2010; Papathomas et al. 2011; Vrijheid et al. 2014). Model-based clustering would be applied to the exposure data while allowing the outcome of interest to influence cluster membership. The Bayesian model–based clustering technique has been used, for example, to identify a cluster in a high-risk set for lung cancer—a group who has the characteristics of living near a main road, having high exposure to PM10 (particulate matter with aerodynamic diameter ≤10 μm) and to nitrogen dioxide, and carrying out manual work (Papathomas et al. 2011; Vrijheid et al. 2014).
The need for caution in contending with the potential for false-positive associations that arise from analysis of large datasets is widely recognized among those handling such data. In addition to analytical approaches, such as correcting p values for multiplicity and using such parameters as the false-discovery rate, the committee notes that epidemiological findings are interpreted holistically in the context of other relevant evidence. In the context of risk assessment, hazard identification would rarely, if ever, be based on an association found in a single epidemiological study, absent additional evidence.
With the emergence of Tox21 and ES21 approaches, the committee anticipates new connections between biomarkers and human health outcomes. Epidemiological studies have an implicit role in providing the population counterpart that is needed to interpret biomarkers measured in laboratory studies through the general paradigm of molecular epidemiology and the meet-in-the-middle approach. For that purpose, epidemiologists need to generate human data (1) to harmonize doses used in in vitro high-throughput assays with those associated with the exposures experienced in the population setting, (2) to explore the relevance of pathways identified in assay systems to human responses to the same agents and validate the predictive value of pathways detected in in vitro assays for the occurrence of human disease, (3) to develop and validate models of human susceptibility, and (4) to compare and corroborate exposure–response relationships obtained from in vitro assays and in human populations.
The overall goal of gaining new insights by connecting -omics data generated in the laboratory with data gathered in population contexts will not be achieved without consideration of the needed research infrastructure and the logistical barriers to bringing together datasets from disparate sources. The committee concludes by highlighting some challenges that face epidemiological research and recommendations for addressing them. The committee notes that several recommendations below call for developing or expanding databases. In all cases, data curation and quality evaluation should be routine in database development and maintenance.
Developing the Infrastructure and Methods Needed to Advance the Science
Challenge: When used in epidemiological studies, particularly ones with large biobank cohorts that might reach a million or more participants, -omics assays can generate large databases that need to be managed and curated in ways that will facilitate access and analysis. There is an additional challenge of analyzing extremely large datasets by using a hypothesis-driven or exploratory approach.
Recommendation: Resources should be devoted to accelerating development of database management systems that will accommodate extremely large datasets, support analyses for multiple purposes, and foster data-sharing and development of powerful and robust statistical techniques for analyzing associations of health outcomes with -omics data and exploring such complex problems as gene–environment interactions. Such efforts are already under way in a number of fields, such as clinical research that involves health-care data, and should be extended to epidemiological research.
Challenge: Standard methods are needed to describe the data that have been generated and that are shared among disciplines. The problem has been recognized in genomics and has led to the development of annotated gene ontologies, and similar approaches could be extended to other types of -omics data.
Recommendation: Ontologies should be developed and expanded so that data can be harmonized among investigative groups, internationally, and among -omic platforms. Such ontologies generally do not incorporate data collected by epidemiologists. Such tools as STROBE should be expanded and adapted to the new generation of epidemiological studies; STROBE has already been expanded to encompass molecular epidemiology (Gallo et al. 2011). The Framework Programme 7 EU Initiative—coordination of standards in metabolomics (COSMOS)—is developing “a robust data infrastructure and exchange standards for metabolomics data and other metadata” (Salek et al. 2015); this type of approach should be extended to other -omics data.
Challenge: Data-sharing involves many complexities, particularly when the data are from human studies. However, data-sharing could be particularly beneficial if data could be accessed in a way that would support uniform analyses and integration through hierarchical analyses or meta-analysis. Data-sharing could also lead to more powerful assessments of hazard and of exposure–response relationships. One useful example is the pooling of data from studies of radon-exposed underground miners that supported the development of risk models for indoor radon (Lubin et al. 1995).
The same issues surrounding data-sharing arise in other domains in which big-data approaches are emerging, and a general culture of data-sharing will be needed. Regarding genomics, posting of sequencing data has become the norm but with attention to anonymity. Similar sharing will ideally extend to other -omics data and lead to the development of a culture of data-sharing, pragmatic solutions to the inherent ethical problems, and standardized ontologies and databases. The committee notes that discussion around data-sharing is moving rapidly with regard to clinical trials; similar efforts around observational data are needed (Mascalzoni et al. 2015).
Recommendation: Steps should be taken to ensure sharing of observational data relevant to risk assessment so that, for example, biomarkers can be validated among populations. As noted above, to achieve that goal, standard ontologies should be developed and used for capturing and coding key variables. There is also a need for systematic exploration of possible logistical and ethical barriers to sharing potentially massive datasets drawn from human populations.
Collaborating and Training the Next Generation of Scientists
Challenge: New research models based on biobanks and large cohorts derived from clinical populations will become a valuable resource for applying -omics and other biomarker assays, but there are intrinsic limitations related to biases and the scope of data available in electronic records. There are also complicated issues related to access to private and confidential medical records and to sharing of such data.
Recommendation: As biobanks and patient-based cohorts are developed, those developing them should engage with epidemiologists and exposure scientists on the collection of exposure data to ensure that the best and most comprehensive data possible are collected in this context. Finding ways to capture exposure information will be particularly challenging and will likely require ancillary data collection in nested studies.
Challenge: A wide array of biospecimens is being collected and stored on the assumption that they will be useful in the future for a variety of purposes, including assays that cannot be anticipated. Storage methods and consent procedures need to support future use.
Recommendation: Epidemiologists should anticipate future uses of biospecimens that are collected in the course of epidemiological research or other venues, such as screening or surveillance, and ensure that the array of specimens and their handling and storage will support multiple assays in the future. Such future-looking collections should be a design consideration, and input should be obtained from scientists who are developing new assays.
Challenge: A new generation of researchers who can conduct large-scale population studies and integrate -omics and other emerging technologies into population studies is needed. The next generation also needs sufficient multidisciplinary training to be able to interact with exposure and data scientists.
Recommendation: The training of epidemiologists should be enriched with the addition of more in-depth understanding of the biological mechanisms underlying human diseases and of the biomarker assays used to probe them.
Challenge: The landscape of epidemiological research is changing quickly with a move away from the fixed legacy cohorts of the past, such as the Nurses’ Health Study, to pragmatically developed cohorts that are grounded in new and feasible ways of cohort identification and follow-up. There are also likely to be large national cohorts, such as the cohort already under development for the Precision Medicine Initiative. Those cohorts are intended as platforms for a wide array of research questions; they are designed as large banks of biospecimens but will have inherent limitations regarding the exposure information available.
Recommendation: Epidemiologists, exposure scientists, and laboratory scientists should collaborate closely to ensure that the full potential of 21st century science is extended to and incorporated into epidemiological research. Multidisciplinarity should be emphasized and sought with increasing intensity. As the new cohorts are developed, the opportunity to ensure that they will be informative on the risks posed by environmental exposures should not be lost.
AACR (American Association for Cancer Research). 2015. AACR Cancer Progress Report 2015 [online]. Available: http://cancerprogressreport.org/2015/Documents/AACR_CPR2015.pdf [accessed July 21, 2016].
Alsheikh-Ali, A.A., W. Qureshi, M.H. Al-Mallah, and J.P. Ioannidis. 2011. Public availability of published research data in high-impact journals. PLoS One 6(9):e24357.
Ashburner, M., C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock. 2000. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25(1):25-29.
Buck Louis, G.M., and R. Sundaram. 2012. Exposome: Time for transformative research. Stat. Med. 31(22):2569-2575.
Budtz-Jørgensen, E., F. Debes, P. Weihe, and P. Grandjean. 2010. Structural equation models for meta-analysis in environmental risk assessment. Environmetrics 21(5):510-527.
Cortessis, V.K., D.C. Thomas, A.J. Levine, C.V. Breton, T.M. Mack, K.D. Siegmund, R.W. Haile, and P.W. Laird. 2012. Environmental epigenetics: Prospects for studying epigenetic mediation of exposure–response relationships. Hum. Genet. 131(10):1565-1589.
Davey Smith, G., R. Harbord, and S. Ebrahim. 2004. Fibrinogen, C-reactive protein and coronary heart disease: Does Mendelian randomization suggest the associations are non-causal? QJM 97(3):163-166.
Davis, A.P., C.J. Grondin, K. Lennon-Hopkins, C. Saraceni-Richards, D. Sciaky, B.L. King, T.C. Wiegers, and C.J. Mattingly. 2015. The Comparative Toxicogenomics Database’s 10th year anniversary: Update 2015. Nucleic Acids Res. 43(Database issue):D914-D920.
ESCAPE (European Study of Cohorts for Air Pollution Effects). 2014. ESCAPE Project [online]. Available: http://www.escapeproject.eu/index.php [accessed July 21, 2016].
EXPOsOMICS. 2016. About EXPOsOMICS [online]. Available: http://www.exposomicsproject.eu/ [accessed July 21, 2016].
Fabregat, A., K. Sidiropoulos, P. Garapati, M. Gillespie, K. Hausmann, R. Haw, B. Jassal, S. Jupe, F. Korninger, S. McKay, L. Matthews, B. May, M. Milacic, K. Rothfels, V. Shamovsky, M. Webber, J. Weiser, M. Williams, G. Wu, L. Stein, H. Hermjakob, and P. D’Eustachio. 2016. The Reactome pathway knowledgebase. Nucleic Acids Res. 44(D1):D481-D487.
Fasanelli, F., L. Baglietto, E. Ponzi, F. Guida, G. Campanella, M. Johansson, K. Grankvist, M. Johansson, M.B. Assumma, A. Naccarati, M. Chadeau-Hyam, U. Ala, C. Faltus, R. Kaaks, A. Risch, B. De Stavola, A. Hodge, G.G. Giles, M.C. Southey, C.L. Relton, P.C. Haycock, E. Lund, S. Polidoro, T.M. Sandanger, G. Severi, and P. Vineis. 2015. Hypomethylation of smoking-related genes is associated with future lung cancer in four prospective cohorts. Nat. Commun. 6:10192.
Fortier, I., P.R. Burton, P.J. Robson, V. Ferretti, J. Little, F. L’Heureux, M. Deschênes, B.M. Knoppers, D. Doiron, J.C. Keers, P. Linksted, J.R. Harris, G. Lachance, C. Boileau, N.L. Pedersen, C.M. Hamilton, K. Hveem, M.J. Borugian, R.P. Gallagher, J. McLaughlin, L. Parker, J.D. Potter, J. Gallacher, R. Kaaks, B. Liu, T. Sprosen, A. Vilain, S.A. Atkinson, A. Rengifo, R. Morton, A. Metspalu, H.E. Wichmann, M. Tremblay, R.L. Chisholm, A. Garcia-Montero, H. Hillege, J.E. Litton, L.J. Palmer, M. Perola, B.H. Wolffenbuttel, L. Peltonen, and T.J. Hudson. 2010. Quality, quantity and harmony: The DataSHaPER approach to integrating data across bioclinical studies. Int. J. Epidemiol. 39(5):1383-1393.
Freedman, L.S., J.M. Commins, J.E. Moler, W. Willett, L.F. Tinker, A.F. Subar, D. Spiegelman, D. Rhodes, N. Potischman, M.L. Neuhouser, A.J. Moshfegh, V. Kipnis, L. Arab, and R.L. Prentice. 2015. Pooled results from 5 validation studies of dietary self-report instruments using recovery biomarkers for potassium and sodium intake. Am. J. Epidemiol. 181(7):473-487.
Gallo, V., M. Egger, V. McCormack, P.B. Farmer, J.P. Ioannidis, M. Kirsch-Volders, G. Matullo, D.H. Phillips, B. Schoket, U. Stromberg, R. Vermeulen, C. Wild, M. Porta, and P. Vineis. 2011. STrengthening the Reporting of OBservational studies in Epidemiology-Molecular Epidemiology (STROBE-ME): An extension of the STROBE statement. PLoS Med. 8(10):e1001117.
Gordis, L. 2013. Epidemiology, 5th Ed. Philadelphia: Elsevier and Saunders. 416 pp.
Greenland, S. 2000. An introduction to instrumental variables for epidemiologists. Int. J. Epidemiol. 29(4):722-729.
Guida, F., T.M. Sandanger, R. Castagné, G. Campanella, S. Polidoro, D. Palli, V. Krogh, R. Tumino, C. Sacerdote, S. Panico, G. Severi, S.A. Kyrtopoulos, P. Georgiadis, R.C. Vermeulen, E. Lund, P. Vineis, and M. Chadeau-Hyam. 2015. Dynamics of smoking-induced genome-wide methylation changes with time since smoking cessation. Hum. Mol. Genet. 24(8):2349-2359.
Guttmacher, A.E., E.G. Nabel, and F.S. Collins. 2009. Why data-sharing policies matter. Proc. Natl. Acad. Sci. USA 106(40):16894.
Harris, J.R., P. Burton, B.M. Knoppers, K. Lindpaintner, M. Bledsoe, A.J. Brookes, I. Budin-Ljøsne, R. Chisholm, D. Cox, M. Deschênes, I. Fortier, P. Hainaut, R. Hewitt, J. Kaye, J.E. Litton, A. Metspalu, B. Ollier, L.J. Palmer, A. Palotie, M. Pasterk, M. Perola, P.H. Riegman, G.J. van Ommen, M. Yuille, and K. Zatloukal. 2012. Toward a roadmap in global biobanking for health. Eur. J. Hum. Genet. 20(11):1105-1111.
Hart, J.E., X. Liao, B. Hong, R.C. Pruett, J.D. Yanosky, H. Suh, M.A. Kioumourtzoglou, D. Spiegelman, and F. Laden. 2015. The association of long-term exposure to PM2.5 on all-cause mortality in the Nurses’ Health Study and the impact of measurement-error correction. Environ. Health 14:38.
Hiatt, R.A., S. Sulsky, M.C. Aldrich, N. Kreiger, and R. Rothenberg. 2013. Promoting innovation and creativity in epidemiology for the 21st century. Ann. Epidemiol. 23(7):452-454.
IARC (International Agency for Research on Cancer). 2016. The European Prospective Investigation into Cancer and Nutrition (EPIC) Study [online]. Available: http://epic.iarc.fr/ [accessed July 21, 2016].
Jennen, D., A. Ruiz-Aracama, C. Magkoufopoulou, A. Peijnenburg, A. Lommen, J. van Delft, and J. Kleinjans. 2011. Integrating transcriptomics and metabonomics to unravel modes-of-action of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) in HepG2 cells. BMC Syst. Biol. 5:139.
Kamburov, A., R. Cavill, T.M. Ebbels, R. Herwig, and H.C. Keun. 2011. Integrated pathway-level analysis of transcriptomics and metabolomics data with IMPaLA. Bioinformatics 27(20):2917-2918.
Kanehisa, M., and S. Goto. 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28(1):27-30.
Khoury, M.J., T.K. Lam, J.P. Ioannidis, P. Hartge, M.R. Spitz, J.E. Buring, S.J. Chanock, R.T. Croyle, K.A. Goddard, G.S. Ginsburg, Z. Herceg, R.A. Hiatt, R.N. Hoover, D.J. Hunter, B.S. Kramer, M.S. Lauer, J.A. Meyerhardt, O.I. Olopade, J.R. Palmer, T.A. Sellers, D. Seminara, D.F. Ransohoff, T.R. Rebbeck, G. Tourassi, D.M. Winn, A. Zauber, and S.D. Schully. 2013. Transforming epidemiology for 21st century medicine and public health. Cancer Epidemiol. Biomarkers Prev. 22(4):508-516.
Krämer, A., J. Green, J. Pollard, and S. Tugendreich. 2014. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30(4):523-530.
Lam, T.K., M. Spitz, S.D. Schully, and M.J. Khoury. 2013. “Drivers” of translational cancer epidemiology in the 21st century: Needs and opportunities. Cancer Epidemiol. Biomarkers Prev. 22(2):181-188.
Levin, M.L. 1953. The occurrence of lung cancer in man. Acta. Unio. Int. Contra. Cancrum. 9(3):531-541.
Li, R., E. Weller, D.W. Dockery, L.M. Neas, and D. Spiegelman. 2006. Association of indoor nitrogen dioxide with respiratory symptoms in children: Application of measurement error correction techniques to utilize data from multiple surrogates. J. Expo. Sci. Environ. Epidemiol. 16(4):342-350.
Lubin, J.H., J.D. Boice, Jr., C. Edling, R.W. Hornung, G.R. Howe, E. Kunz, R.A. Kusiak, H.I. Morrison, E.P. Radford, J.M. Samet, M. Tirmarche, A. Woodward, S.X. Yao, and D.A. Pierce. 1995. Lung cancer in radon-exposed miners and estimation of risk from indoor exposure. J. Natl. Cancer Inst. 87(11):817-827.
Mascalzoni, D., E.S. Dove, Y. Rubinstein, H.J.S. Dawkins, A. Kole, P. McCormack, S. Woods, O. Riess, F. Schaefer, H. Lochmüller, B.M. Knoppers, and M. Hansson. 2015. International Charter of principles for sharing bio-specimens and data. Eur. J. Hum. Genet. 23:721-728.
Mervis, J. 2012. US science policy. Agencies rally to tackle big data. Science 336(6077):22.
Miller, G.W., and D.P. Jones. 2014. The nature of nurture: Refining the definition of the exposome. Toxicol. Sci. 137(1):1-2.
Molitor, J., M. Papathomas, M. Jerrett, and S. Richardson. 2010. Bayesian profile regression with an application to the National Survey of Children’s Health. Biostatistics 11(3):484-498.
NRC (National Research Council). 1983. Risk Assessment in the Federal Government: Managing the Process. Washington, DC: National Academy Press.
NRC (National Research Council). 2012. Exposure Science in the 21st Century: A Vision and a Strategy. Washington, DC: The National Academies Press.
Papathomas, M., J. Molitor, S. Richardson, E. Riboli, and P. Vineis. 2011. Examining the joint effect of multiple risk factors using exposure risk profiles: Lung cancer in nonsmokers. Environ. Health Perspect. 119(1):84-91.
Patel, C.J., J. Bhattacharya, and A.J. Butte. 2010. An Environment-Wide Association Study (EWAS) on type 2 diabetes mellitus. PLoS One 5(5):e10746.
Patel, C.J., D.H. Rehkopf, J.T. Leppert, W.M. Bortz, M.R. Cullen, G.M. Chertow, and J.P. Ioannidis. 2013. Systematic evaluation of environmental and behavioral factors associated with all-cause mortality in the United States National Health and Nutrition Examination Survey. Int. J. Epidemiol. 42(6):1795-1810.
Pechette, J.M. 2012. Transforming health care through cloud computing. Health Care Law Mon. 5:2-12.
Perttula, K., W.M. Edmands, H. Grigoryan, X. Cai, A.T. Iavarone, M.J. Gunter, A. Naccarati, S. Polidoro, A. Hubbard, P. Vineis, and S. Rappaport. 2016. Evaluating ultra-long chain fatty acids as biomarkers of colorectal cancer risk. Cancer Epidemiol. Biomarkers Prev. 25(8):1216-1223.
Rappaport, S.M. 2012. Biomarkers intersect with the exposome. Biomarkers 17(6):483-489.
Rappaport, S.M., and M.T. Smith. 2010. Environment and disease risks. Science 330(6003):460-461.
Relton, C.L., and G. Davey Smith. 2012. Two-step epigenetic Mendelian randomization: A strategy for establishing the causal role of epigenetic processes in pathways to disease. Int. J. Epidemiol. 41(1):161-176.
Ritchie, S.A., J. Tonita, R. Alvi, D. Lehotay, H. Elshoni, S. Myat, J. McHattie, and D.B. Goodenowe. 2013. Low-serum GTA-446 anti-inflammatory fatty acid levels as a new risk factor for colon cancer. Int. J. Cancer 132(2):355-362.
Roger, V.L., E. Boerwinkle, J.D. Crapo, P.S. Douglas, J.A. Epstein, C.B. Granger, P. Greenland, I. Kohane, and B.M. Psaty. 2015a. Roger et al. respond to “future of population studies.” Am. J. Epidemiol. 181(6):372-373.
Roger, V.L., E. Boerwinkle, J.D. Crapo, P.S. Douglas, J.A. Epstein, C.B. Granger, P. Greenland, I. Kohane, and B.M. Psaty. 2015b. Strategic transformation of population studies: Recommendations of the working group on epidemiology and population sciences from the National Heart, Lung, and Blood Advisory Council and Board of External Experts. Am. J. Epidemiol. 181(6):363-368.
Saito, R., M.E. Smoot, K. Ono, J. Ruscheinski, P.L. Wang, S. Lotia, A.R. Pico, G.D. Bader, and T. Ideker. 2012. A travel guide to Cytoscape plugins. Nat. Methods 9(11):1069-1076.
Salek, R.M., S. Neumann, D. Schober, J. Hummel, K. Billiau, J. Kopka, E. Correa, T. Reijmers, A. Rosato, L. Tenori, P. Turano, S. Marin, C. Deborde, D. Jacob, D. Rolin, B. Dartigues, P. Conesa, K. Haug, P. Rocca-Serra, S. O’Hagan, J. Hao, M. van Vliet, M. Sysi-Aho, C. Ludwig, J. Bouwman, M. Cascante, T. Ebbels, J.L. Griffin, A. Moing, M. Nikolski, M. Oresic, S.A. Sansone, M.R. Viant, R. Goodacre, U.L. Günther, T. Hankemeier, C. Luchinat, D. Walther, and C. Steinbeck. 2015. COordination of Standards in MetabOlomicS (COSMOS): Facilitating integrated metabolomics data access. Metabolomics 11(6):1587-1597.
Samet, J.M., R. Schnatter, and H. Gibb. 1998. Invited commentary: Epidemiology and risk assessment. Am. J. Epidemiol. 148(10):929-936.
Shi, M., and C.R. Weinberg. 2011. How much are we missing in SNP-by-SNP analyses of genome-wide association studies? Epidemiology 22(6):845-847.
Siegmund, K.D., A.J. Levine, J. Chang, and P.W. Laird. 2006. Modeling exposures for DNA methylation profiles. Cancer Epidemiol. Biomarkers Prev. 15(3):567-572.
Slama, R., and A. Werwatz. 2005. Controlling for continuous confounding factors: Non- and semiparametric approaches. Rev. Epidemiol. Sante Publique 53(Spec. No. 2):2S65-2S80.
Tenopir, C., S. Allard, K. Douglass, A.U. Aydinoglu, L. Wu, E. Read, M. Manoff, and M. Frame. 2011. Data sharing by scientists: Practices and perceptions. PLoS One 6(6):e21101.
Thomas, D.C., J.P. Lewinger, C.E. Murcray, and W.J. Gauderman. 2012. Invited commentary: GE-Whiz! Ratcheting gene-environment studies up to the whole genome and the whole exposome. Am. J. Epidemiol. 175(3):203-207.
Turner, M.C., M. Nieuwenhuijsen, K. Anderson, D. Balshaw, Y. Cui, G. Dunton, J.A. Hoppin, P. Koutrakis, and M. Jerrett. In press. Assessing the exposome with external measures: Commentary on the State of the Science and Research Recommendations. Annual Review of Public Health.
UK Biobank. 2016. Biobank [online]. Available: http://www.ukbiobank.ac.uk/ [accessed July 21, 2016].
Vineis, P., A.E. Khan, J. Vlaanderen, and R. Vermeulen. 2009. The impact of new research technologies on our understanding of environmental causes of disease: The concept of clinical vulnerability. Environ. Health 8:54.
Vineis, P., K. van Veldhoven, M. Chadeau-Hyam, and T.J. Athersuch. 2013. Advancing the application of omics-based biomarkers in environmental epidemiology. Environ. Mol. Mutagen. 54(7):461-467.
Vrijheid, M., R. Slama, O. Robinson, L. Chatzi, M. Coen, P. van den Hazel, C. Thomsen, J. Wright, T.J. Athersuch, N. Avellana, X. Basagaña, C. Brochot, L. Bucchini, M. Bustamante, A. Carracedo, M. Casas, X. Estivill, L. Fairley, D. van Gent, J.R. Gonzalez, B. Granum, R. Gražulevičienė, K.B. Gutzkow, J. Julvez, H.C. Keun, M. Kogevinas, R.R. McEachan, H.M. Meltzer, E. Sabidó, P.E. Schwarze, V. Siroux, J. Sunyer, E.J. Want, F. Zeman, and M.J. Nieuwenhuijsen. 2014. The human early-life exposome (HELIX): Project rationale and design. Environ. Health Perspect. 122(6):535-544.