Models and test systems for toxicity testing have evolved over the past several decades. Their strengths and weaknesses have been debated, and most agree that no inherently perfect model can exist (Cunningham 2002). Gradually, however, regulatory agencies in the United States and elsewhere have come to accept data from mathematical models and from assay systems that use mammalian and other experimental organisms, cultured cells, and bacteria for evaluating potential hazards and quantifying risks posed by chemical exposures. Some model systems have become nearly indispensable for risk assessment even though inherent shortcomings and imperfections have been widely acknowledged. Such systems include rodent cancer bioassays, multigeneration tests of reproductive and developmental outcomes in rodents, and bacterial mutagenicity tests. Such tests and resulting data have become commonly accepted for use in human-health assessments and often serve as a benchmark or comparator for new assays and data types that are emerging (Thomas et al. 2012).
Before new assays are used in particular regulatory-decision contexts, such as pesticide registration, their relevance, reliability, and fitness for purpose are established and documented. Such characterization of assays has evolved into elaborate processes that are commonly referred to as validation of alternative methods. Formal mechanisms for validation have been established in the United States, Europe, and many Asian countries. In addition, an international standardization of validation methods is emerging to ensure reciprocity and uniformity of outcomes (Burden et al. 2015). According to the Organisation for Economic Co-operation and Development (OECD), validation is “the process by which the reliability and relevance of a particular approach, method, process or assessment is established for a defined purpose” (OECD 2005). In that context, the term reliability refers to the reproducibility of the method “within and between laboratories over time, when performed using the same protocol.” The term relevance is meant to ensure the scientific underpinning of the test and of the outcome that it is meant to evaluate so that it tests “the effect of interest and whether it is meaningful and useful for a particular purpose.” The Institute of Medicine (IOM 2010) defined the process of validation as “assessing [an] assay and its measurement performance characteristics [and] determining the range of conditions under which the assay will give reproducible and accurate data.”
In plain language, a validation process is used to establish for developers and users of an assay that it is ready and acceptable for its intended use. Although the purpose and principles of validation remain generally constant, the underlying process must evolve to reflect scientific advances. Indeed, the availability of new tests has increased dramatically; many are attractive with respect to cost, time, and animal use or animal-welfare considerations. The number of chemicals that have been evaluated with new test methods has also increased dramatically (Kavlock et al. 2009; Tice et al. 2013). The reliability of the new tests is of general concern given that existing validation processes cannot match the pace of their development.
The new tests are being developed by scientists in academe, private companies, and government laboratories; sometimes, the utility of a particular marker, assay, or model for decision-making is not immediately recognized by the original developer. Likewise, the resources, time, and effort invested in development can vary vastly and might not reflect the ultimate utility of a particular test. Thus, the original developers might not be involved in determining whether a test is fit for purpose for a particular application or provides the degree of certainty required in a particular decision-making context.
In this chapter, the committee describes existing frameworks and efforts for validation of new alternative or nontraditional methods, assays, and models and provides recommendations on the key elements of validation for toxicity testing. The committee emphasizes that validation, although important, is not the only factor involved in achieving regulatory acceptance of new alternative test methods. Furthermore, the committee notes that although assay and model validation for toxicity testing is already an established process, other important disciplines, such
as exposure science, have yet to develop formal criteria and processes for validation, although some have developed approaches to establish best practices.
The Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) was established by the National Institute of Environmental Health Sciences (NIEHS) in 1997 as an ad hoc federal interagency committee to address the growing need for obtaining regulatory acceptance of new toxicity-testing methods (NIEHS 1997). The National Toxicology Program (NTP) Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) was also established in NIEHS to support ICCVAM in “the development and evaluation of new, revised, and alternative methods to identify potential hazards to human health and the environment with a focus on replacing, reducing, or refining animal use” (Casey 2016). Since 2000, ICCVAM activities have been governed by the ICCVAM Authorization Act (2000), which specifies that 15 agencies of the federal government—including the US Food and Drug Administration, the US Environmental Protection Agency, the Consumer Product Safety Commission, the US Department of Transportation, the Occupational Safety and Health Administration, and the US Department of Agriculture—be represented on ICCVAM.
ICCVAM established the Guidelines for the Nomination and Submission of New, Revised, and Alternative Test Methods (NIEHS 2003) and has successfully evaluated and recommended numerous alternative test methods for regulatory use. Test methods that have been evaluated and recommended for use by NICEATM and ICCVAM are aimed at acute systemic toxicity, dermal corrosivity and irritation, developmental toxicity, endocrine disruption, genetic toxicity, immunotoxicity (allergic contact dermatitis), biologics and nanomaterials, pyrogenicity, and ocular toxicity. The evaluation process includes not only individual test methods but computational and integrated testing strategies (Pirone et al. 2014).
ICCVAM-recommended methods, however, have not always been implemented, and this has caused increasing concern. A potential solution for the near term has been to integrate some activities of NICEATM with those of the federal government’s Tox21 consortium (Birnbaum 2013). Specifically, the revised charge to NICEATM now consists of supporting ICCVAM; providing bioinformatics and computational toxicology support to NTP and NIEHS projects, especially those related to Tox21; conducting and publishing analyses of data from new, revised, and alternative testing approaches; and providing information to test-method developers, regulators, and regulated industries (Casey 2016).
Another highly relevant activity that was conducted under the auspices of IOM was the report of the Committee on the Evaluation of Biomarkers and Surrogate Endpoints in Chronic Disease (IOM 2010). Specifically, that committee recommended a three-part framework for biomarker evaluation consisting of analytical validation (Is the biomarker able to be accurately measured?), qualification (Is the biomarker associated with the clinical end point of concern?), and use (What is the specific context of the proposed use?). Although the primary users of the IOM framework are stakeholders that are concerned with evidence-based decision-making in medicine and public health, the framework has great relevance to the process for validating any new test method (see Box 6-1).
In the European Union, formal activities for validating alternative approaches to animal testing started in 1991 with the creation of the European Centre for the Validation of Alternative Methods (ECVAM). Since 2011, ECVAM’s tasks have been subsumed by the European Union Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM), part of the European Commission’s Joint Research Centre. The general aims and approaches of EURL ECVAM are similar to those of ICCVAM and include activities to advance the scientific and regulatory acceptance of nonanimal tests that are important to biomedical sciences through research, test development and validation, and database maintenance (Gocht and Schwarz 2013) and to coordinate, at the European level, the independent evaluation of the relevance and reliability of tests for specific purposes. The guiding principles of the EURL ECVAM work are based on ECVAM recommendations concerning the practical and logistical aspects of validating alternative test methods in prospective studies (Balls 1995; Hartung et al. 2004; EC 2016a); the recommendations are captured in internal guidelines and strategy papers, for example, the ECVAM Guidance on Good Cell Culture Practice (Coecke et al. 2005), the OECD guidelines (see Box 6-2), and relevant parts of the EU Test Methods Regulation (EC 2008). ECVAM and the European Partnership for Alternative Approaches to Animal Testing (Kinsner-Ovaskainen et al. 2012) have also drawn conclusions and offered recommendations on the validation of integrated approaches.
At the international level, OECD has been active, especially in the last 5 years, in coordinating the development of formal guidelines for validation of individual tests, alternative methods, and computational models (see Box 6-2). The 1981 Mutual Acceptance of Data Decision for the Assessment of Chemicals including Pesticides, C(81)30(Final), stipulated that “data generated in the testing of chemicals in an OECD Member country in accordance with OECD Test Guidelines and OECD Principles of Good Laboratory Practice shall be accepted in other Member countries for purposes of assessment and other uses relating to the protection of man and the environment.” That decision created an impetus for establishing a formal international process for validating test methods. A formal process now exists for the development and adoption of OECD test guidelines, part of which is a formal validation: a nomination usually begins at the national level, proceeds through expert committees (from the Working Group of National Coordinators of the Test Guidelines Programme to the OECD Chemicals and Environmental Policy Committees), and is ultimately approved by the OECD Council.
Opinions of the Broader Scientific Community on Validation
Because of the importance of validating novel toxicity-testing methods and the reality of the rapid proliferation of new tests, many opinions have been voiced in the last decade on how the validation process needs to evolve. Although there are various degrees of formality in the suggested changes, all authors agree that the existing frameworks are not optimal and could be improved. Hartung (2007) argued for a move away from validation by comparison with existing “gold standards,”1 a common approach that might not reflect the molecular and physiological realities of the human body, and contended that tests should instead be developed to provide more mechanistic information and thus help to establish causality.
Judson and colleagues (Judson et al. 2013) suggested the following general principles: follow current validation practice to the extent possible and practical, increase the use of reference compounds to demonstrate assay reliability and relevance better, de-emphasize the need for cross-laboratory testing, and implement a Web-based, transparent, and expedited peer-review process.
Patlewicz and colleagues (Patlewicz et al. 2013) argued that standard steps of validation practice should still apply and that the validation process for any new test must articulate the scientific and regulatory rationale for the test, the relationship between what the test measures and the resulting biological effect of interest, a detailed protocol for the test, the domain of applicability, criteria for describing the results of the test, known limitations, and standards for determining good performance (positive and negative standards).
Finally, the International Life Sciences Institute Health and Environmental Sciences Institute, an industry-funded nonprofit organization, has recently begun a new project on developing a “Framework for Intelligent Non-Animal
1 A gold standard is defined as a reference standard that is regarded as the best available to determine a particular condition. The gold standard is the benchmark with which a new procedure is compared. Data from clinical trials and epidemiological studies provide the best examples of benchmarks for the potential effects of drugs or chemicals on the human body. In toxicology, there are cases in which the currently used methods are regarded as inadequate to predict human toxicity. In such cases, other validation methods need to be considered.
Methods for Safety Assessment.”2 The project aims to bring together the collective knowledge of scientists from academe, industry, and government to develop criteria for establishing confidence in nonanimal methods used to support regulatory decisions and to develop a framework organized around the IOM (2010) principles noted above.
The following sections describe what the committee views as the most important aspects of the validation process and challenges associated with them. The committee provides some recommendations for overcoming the challenges and for moving the validation process forward to meet the needs of assessing novel test methods.
Defining the Scope and Purpose of New Assays as an Essential Element in the Process of Validation and Acceptance
Most of the existing guidance deals with the technical aspects of the process for assay validation, but it is equally important to determine whether a new assay or test battery is meant to replace an existing one or is a novel approach that aims to improve decision-making and provide information that is critical but previously unavailable.
Recommendation: A clear definition of the purpose of the new test should be considered before a specific validation process is defined. One must establish the fitness of the test for a particular decision context, select appropriate comparators (for example, a gold standard, mechanistic events, or biomarkers), and delineate the scope of the validation exercise to be commensurate with the proposed use. For example, can a new assay or test battery be used to characterize subchronic or chronic adverse health end points? Test performance characteristics (specificity, sensitivity, and coverage) might need to be adjusted, depending on the decision type and context. Ultimately, it should be clear whether the validation process is aimed at testing reliability, validity, or both.
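The performance characteristics named in the recommendation above can be illustrated with a short sketch. The chemical names and assay calls below are invented for illustration, and “coverage” is interpreted here, as an assumption, as the fraction of reference chemicals that the assay is able to test at all:

```python
# Illustrative computation of sensitivity, specificity, and coverage
# against a hypothetical reference-chemical list. All data are invented.

# reference: chemical -> True (known positive) / False (known negative)
reference = {"chemA": True, "chemB": True, "chemC": False,
             "chemD": False, "chemE": True, "chemF": False}

# assay calls; None means the chemical could not be tested (e.g., insoluble)
assay = {"chemA": True, "chemB": False, "chemC": False,
         "chemD": False, "chemE": True, "chemF": None}

tested = {c for c, call in assay.items() if call is not None}
tp = sum(1 for c in tested if reference[c] and assay[c])
fn = sum(1 for c in tested if reference[c] and not assay[c])
tn = sum(1 for c in tested if not reference[c] and not assay[c])
fp = sum(1 for c in tested if not reference[c] and assay[c])

sensitivity = tp / (tp + fn)             # fraction of true positives detected
specificity = tn / (tn + fp)             # fraction of true negatives detected
coverage = len(tested) / len(reference)  # fraction of chemicals testable

print(sensitivity, specificity, coverage)
```

How the acceptable trade-off between sensitivity and specificity is set would, per the recommendation, depend on the decision type and context.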
Enabling Fit-for-Purpose Validation
The challenge of finding an appropriate comparator to enable fit-for-purpose validation of new test methods is considerable because disagreements about the quality of a gold standard, or about whether one exists at all, are common. When a new assay is validated as a replacement for an existing one, the gold standard to be used as the comparator must be determined, and expert judgment will be needed to establish the validity of the existing method or model that serves as that comparator. When a novel approach is validated, the decision context in which the information can be used and the availability of other data need to be clearly defined. Statisticians have addressed the question of how to assess the validity of test methods when there is no gold standard (Rutjes et al. 2007). Some of the methods correct imperfect reference standards through the use of additional information or imputed values; others construct a reference standard from the results of multiple test methods. Each approach has merits for the purpose of replacing animal tests for toxicity.
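One of the approaches mentioned above, constructing a reference standard from the results of multiple test methods, can be sketched as a simple majority vote across imperfect methods; the method names, chemical identifiers, and calls below are hypothetical:

```python
# Illustrative composite reference standard: when no gold standard exists,
# a majority vote across several imperfect methods can serve as the
# comparator for a new assay. All data are invented.

methods = {
    "rodent_bioassay": {"c1": 1, "c2": 1, "c3": 0, "c4": 0},
    "bacterial_test":  {"c1": 1, "c2": 0, "c3": 0, "c4": 1},
    "cell_assay":      {"c1": 1, "c2": 1, "c3": 1, "c4": 0},
}

chemicals = ["c1", "c2", "c3", "c4"]

def majority_call(chem):
    """Composite call: positive if more than half of the methods agree."""
    votes = [calls[chem] for calls in methods.values()]
    return 1 if sum(votes) > len(votes) / 2 else 0

composite = {c: majority_call(c) for c in chemicals}

# evaluate a hypothetical new assay against the composite reference
new_assay = {"c1": 1, "c2": 1, "c3": 0, "c4": 1}
agreement = sum(new_assay[c] == composite[c] for c in chemicals) / len(chemicals)
print(composite, agreement)
```

Published latent-class approaches (as discussed by Rutjes et al. 2007) are considerably more sophisticated; the vote above is only meant to make the idea of a constructed comparator concrete.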
Two important issues on which there is still no consensus in the scientific community are evaluation of the validity of assays that are not intended as one-to-one replacements for in vivo toxicity assays and assessment of the concordance of data from assays that use cells or proteins of human origin with toxicity data that are virtually all derived from animal models. Judson et al. (2013) have provided ideas on how to validate assays that are intended to be used in a high-throughput context and to be interpreted only in the context of the results of many other assays that evaluate the same biological effect or pathway. Those ideas need to be debated, modified, and tested. As to the concordance issue, the lack of concordance among species is likely due not to large differences in the function of highly conserved proteins, such as steroid receptors, but to differences in pharmacokinetics and metabolism. Targeted investigation of interspecies concordance at the molecular level could test that hypothesis. Data that allow such comparisons already exist in the literature, and the results will support decisions
on what modifications, if any, are needed to accommodate species differences in validation efforts.
Recommendation: Workshops or other mechanisms that can be used to develop consensus opinions among scientific experts on defining appropriate reference standards should be considered. Appropriate disclaimers about author affiliations should be included in any reports or opinions that might result from the activities; conflicts of interest need to be carefully managed.
Establishing the Utility and Domain of New Assays
Another important aspect of validation is establishing the utility of an assay and clearly defining its domain of applicability,3 its capacity for chemical biotransformation, its ability to establish a concentration–response relationship, its mechanistic relevance, and the applicability of its results. It is necessary to ensure that negative test results are not negative because of a lack of chemical metabolism, an insufficient concentration tested, chemical volatility, chemical binding to plastic, or other factors. Determining the validity of negative results is an important and challenging issue because stakeholders inherently weigh positive data more than negative data, or vice versa, depending on the decision context. Likewise, understanding the mechanistic relevance of a result of a new assay is important; it should be clear whether the test is assessing an initiating event, a key event, or an adverse outcome.
Recommendation: A description of the utility and domain of the test should be provided to inform the validation process and the ultimate use and interpretation of the data. There should be a clear statement concerning what a positive response or a negative (no) response from the assay means and what controls are appropriate or should be used.
Establishing Performance Standards
Data quality is a key determinant of acceptance of any test method. Assay performance guidelines that include quality-assurance metrics and quality control of day-to-day operation are well defined (for example, OECD Performance-Based Test Guideline TG 455), and it is widely recognized that such information needs to be documented. Performance standards4 are critical in a validation context and are a step toward regulatory acceptance, such as development into an OECD test guideline; however, performance standards are not equally well defined for all types of assays. For example, OECD provides performance standards primarily on estrogen-receptor activity and on skin irritation, corrosion, and sensitization.5
Recommendation: Performance standards should be developed for all types of assays that evaluate relevant adverse health outcomes with relevance being determined by a particular decision context.
Another important part of testing assay performance is establishing reference-chemical lists. A validation reference-chemical list for a number of end points to guide assay developers should help to mitigate disagreements among stakeholders. Engagement of stakeholders—such as regulatory-agency staff, nongovernment organizations, and industry—in establishing the lists will contribute to acceptance of the data produced by assays that are validated using the lists. Some effort has been invested in addressing this challenge, and some valuable lists have been created (Brown 2002; Eskes et al. 2007; Casati et al. 2009; Pazos et al. 2010; EC 2016b). However, there are few molecular targets for which there is a diverse set of specifically defined reference chemicals that can aid in determining both positive and negative performance of a test.
Recommendation: Common chemical lists that are fit for different purposes and can evolve should be defined and used for validation of assays and models where possible. That will help the scientific community to establish specificity and potential redundancy among new assays.
Validation or testing in multiple laboratories is one common element of current practice; however, it is recognized that ring trials6 take too long and are difficult to accomplish if the assays are proprietary, use ultrahigh throughput, or require specialized equipment or expertise. There might not be enough qualified laboratories in the world to perform the test. In the European Union, a network of vetted laboratories that can conduct validation reliably has been established as one way to address the challenge (European Union Network of Laboratories for the Validation of Alternative Methods). Judson et al. (2013) offered another possible solution and proposed performance-based validation: one validates the performance of a new test against the results of previously validated tests for the same end point (for example, a “gold-
3 The domain of applicability defines what substances can be reliably tested in the assay. For example, can substances that have limited solubility or are volatile be tested using the assay?
4 Performance standards “provide a basis for evaluating the comparability of a proposed test method that is mechanistically and functionally similar. Included are (1) essential test method components; (2) a minimum list of reference chemicals selected from among the chemicals used to demonstrate the acceptable performance of the validated test method; and (3) the comparable levels of accuracy and reliability, based on what was obtained for the validated test method, that the proposed test method should demonstrate when evaluated using the minimum list of reference chemicals” (OECD 2005).
6 In a ring trial, a given assay is tested in established laboratories to determine its reliability.
standard” test that might have undergone the formal OECD-like validation). Yet another alternative is to use a consensus resulting from multiple tests as a benchmark against which each test is evaluated and to assess variation about the consensus by using resampling techniques or meta-analysis (see Chapter 7). However, there is a real challenge in that many protocols that are used by contract research laboratories to conduct guideline tests are proprietary. Patlewicz et al. (2013) emphasize that any new validation approaches need to allow proprietary tests. In one solution for validating proprietary tests, an outside body provides blinded samples to the testing laboratory and then independently evaluates the accuracy of the test.
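The resampling idea mentioned above can be sketched as a bootstrap over a reference-chemical panel: the concordance of a new test with the multi-test consensus is recomputed on resampled panels to gauge how stable that concordance is. All calls below are invented for illustration:

```python
# Illustrative bootstrap of a test's concordance with a consensus benchmark.
import random

random.seed(0)  # fixed seed so the illustration is reproducible

consensus = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]   # hypothetical consensus calls
test      = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]   # hypothetical new-test calls

n = len(consensus)

def concordance(idx):
    """Fraction of chemicals (indexed by idx) on which the test agrees
    with the consensus."""
    return sum(consensus[i] == test[i] for i in idx) / len(idx)

point = concordance(range(n))
# resample the chemical panel with replacement 1,000 times
boots = [concordance([random.randrange(n) for _ in range(n)])
         for _ in range(1000)]
boots.sort()
low, high = boots[25], boots[975]   # approximate 95% interval

print(point, low, high)
```

With a panel this small the interval is wide, which is precisely the point: resampling makes visible how much a concordance estimate depends on the particular reference chemicals chosen.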
Recommendation: Government agencies should provide explicit incentives to academic, government, or commercial laboratories to participate in validation.
An alternative (or additional consideration) to technical ring trials is peer review of the methods and of data from new assays. However, more accessible and consistently formatted data are needed for validation through peer review. Data transparency and current agency-specific practices for releasing data to the public pose many challenges. For example, although ToxCast and Tox21 programs have established practices for releasing data in various formats, other agencies in the United States and abroad are not as advanced. Legal challenges involved in data access are many; not only might assays be proprietary but data from nonproprietary assays might be considered confidential business information.
Recommendation: Data collected through coordinated validation or screening programs in government laboratories or under contract to government agencies, especially with respect to novel test methods, should be made publicly available as soon as possible, preferably through user-friendly Web-based dashboards. If data are subject to human-subject protections or raise privacy concerns, appropriate measures should be taken to de-identify the information that is being released.
Establishing Clear Reporting Standards for Assay Results and Testing Conditions
It is widely recognized that the level of detail on methods and experimental conditions reported in scientific publications can be limited by manuscript length restrictions and other factors. It is critical, however, that sufficient information be included in the documentation of assay- or model-validation exercises. It might appear to assay or model developers that some details are obvious and not needed in the documentation, but reproducibility and validity of results might be critically affected by the omission or incompleteness of information. Results might also be misinterpreted in application if incorrect inferences are drawn.
Recommendation: Government agencies and organizations involved in assay and model validation should develop clear guidance documents and training materials to support validation, such as training materials that cover various technical aspects of good in vitro method development and practices and cover reporting of methods. All technical aspects of the assay—such as number of cells; media, serum, or additives used; incubation length; readout description; equipment needed; and positive and negative controls—should be described as completely as possible and with the degree of detail needed for replication. The committee acknowledges that for proprietary reasons some information might need to be withheld, but best practice should include disclosure of the nature of and reason for withholding information.
Recommendation: Because the chemical or particle concentrations in a test system can differ from the administered (nominal or assumed) concentrations, depending on the chemical or particle properties (such as partition coefficients and metabolic rates) and on the assay system (test materials), efforts should be made to quantify the concentrations in the test system that correspond with the responses in the assays, either through direct measurement or through mass-balance model estimation.
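The difference between nominal and effective concentrations can be made concrete with a minimal equilibrium mass balance. The function and all parameter values below are hypothetical and deliberately simplified; published in vitro distribution models partition mass using measured chemical properties rather than assumed fixed fractions:

```python
# Minimal mass-balance sketch: estimate the freely dissolved concentration
# of a chemical in an in vitro well from the nominal dose and assumed loss
# fractions. All parameter values are invented for illustration.

def free_concentration(nominal_uM, f_serum_bound=0.6, f_plastic_bound=0.15,
                       f_evaporated=0.05):
    """Whatever is not bound to serum proteins, sorbed to plastic, or
    evaporated is treated as freely dissolved."""
    f_free = 1.0 - f_serum_bound - f_plastic_bound - f_evaporated
    if f_free < 0:
        raise ValueError("loss fractions exceed the total mass")
    return nominal_uM * f_free

print(free_concentration(10.0))   # nominal 10 uM -> 2.0 uM free
```

Even this crude sketch shows why a response attributed to a 10 uM nominal exposure may actually reflect a much lower effective concentration, which is the motivation for the recommendation above.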
Establishing Clear Guidelines for Evaluating Data Integration and Computational Predictive Modeling in a Common Framework
In the 21st century toxicity-testing paradigm, the results of particular assays are likely to be integrated with data from other sources to obtain the most confident assessment of risk possible. Such integration is the topic of Chapter 7. In anticipation of that chapter, the committee addresses performance issues around models here.
The integrated analysis of data from multiple sources will be increasingly required for making regulatory decisions, and the collective use of these data can be viewed as a new, comprehensive “assay.” However, the multiple aspects of an integrated decision process present challenges in reliability and evaluation. The framework underlying integrated approaches to testing and assessment (OECD 2008) provides one example of a structured strategy for combining information for hazard identification and assessment. Here, the focus is on the quality and reliability of the computational aspects of data integration, which are often used in concert with traditional assays. Many of the validation principles of relevance and reliability that were developed for quantitative structure–activity relationship (QSAR) models by OECD (2007) apply to any statistical and integrated model (see Chapter 7 for further discussion). The OECD principles for QSAR model development call for (a) a defined end point, (b) an unambiguous algorithm, (c) a defined domain of (chemical) applicability, (d) measures of goodness of fit, robustness,
and predictivity, and ideally (e) a mechanistic interpretation. Items (b) and (d) often pose the greatest challenge for QSAR or any other statistical model because complicated modeling schemes are often difficult to reproduce precisely. Systematic reviews of external validation studies of multivariable prediction models have also confirmed that most studies report key details poorly and lack clarity on whether the validation was truly external to the information on which the model was based (Collins et al. 2014). Recent efforts by the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) initiative resulted in recommendations for reporting studies that develop, validate, or update a prediction model, whether for diagnostic or prognostic purposes (Collins et al. 2015).
Integrated assessment strategies can also benefit from redundancies and weighting of similar assays because a single in vitro assay will probably not provide a “perfect” result. Even assays that are similar mechanistically will likely have some degree of discordance because biological processes are complex, and some test chemicals might be unsuitable for certain assays. In addition, many environmental chemicals are likely to have low potency. As a result, there will be variation from assay to assay in what would be considered a positive response. Multiple assays for critical targets are likely to be needed and can be combined by using computational models (Browne et al. 2015). Any weighting scheme that is data-driven should be carefully cross-validated to avoid optimistic or over-fitted final schemes.
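The cross-validation of a data-driven weighting scheme called for above can be sketched as follows: assay weights are fit on training chemicals only and then applied to a held-out chemical, so the reported performance is not inflated by over-fitting. The three assays, their calls, and the accuracy-based weighting rule are all invented for illustration:

```python
# Illustrative leave-one-out cross-validation of a simple data-driven
# weighting scheme that combines three hypothetical assays.

# each row: (calls from three assays, true label); all data are invented
data = [
    ([1, 1, 0], 1), ([1, 0, 1], 1), ([0, 0, 0], 0), ([0, 1, 0], 0),
    ([1, 1, 1], 1), ([1, 0, 1], 0), ([1, 1, 0], 1), ([0, 0, 0], 0),
]

def fit_weights(train):
    """Weight each assay by its accuracy on the training chemicals only."""
    n_assays = len(train[0][0])
    return [sum(calls[j] == label for calls, label in train) / len(train)
            for j in range(n_assays)]

def predict(weights, calls):
    """Weighted vote: positive if the weighted score exceeds half the
    total weight."""
    score = sum(w * c for w, c in zip(weights, calls))
    return 1 if score > sum(weights) / 2 else 0

correct = 0
for i, (calls, label) in enumerate(data):
    train = data[:i] + data[i + 1:]        # leave chemical i out
    weights = fit_weights(train)           # weights never see chemical i
    correct += predict(weights, calls) == label

cv_accuracy = correct / len(data)
print(cv_accuracy)
```

Fitting the weights on all chemicals and scoring on the same chemicals would give an optimistic estimate; holding each chemical out, as here, is the safeguard the text recommends.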
As noted, data from assays might be combined with other lines of data to guide decision-making, and issues of documentation and transparency that arise when assay data are combined are similar to those involved when data from a single assay are used.
Recommendation: Technical aspects of a statistical predictive model should be described with enough detail for all major steps to be independently reproduced and to ensure the utility and reliability of the predictive models. Statistical predictive models often result in implicit weighting schemes for various features, such as chemical descriptors in QSAR models. Where possible, the final features used and relative model contributions should be published to open the “black box” for future investigators.
Recommendation: Weighting schemes for combining assays should be cross-validated if predictive performance or another criterion is driven by the current data and is used in developing a scheme.
Recommendation: A culture of independent reproduction of statistical and integrative models should be fostered, ideally with reliability of models assessed by multiple computational groups working independently.
Recommendation: Software tools and scripts should be validated by duplicative review by multiple investigators, and where possible software should be made available by open-source mechanisms for continual quality control.
Balls, M. 1995. Defining the role of ECVAM in the development, validation and acceptance of alternative tests and testing strategies. Toxicol. In Vitro 9(6):863-869.
Birnbaum, L.S. 2013. 15 years out: Reinventing ICCVAM. Environ. Health Perspect. 121(2):A40.
Brown, N.A. 2002. Selection of test chemicals for the ECVAM international validation study on in vitro embryotoxicity tests. European Centre for the Validation of Alternative Methods. Altern. Lab. Anim. 30(2):177-198.
Browne, P., R.S. Judson, W.M. Casey, N.C. Kleinstreuer, and R.S. Thomas. 2015. Screening chemicals for estrogen receptor bioactivity using a computational model. Environ. Sci. Technol. 49(14):8804-8814.
Burden, N., C. Mahony, B.P. Müller, C. Terry, C. Westmoreland, and I. Kimber. 2015. Aligning the 3Rs with new paradigms in the safety assessment of chemicals. Toxicology 330:62-66.
Casati, S., P. Aeby, I. Kimber, G. Maxwell, J.M. Ovigne, E. Roggen, C. Rovida, L. Tosti, and D. Basketter. 2009. Selection of chemicals for the development and evaluation of in vitro methods for skin sensitisation testing. Altern. Lab. Anim. 37(3):305-312.
Casey, W.M. 2016. Advances in the development and validation of test methods in the United States. Toxicol. Res. 32(1):9-14.
Coecke, S., M. Balls, G. Bowe, J. Davis, G. Gstraunthaler, T. Hartung, R. Hay, O.W. Merten, A. Price, L. Schechtman, G. Stacey, and W. Stokes. 2005. Guidance on good cell culture practice: A report of The Second ECVAM Task Force on Good Cell Culture Practice. ATLA 33(3):261-287.
Collins, G.S., J.A. de Groot, S. Dutton, O. Omar, M. Shanyinde, A. Tajar, M. Voysey, R. Wharton, L.M. Yu, K.G. Moons, and D.G. Altman. 2014. External validation of multivariable prediction models: A systematic review of methodological conduct and reporting. BMC Med. Res. Methodol. 14:40.
Collins, G.S., J.B. Reitsma, D.G. Altman, and K.G. Moons. 2015. Transparent reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD statement. J. Clin. Epidemiol. 68(2):134-143.
Cunningham, M.L. 2002. A mouse is not a rat is not a human: Special differences exist. Toxicol. Sci. 70(2):157-158.
EC (European Commission). 2008. Council Regulation (EC) No 440/2008 of 30 May 2008 laying down test methods pursuant to Regulation (EC) No 1907/2006 of the European Parliament and of the Council on the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH). OJEU 51(L142):1-739.
EC (European Commission). 2016a. Validation and Regulatory Acceptance. Joint Research Centre [online]. Available: https://eurl-ecvam.jrc.ec.europa.eu/validation-regulatory-acceptance [accessed January 3, 2017].
EC (European Commission). 2016b. EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database of AMES Positive Chemicals. Joint Research Centre [online]. Available: https://eurl-ecvam.jrc.ec.europa.eu/databases/genotoxicity-carcinogenicity-db [accessed October 24, 2016].
Eskes, C., T. Cole, S. Hoffmann, A. Worth, A. Cockshott, I. Gerner, and V. Zuang. 2007. The ECVAM international validation study on in vitro tests for acute skin irritation: Selection of test chemicals. Altern. Lab. Anim. 35(6):603-619.
Gocht, T., and M. Schwarz, eds. 2013. Implementation of the Research Strategy [online]. Available: http://www.detect-iv-e.eu/wp-content/uploads/2013/09/SEURAT-1v3_LD.pdf [accessed January 3, 2017].
Hartung, T. 2007. Food for thought …on validation. ALTEX 24(2):67-80.
Hartung, T., S. Bremer, S. Casati, S. Coecke, R. Corvi, S. Fortaner, L. Gribaldo, M. Halder, S. Hoffmann, A.J. Roi, P. Prieto, E. Sabbioni, L. Scott, A. Worth, and V. Zuang. 2004. A modular approach to the ECVAM principles on test validity. ATLA 32(5):467-472.
IOM (Institute of Medicine). 2010. Evaluation of Biomarkers and Surrogate Endpoints in Chronic Disease. Washington, DC: The National Academies Press.
Judson, R., R. Kavlock, M. Martin, D. Reif, K. Houck, T. Knudsen, A. Richard, R.R. Tice, M. Whelan, M. Xia, R. Huang, C. Austin, G. Daston, T. Hartung, J.R. Fowle, III, W. Wooge, W. Tong, and D. Dix. 2013. Perspectives on validation of high-throughput assays supporting 21st century toxicity testing. ALTEX 30(1):51-56.
Kavlock, R.J., C.P. Austin, and R.R. Tice. 2009. Toxicity testing in the 21st century: Implications for human health risk assessment. Risk Anal. 29(4):485-487.
Kinsner-Ovaskainen, A., G. Maxwell, J. Kreysa, J. Barroso, E. Adriaens, N. Alépée, N. Berg, S. Bremer, S. Coecke, J.Z. Comenges, R. Corvi, S. Casati, G. Dal Negro, M. Marrec-Fairley, C. Griesinger, M. Halder, E. Heisler, D. Hirmann, A. Kleensang, A. Kopp-Schneider, S. Lapenna, S. Munn, P. Prieto, L. Schechtman, T. Schultz, J.M. Vidal, A. Worth, and V. Zuang. 2012. Report of the EPAA-ECVAM workshop on the validation of Integrated Testing Strategies (ITS). Altern. Lab. Anim. 40(3):175-181.
NIEHS (National Institute of Environmental Health Sciences). 1997. Validation and Regulatory Acceptance of Toxicological Test Methods: A Report of the ad hoc Interagency Coordinating Committee on the Validation of Alternative Methods. NIH Publication No. 97-3981. NIEHS, Research Triangle Park, NC [online]. Available: https://ntp.niehs.nih.gov/iccvam/docs/about_docs/validate.pdf [accessed July 29, 2016].
NIEHS (National Institute of Environmental Health Sciences). 2003. ICCVAM Guidelines for the Nomination and Submission of New, Revised, and Alternative Test Methods. NIH Publication No. 03-4508. Prepared by the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) and the National Toxicology Program (NTP) Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) [online]. Available: https://ntp.niehs.nih.gov/iccvam/suppdocs/subguidelines/sd_subg034508.pdf [accessed July 29, 2016].
OECD (Organisation for Economic Co-operation and Development). 2005. Guidance Document on the Validation of and International Acceptance of New or Updated Test Methods for Hazard Assessment. ENV/JM/MONO(2005)14. OECD Series on Testing and Assessment No. 34. Paris: OECD [online]. Available: http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono(2005)14&doclanguage=en [accessed July 29, 2016].
OECD (Organisation for Economic Co-operation and Development). 2007. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationships [(Q)SAR] Models. ENV/JM/MONO(2007)2. OECD Series on Testing and Assessment. Paris: OECD [online]. Available: http://www.oecd.org/env/guidance-document-on-the-validation-of-quantitative-structure-activity-relationship-q-sar-models-9789264085442-en.htm [accessed July 29, 2016].
OECD (Organisation for Economic Co-operation and Development). 2008. Guidance Document on Magnitude of Pesticide Residues in Processed Commodities. ENV/JM/MONO(2008)23. OECD Series on Testing and Assessment No. 96. Paris: OECD [online]. Available: http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono(2008)23&doclanguage=en [accessed July 29, 2016].
OECD (Organisation for Economic Co-operation and Development). 2014. Guidance Document for Describing Non-guideline in Vitro Test Methods. ENV/JM/MONO(2014)35. OECD Series on Testing and Assessment No. 211. Paris: OECD [online]. Available: http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=ENV/JM/MONO(2014)35&doclanguage=en [accessed July 29, 2016].
Patlewicz, G., T. Simon, K. Goyak, R.D. Phillips, J.C. Rowlands, S.D. Seidel, and R.A. Becker. 2013. Use and validation of HT/HC assays to support 21st century toxicity evaluations. Regul. Toxicol. Pharmacol. 65(2):259-268.
Pazos, P., C. Pellizzer, T. Stummann, L. Hareng, and S. Bremer. 2010. The test chemical selection procedure of the European Centre for the Validation of Alternative Methods for the EU Project ReProTect. Reprod. Toxicol. 30(1):161-199.
Pirone, J.R., M. Smith, N.C. Kleinstreuer, T.A. Burns, J. Strickland, Y. Dancik, R. Morris, L.A. Rinckel, W. Casey, and J.S. Jaworska. 2014. Open source software implementation of an integrated testing strategy for skin sensitization potency based on a Bayesian network. ALTEX 31(3):336-340.
Rutjes, A.W., J.B. Reitsma, A. Coomarasamy, K.S. Khan, and P.M. Bossuyt. 2007. Evaluation of diagnostic tests when there is no gold standard. A review of methods. Health Technol. Assess. 11(50):iii, ix-51.
Thomas, R.S., M.B. Black, L. Li, E. Healy, T.M. Chu, W. Bao, M.E. Andersen, and R.D. Wolfinger. 2012. A comprehensive statistical analysis of predicting in vivo hazard using high-throughput in vitro screening. Toxicol. Sci. 128(2):398-417.
Tice, R.R., C.P. Austin, R.J. Kavlock, and J.R. Bucher. 2013. Improving the human hazard characterization of chemicals: A Tox21 update. Environ. Health Perspect. 121(7):756-765.