9
Validation

The utility of toxicogenomic technologies ultimately depends on how reliable, reproducible, and generalizable the results are from a particular study or individual method of analysis. Moving beyond laboratory assays to more widespread use requires some level of validation, which can be defined as the process of ensuring that a test reliably measures and reports the determined end point(s) and encompasses both technical and platform qualification in addition to biologic qualification. Distinct issues arise from the use of any novel technology in a regulatory context. As discussed in this chapter, validation is an integral part of the more general process of developing and applying toxicogenomic methodology.

LEVELS OF VALIDATION

Validation must be carried out at various levels as described in Box 9-1. First, technology platforms must be shown to provide consistent, reliable results, which includes assessment of device stability and determination of analytical sensitivity and assay limits of detection, interference, and precision (reproducibility and repeatability). Second, the software used to collect and analyze data for an application must provide valid results. Third, the application, consisting of both hardware and software, must be tested and validated in the context of the biologic system to which it will be applied. Fourth, the application, or a related application based on the original, must be shown to be generalizable to a broader population or to be highly specific for a smaller, target population. Finally, one must consider how these technologies and applications based on them can be validated for regulatory use. These five levels of validation are discussed in this chapter.





BOX 9-1 Validation of Toxicogenomic Applications

1. Platform validation: Does the particular technology provide reproducible and reliable measurements?
2. Software/data analysis validation: Is the software used for analysis of a particular experimental design appropriate, and does it provide insight into the biology of the problem under study?
3. Biologic validation: Are the results of an "-omics" analysis consistent with the biology, or can they be verified by another focused approach, such as quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) for microarrays or enzyme-linked immunosorbent assay (ELISA)1 for proteomics?
4. Generalizability: Can the results of a particular analysis be extended from the test samples to the broader population or from animal models to humans?
5. Regulatory validation: Is a particular assay or test suitable for use in evaluating the safety and efficacy of new compounds or in diagnostic or prognostic applications?

1 ELISA is a quantitative in vitro test for an antibody or antigen in which the test material is adsorbed on a surface and exposed either to a complex of an enzyme linked to an antibody specific for the antigen or to an enzyme linked to an anti-immunoglobulin specific for the antibody, followed by reaction of the enzyme with a substrate to yield a colored product corresponding to the concentration of the test material. (Merriam-Webster's Medical Dictionary, http://dictionary.reference.com/browse/Enzyme-Linked Immunosorbent Assay, accessed April 12, 2007.)

It is important to recognize that validation is an iterative process: the biologic validation step, for example, can refine platform and software validation and help direct efforts to generalize the results.

Platform Validation

Any toxicogenomic study is predicated on the assumption that the technologies provide accurate and relevant measures of the biologic processes underlying what is being assayed. For transcriptomic profiling with microarrays, for which the most data are available, there have been many successful applications, often with high rates of validation by an alternative technology such as Northern analysis or quantitative reverse transcriptase polymerase chain reaction (qRT-PCR); it should be noted, however, that each of these techniques has its own experimental biases. The issue of concordance between different microarray platforms was discussed in Chapter 2. Recent reports suggest that adherence to good, standard laboratory practices and careful analysis of data can lead to high-quality, reproducible results in which the biology of the system under study drives the gene expression profiles that are observed (Bammler et al. 2005; Dobbin et al. 2005; Irizarry et al. 2005; Larkin et al. 2005). Similar efforts
must accompany the adoption of various genomic, proteomic, and metabolomic technology platforms for toxicogenomics. This process of technology platform assessment is an essential step in the overall validation process and indicates whether a system provides a reliable and reproducible measure of the biology under study.

Two often-confused measures of system performance are repeatability and reproducibility. Repeatability describes the agreement of successive measurements when controllable sources of variation are held constant; if one or more of these sources of variation is allowed to have its typical effect, the agreement is called reproducibility. Repeatability can be assessed by consecutive measurements of a single sample at one time under identical conditions. Repeated measurements with the same method but conducted on different days, with different batches of reagents, or by different operators provide a measure of reproducibility. Because the latter scenario best describes the routine application of toxicogenomic technology platforms, reproducibility is the most relevant measure of system performance.

Assays of reproducibility involve analyzing the same biologic sample multiple times to determine whether the platform provides consistent results with a small coefficient of variation. Although this may seem straightforward, toxicogenomic technologies do not measure single quantities; they make hundreds or thousands of measurements, one each for many genes, proteins, or metabolites. Assays optimized for one range of expression level or type of analyte may not perform as well with other samples. For example, a technology that performs well for genes expressed at high levels may not be sensitive to low levels of expression.
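The coefficient-of-variation check described above can be made concrete with a short sketch. The replicate intensities and the 0.15 flagging threshold below are invented for illustration; they are not platform standards:

```python
import numpy as np

def per_gene_cv(replicates):
    """Coefficient of variation for each gene across replicate assays.

    replicates: 2-D array, shape (n_replicates, n_genes), holding
    expression measurements of the SAME biologic sample run repeatedly.
    Returns a 1-D array of CV values (std / mean), one per gene.
    """
    replicates = np.asarray(replicates, dtype=float)
    mean = replicates.mean(axis=0)
    std = replicates.std(axis=0, ddof=1)  # sample standard deviation
    return std / mean

# Three replicate runs of one sample, four genes (made-up numbers).
runs = [
    [100.0, 10.0, 55.0, 2.0],
    [102.0, 12.0, 50.0, 5.0],
    [ 98.0,  8.0, 45.0, 1.0],
]
cv = per_gene_cv(runs)

# Genes measured at low levels (last column) typically show the largest
# CV, illustrating why sensitivity must be assessed across the whole
# range of expression rather than only for abundant analytes.
noisy = cv > 0.15  # illustrative threshold, not a standard
```

Note that a single summary CV for the whole assay would hide exactly the effect the text warns about: a platform can look reproducible on average while being unreliable for low-abundance analytes.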
The reproducibility of measurements and the relative signal-to-noise ratio of the assay must therefore be evaluated carefully, with an emphasis on the levels of gene, protein, or metabolite expression relevant to a specific application. This is particularly true of proteomics and metabonomics, in which the range of analyte concentrations may vary by more than a millionfold (Figure 9-1).

Types of Calibration Standards

Another approach that can provide some level of quality assessment and quality control is the use of calibration standards consisting of complex mixtures of analytes spanning the dynamic range normally surveyed in a particular application. In the context of microarray gene expression analysis, development of "universal" RNA reference samples (Cronin et al. 2004) is under way, and the External RNA Control Consortium (ERCC), led by the National Institute of Standards and Technology, is moving toward defining such a standard. The ERCC is composed of representatives from the public, private, and academic sectors working together in a consensus fashion to develop tools for experiment control and performance evaluation for gene expression analysis, including

FIGURE 9-1 Human plasma proteome. The large range of protein concentrations in the human proteome represents a significant experimental challenge, as technologies must be sensitive across nearly 12 orders of magnitude (a 1 trillionfold range) for comprehensive analysis and the development of biomarkers. Source: Anderson and Anderson 2002. Reprinted with permission; 2002, Molecular & Cellular Proteomics.

"spike-in" controls, protocols, and informatic tools, all intended to be useful for one- and two-color microarray platforms and qRT-PCR.

Ideally, such an RNA "standard" would consist of multiple samples. The first would be one or more RNA mixtures that could be used for regular quality control assessment. This approach can be used to monitor the performance of a particular laboratory or platform, documenting that the results obtained remain consistent both in the ability to detect expression measures for any one sample and in the ability to detect differential expression among samples. A second useful control RNA would consist of exogenous spike-in controls (van de Peppel et al. 2003), which correspond to probes on the microarray surface that are not from the species being analyzed; this control measures system performance independent of the quality of the RNA sample being analyzed. Objective measures of RNA quality may provide an additional means of assessing the performance and establishing the credibility of a particular microarray assay, as poor-quality RNA samples provide unreliable results. Finally, because the primary measurement in microarray assays is the fluorescence intensity of individual hybridized probes, work is under way to establish quantitative standards to assess the performance of microarray scanning devices.

Efforts at Standards Development

Consortium approaches to standardization and validation have played an important role in working toward platform validation. As the field of toxicogenomics has matured, there has been a realization that groups working together can better understand and define the limitations of any technology and the potential solutions to any problems. Examples include the International Life Sciences Institute's Health and Environmental Sciences Institute (ILSI-HESI 2006), a consortium of industry, government, and academic groups examining applications of toxicogenomics, and the Toxicogenomics Research Consortium sponsored by the National Institute of Environmental Health Sciences (NIEHS) (TRC 2005). The value of these consortium efforts is that they capture the state of the art of multiple groups simultaneously and therefore have the potential to advance an adoptable standard much more quickly than individual research groups can.

In response to the growing need for objective standards to assess the quality of microarray assays, the Microarray Gene Expression Data Society (MGED) hosted a workshop on microarray quality standards in 2005. This workshop and its findings are described in Box 9-2. While early, informal efforts such as those offered by MGED are important in helping to define the scope of the problem and to identify potential approaches, systematic development of objective standards for quality assessment would greatly facilitate the advancement and establishment of toxicogenomics as a discipline. Ideally, further efforts to establish objective and quantitative

BOX 9-2 Microarray Gene Expression Data Society Workshop, September 2005, Bergen, Norway

The purpose of this workshop was to examine quality metrics that cut across technologies and laboratories: defined standards that can be used to evaluate various aspects of each experiment, including overall studies, the individual microarrays used, and the reporters used to measure expression for each gene. Establishing such standards will allow data in public repositories to be better documented and more effectively mined and analyzed, and it will provide an added measure of confidence in the results of each study. The workshop examined several levels of quality assessment:

1. Quality measures based on external controls, including spike-in controls;
2. Indicators of common artifacts such as background levels, background inhomogeneity, and RNA degradation;
3. Quality metrics based on technical replicates;
4. Model-based quality metrics (using biologic replicates and statistical models of data distributions); and
5. Identification of potential bias in individual measures and evaluation of data.

The consensus was that the diversity of platforms, experimental designs, and applications makes it unlikely that a single universal measure of quality will be possible. However, there was confidence that standards based on universal principles could be developed for each platform, for example, one for Affymetrix GeneChips and a separate, but similar, standard for spotted oligonucleotide microarrays.

Basic principles applicable at different stages of any study were evident. First, external standards such as spike-in controls provide a way to assess various steps of the analytical process and facilitate comparisons among laboratories and platforms; they also provide a way to assess the quality of experiments over time. Adding a control RNA from another species, one without similarity to the genome of the species under study but for which there are probes on the microarray, can yield data for assessing the overall quality of the starting RNA, the quality of the hybridization, and the general quality of a particular microarray. Second, because most microarrays contain repeated probe sequences for particular genes, these repeats are quite useful for assessing the spatial properties of each microarray and the overall performance of a single microarray, as repeated probes should give consistent measures of gene expression. Finally, the analysis of replicate samples was identified as one way to assess the quality of a particular study. This requires "technical replicates," in which the starting material is measured multiple times, providing an estimate of the overall variability of the assay. "Biologic replicates," in which separate experimental subjects from the same treatment group are assayed, also provide a very useful and powerful assessment of gene expression; however, because they include estimates of variability in both the assay and the biologic samples under study, they are not as useful for assessing assay quality alone.
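One of the replicate-based quality metrics listed in Box 9-2 (item 3) can be sketched as a simple concordance check between two technical replicates; the intensities and the pass threshold below are invented for illustration, and real cutoffs are platform specific:

```python
import numpy as np

def replicate_concordance(rep_a, rep_b):
    """Pearson correlation of log2-scale intensities for two technical
    replicates of the same sample: a crude per-array quality metric.
    """
    a = np.log2(np.asarray(rep_a, dtype=float))
    b = np.log2(np.asarray(rep_b, dtype=float))
    return float(np.corrcoef(a, b)[0, 1])

# Made-up raw intensities for two replicate hybridizations of one sample.
rep1 = [120.0, 850.0, 40.0, 2600.0, 310.0]
rep2 = [130.0, 790.0, 44.0, 2450.0, 330.0]

r = replicate_concordance(rep1, rep2)
passes_qc = r > 0.95  # illustrative threshold only
```

A correlation this coarse cannot localize a problem (a spatial artifact can hide inside a high overall correlation), which is why the workshop distinguished replicate-based metrics from artifact indicators such as background inhomogeneity.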
quality measures for microarray data and other toxicogenomic data will help to advance the field in much the same way that "phred" quality scores characterizing the quality of DNA sequences accelerated genome sequencing.

Software/Data Analysis Validation

The software used in analyzing data from toxicogenomic studies can play as significant a role in determining the final outcome of an experiment as the technology platform does. Consequently, considerable attention must be paid to validating the computational approaches: the combination of technology platform, data collection, and processing algorithms must be appropriately selected and validated for application to the biologic system in each study.

Data Collection and Normalization

Most genomic technology platforms do not perform absolute quantitative measurements. For microarray data collected on an Affymetrix GeneChip, the data from the multiple probe pairs for each gene are combined in various ways to assess an expression level for that gene. However, these analyses do not measure quantities of particular molecules directly; instead, they measure surrogates such as fluorescence, which is subject to unknown sources of variation, and these fluorescent signals are used to estimate expression levels. Similar processing of the raw data is an element of all genomic technologies. For gene expression-based assays, these initial measurements are often followed by a "normalization" process that adjusts the individual measurements for each gene in each sample to facilitate intersample comparison. Normalization attempts to remove systematic experimental variation from each measurement and to adjust the data to allow direct comparison of the levels of a single gene, protein, or metabolite across samples.

Despite the widespread use of image processing and data normalization in microarray analyses, the sources of background signals and how best to estimate their levels are not fully understood; thus, there is no universally accepted standard for this process. Any image processing and normalization approach changes the data and affects the results of further analysis, so choosing appropriate methods matters. If the results of multiple studies are to be combined, every effort should be made to apply consistent methods to all data. As the number of microarray experiments grows, there is increasing interest in meta-analyses, which may provide more broadly based information than can be seen in a single experiment. It is therefore important that repositories for toxicogenomic experiments make all "raw" data available so they can be analyzed by consistent methodologies to extract maximum information.
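As a concrete illustration of the kind of adjustment normalization performs, the sketch below applies a simple global median scaling, one of the simplest of the many published approaches; the intensity values are invented, and real pipelines typically use more sophisticated, intensity-dependent methods:

```python
import numpy as np

def median_normalize(arrays):
    """Scale each array so that all arrays share a common median intensity.

    arrays: 2-D array, shape (n_samples, n_genes), of raw intensities.
    This global approach assumes most genes are unchanged between samples;
    it does not correct intensity-dependent (nonlinear) effects.
    """
    arrays = np.asarray(arrays, dtype=float)
    medians = np.median(arrays, axis=1, keepdims=True)  # one per sample
    target = medians.mean()                             # common reference level
    return arrays * (target / medians)

raw = [
    [100.0, 200.0, 400.0],   # sample 1
    [ 50.0, 100.0, 200.0],   # sample 2: same profile, half as bright overall
]
norm = median_normalize(raw)
# After scaling, both samples share the same median, so per-gene
# comparisons are no longer dominated by overall brightness differences.
```

The choice of adjustment changes the data, which is exactly why the text stresses applying one consistent method across all data that will be compared or combined.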

Data Analysis

Once the data from a particular experiment have been collected and normalized, they are often further analyzed by the methods described in Chapter 3. Class discovery experiments typically use approaches such as hierarchical clustering to determine whether relevant subgroups exist in the data.

Class prediction and classification studies, which link toxicogenomic profiles to specific phenotypic outcomes, represent a somewhat different validation challenge. Ideally, to validate a classification method, it is most useful to have an initial collection of samples (the training set) that can be analyzed to arrive at a profile and an appropriate classification algorithm, as well as an independent group of samples (the test set) that can be used to verify the approach. In practice, most toxicogenomic studies have a limited number of samples, and all of them are generally necessary for identifying an appropriate classification algorithm (or classifier). An alternative to using an independent test set, albeit less powerful and less reliable, is to perform leave-k-out cross-validation (LKOCV) (Simon et al. 2003). This approach leaves out some subset k of the initial collection of N samples, develops a classifier using the (N - k) samples that remain, and then applies the classification algorithm to the k samples that were initially left out. The process is then repeated with a new set of k samples to be left out and classified, and so on. The simplest variant, which is often used, is leave-one-out cross-validation (LOOCV). Cross-validation can be extremely useful when an independent test set is not available, but it is often applied inappropriately as a partial rather than a full cross-validation, the distinction being the stage in the process at which one leaves k out.
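The partial-versus-full distinction can be demonstrated with a small simulation. In the sketch below the data are pure noise with randomly assigned class labels, so an honest accuracy estimate should be near 50%; selecting genes before cross-validation (partial) nonetheless yields apparently strong performance, while selecting genes inside each fold (full) does not. The nearest-centroid classifier and correlation-based gene selection are deliberately simple stand-ins for whatever algorithm a study actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic null data: 30 samples, 2,000 genes, labels assigned at random,
# so NO gene is truly informative about the class.
n, g, k_genes = 30, 2000, 10
X = rng.standard_normal((n, g))
y = np.array([0, 1] * (n // 2))

def top_genes(X, y, k):
    """Indices of the k genes most correlated with the class labels."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = yc @ Xc / (np.linalg.norm(yc) * np.linalg.norm(Xc, axis=0))
    return np.argsort(-np.abs(r))[:k]

def centroid_predict(Xtr, ytr, xte):
    """Nearest-centroid prediction for one left-out sample."""
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    return int(np.linalg.norm(xte - c1) < np.linalg.norm(xte - c0))

def loocv(X, y, select_inside):
    """LOOCV accuracy; select_inside=True is FULL cross-validation."""
    if not select_inside:                  # partial CV: genes chosen once,
        genes = top_genes(X, y, k_genes)   # using ALL samples (biased)
    hits = 0
    for i in range(len(y)):
        tr = np.arange(len(y)) != i
        if select_inside:                  # full CV: genes re-chosen from
            genes = top_genes(X[tr], y[tr], k_genes)  # the training fold only
        hits += centroid_predict(X[tr][:, genes], y[tr], X[i, genes]) == y[i]
    return hits / len(y)

partial_acc = loocv(X, y, select_inside=False)  # optimistically biased
full_acc = loocv(X, y, select_inside=True)      # honest, near chance
```

On data with no real signal, the partial procedure reports accuracy far above 50% because information from every test sample leaked into gene selection, which is exactly the bias described in the surrounding text.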
Many published studies with microarray data have used the entire dataset to select a set of classification genes and only then divided the samples into k test and (N - k) training samples; the (N - k) training samples are used to train the algorithm, which is then tested on the k test samples. The problem is that using all the samples to select a classification set of genes has the potential to bias any classifier, because the test and training sets are not independent. Such partial cross-validation should never be performed. The proper approach is to conduct full LKOCV, in which the sample data are divided into training and test sets before each round of gene selection, algorithm training, and testing. When iterated over multiple rounds, LKOCV can be used to estimate the accuracy of the classification system by simply averaging over the complete set of classifiers. Even then, optimal validation of any classifier requires a truly independent test set.

The choice of samples for training is an important, but often neglected, element in developing a classifier. It is important to balance the representation of sample classes and to ensure that other factors do not confound the analysis. Nearly all algorithms work by a majority consensus rule: for example, if the data represent two classes, A and B, with eight samples in class A and two in class B, the simplest classifier would assign everything to class A and achieve 80% accuracy, a result that clearly is not acceptable. Samples should also be selected so that there
are no confounding factors. For example, an experiment may be conducted to develop a classifier for hepatotoxic compounds. If all toxicant-treated animals were treated with one vehicle whereas the control animals were treated with another, differences between the treated and control groups may be confounded with differences in vehicle response. The solution to this problem is more careful experimental design focused on limiting confounding factors.

Selecting a sample of sufficient size to resolve classes is also an important consideration (Churchill 2002; Simon et al. 2002; Mukherjee et al. 2003). Radich and colleagues recently illustrated one reason why this is so important: analyzing gene expression levels in peripheral blood, they demonstrated significant, but reproducible, interindividual variation in expression for a relatively large number of genes (Radich et al. 2004). Their study suggests that a small sample size may lead to biases in the gene, protein, or metabolite selection set because of random effects in assigning samples to classes.

Biologic Validation and Generalizability

Regardless of the goal of a particular experiment, its utility depends on whether its results can be biologically validated. Biologic validation is the process of confirming that a biologic change underlies whatever is detected with the technology platform and of assigning a biologic context or explanation to an observed characteristic of a system. The results of any analysis of toxicogenomic data are generally considered a hypothesis that must be validated by more well-established, lower-throughput "standard" laboratory methods. A first step is often to verify the expression of a select set of genes in the original samples by an independent technique. If the results are consistent with those from the independent technique, further detailed study is often warranted.
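A first quantitative check in this verification step is whether the direction of change for each selected gene agrees between the microarray and the independent assay. A minimal sketch with invented gene names and log2 fold changes:

```python
# Hypothetical log2 fold changes for five genes chosen for verification.
# All names and values are invented for illustration only.
microarray = {"gene_a": 2.1, "gene_b": -1.4, "gene_c": 0.9,
              "gene_d": 3.0, "gene_e": -0.2}
qrt_pcr    = {"gene_a": 2.6, "gene_b": -1.1, "gene_c": 1.2,
              "gene_d": 3.8, "gene_e": 0.3}

# A gene "verifies" if both assays call the change in the same direction
# (the product of the two log fold changes is positive).
concordant = [gene for gene in microarray
              if microarray[gene] * qrt_pcr[gene] > 0]
frac_concordant = len(concordant) / len(microarray)  # 4 of 5 genes agree
```

Direction-of-change agreement is only the coarsest criterion; genes near zero fold change (such as the hypothetical gene_e here) are the ones most likely to have been selected by chance, which is why they tend to fail verification.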
For example, upregulation of a gene transcript detected with a microarray may suggest activation of a specific signaling pathway; that activation can be confirmed by a change in the level of a corresponding protein and by measuring a change in another output regulated by the pathway. In a toxicogenomic experiment, in which thousands of genes, proteins, or metabolites are examined in a single assay, biologic validation is especially important because there is a significant likelihood that some changes in genes, proteins, or metabolites are associated with a particular outcome purely by chance.

For mechanistic studies, biologic validation also requires clearly demonstrating a causative role for any proposed mechanism. For class discovery studies in which new subgroups of compounds are identified, biologic validation typically requires demonstrating some tangible difference, however subtle, among the newly discovered subgroups of compounds. For example, new paths to neurotoxicity may be inferred through transcriptome profiling when neurotoxic compounds separate into groups based on the severity of the phenotype they cause, the time of onset of the phenotype, or the mechanism that produces
the phenotype. Finding such differences is important to establish the existence of any new classes.

Generalizability addresses whether an observation from a single study can be extended to a broader population or, in the case of studies with animal models, whether the results are similar across species. For mechanistic studies, generalization requires verifying that the mechanism exists in a broader population. In class discovery, if results are generalizable, newly discovered classes should also be found when the sample studies are extended to include a broader, more heterogeneous population of humans or other species than those used in the initial study. For classification studies, generalization requires a demonstration that, at the least, the set of classification genes and associated algorithms retain their predictive power in a larger, independent population and that, within certain specific populations, the classification approach retains its accuracy (for an example, see Box 9-3).

Validation in a Regulatory Setting

Toxicogenomic technologies will likely play key roles in the safety evaluation and risk assessment of new compounds. Toxicogenomic technologies intended for this application must be validated by regulatory agencies, such as the Environmental Protection Agency (EPA), the Food and Drug Administration (FDA), and the Occupational Safety and Health Administration (OSHA), before they can be used in the decision-making process. The procedures by which regulatory validation occurs have traditionally been informal and ad hoc, varying by agency, program, and purpose. This flexible validation process has been guided by general scientific principles and, when applied to a specific test, usually involves a review of the available experience with the test and an official or unofficial interlaboratory collaboration, or "round robin," to evaluate the performance and reproducibility of the test (Zeiger 2003).

Deciding whether to accept a particular type of data for regulatory purposes depends on more than such technical validation, however; it is also affected by a regulatory agency's statutory mandate, regulatory precedents and procedures, and the agency's priorities and resources. Because these factors are agency specific, validation is necessarily agency specific as well. The scientific development and regulatory use of toxicogenomic data will be facilitated by harmonization, to the extent possible, of data and method validation both among U.S. regulatory agencies and at the international level. However, harmonization should be a long-term goal and should not prevent individual agencies from exploring their own validation procedures and criteria in the shorter term.

Toxicogenomic data present unique regulatory validation challenges, both because such data have not previously been used in a regulatory setting and because of the rapid pace at which toxicogenomic technologies and data are developing. Therefore, regulatory agencies must balance the need to provide criteria

BOX 9-3 Clinical Validation of Transcriptome Profiling in Breast Cancer

Although toxicogenomic technology applications are still in their infancy, they are being explored in clinical medicine. A notable example that illustrates the path from genomic discovery to biologic and clinical validation is the Netherlands breast cancer study (van't Veer et al. 2002), which sought to distinguish between patients with the same stage of disease but different responses to treatment and overall outcomes. The investigators were motivated by the observation that the best clinical predictors of metastasis, including lymph node status and histologic grade, did not adequately predict clinical outcome; as a result, many patients receive chemotherapy or hormonal therapy regardless of whether they need the additional treatment. The goal of the analysis was to identify gene expression signatures that would help determine which patients might benefit from adjuvant chemotherapy. By profiling gene expression in tumors from 117 young patients who had received only surgical treatment and evaluating correlations with clinical outcome, the authors identified 70 genes that compose a "poor prognosis" signature predictive of a short interval to distant metastasis in lymph-node-negative patients.

Initial analysis demonstrated that microarray-based gene expression signatures could outperform any clinically based prediction of outcome in identifying the patients who would benefit most from adjuvant therapy. These initial results motivated a more extensive follow-up study (van de Vijver et al. 2002) involving 295 patients (the initial 117 along with 178 who had been heterogeneously treated) that confirmed the advantage of the 70-gene classification profile relative to standard clinically and histologically based criteria. These promising results were independently demonstrated to have utility in a study involving 307 patients from five European centers (Buyse et al. 2006). It has been recognized, however, that a large-scale clinical trial is necessary to fully validate the efficacy of this gene expression signature in predicting outcome. The large collaborative MINDACT trial (Microarray in Node-Negative Disease May Avoid Chemotherapy Trial), conducted by the Breast International Group and coordinated by the European Organisation for Research and Treatment of Cancer Breast Cancer Group, will recruit 6,000 women with node-negative early-stage breast cancer to investigate the benefit-to-risk ratio of chemotherapy when the risk assessment based on standard clinicopathologic factors differs from that provided by the gene signature (Bogaerts et al. 2006). By extending the analysis to a broader population, the researchers will also have the opportunity to determine the extent to which patient heterogeneity, treatment protocols, variations in sample collection and processing, and other factors influence the value of the gene expression signature as a diagnostic tool.

and standardization for the submission of toxicogenomic data with the need to avoid prematurely "locking in" transitory technologies that may soon be replaced by the next generation of products or methods. Regulatory agencies have been criticized for being too conservative in adopting new toxicologic

OCR for page 135
146 Applications of Toxicogenomic Technologies methods and data (NRC 1994). Consistent with this pattern, such regulatory agencies as EPA and FDA to date have been relatively conservative in using toxicogenomic data, partly due to the lack of validation and standardization (Schechtman 2005). Although some caution and prudence against premature reliance on un- validated methods and data is appropriate, agencies can play a critical role and must actively encourage the deployment of toxicogenomic data and methods if toxicogenomic approaches are to be used to their fullest advantage. For exam- ple, FDA has issued a white paper describing a “critical path to new medical products” (FDA 2005b) that acknowledges the role of toxicogenomic technolo- gies in providing a more sensitive assessment of new compounds and suggests that new methods be developed to improve the process of evaluation and ap- proval. EPA and FDA have adopted initial regulatory guidances that seek to en- courage toxicogenomic data2 submissions (see Table 9-1 and Chapter 11). In March 2005, FDA issued a guidance for industry on submission of pharmacoge- nomic data (FDA 2005a). In that guidance, FDA states that “[b]ecause the field of pharmacogenomics is rapidly evolving, in many circumstances, the experi- mental results may not be well enough established scientifically to be suitable for regulatory decision making. For example: Laboratory techniques and test procedures may not be well validated” (FDA 2005a, p. 2). The FDA Guidance describes reporting requirements for “known valid” and “probable valid” biomarkers. 
A known valid biomarker is defined as "[a] biomarker that is measured in an analytical test system with well-established performance characteristics and for which there is widespread agreement in the medical or scientific community about the physiologic, toxicologic, pharmacologic, or clinical significance of the results." A probable valid biomarker is defined as "[a] biomarker that is measured in an analytical test system with well-established performance characteristics and for which there is a scientific framework or body of evidence that appears to elucidate the physiologic, toxicologic, pharmacologic, or clinical significance of the test results" (FDA 2005a, p. 17). The Guidance provides that "validation of a biomarker is context-specific and the criteria for validation will vary with the intended use of the biomarker. The clinical utility (for example, ability to predict toxicity, effectiveness or dosing) and use of epidemiology/population data (for example, strength of genotype-phenotype associations) are examples of approaches that can be used to determine the specific context and the necessary criteria for validation" (FDA 2005a, p. 17).

2 "Pharmacogenomic" data and guidances are included in this table and discussion because the term pharmacogenomic is often used to include data about the toxicity and safety of pharmaceutical compounds (referred to in this report as toxicogenomics) and because regulatory use and validation of other types of genomic data are relevant to toxicogenomics.

TABLE 9-1 Worldwide Regulatory Policies and Guidelines Related to Toxicogenomics and Pharmacogenomics

United States: Food and Drug Administration (http://www.fda.gov)
• Guidance: Multiplex Tests for Heritable DNA Markers, Mutations and Expression Patterns; Draft Guidance for Industry and FDA Reviewers (April 21, 2003)
• Guidance: Guidance for Industry: Pharmacogenomic Data Submissions (March 2005)
• Guidance: Guidance for Industry and FDA Staff: Class II Special Controls Guidance Document: Drug Metabolizing Enzyme Genotyping System (March 10, 2005)
• Concept paper: Drug-Diagnostic Co-Development Concept Paper (Preliminary Draft) (April 2005)
• Guidance: Guidance for Industry and FDA Staff: Class II Special Controls Guidance Document: RNA Preanalytical Systems (August 25, 2005)

United States: Environmental Protection Agency (http://www.epa.gov)
• Guidance: Interim Genomics Policy (June 2002)
• White paper: Potential Implications of Genomics for Regulatory and Risk Assessment Applications at EPA (December 2004)

Europe: European Agency for the Evaluation of Medicinal Products (http://www.emea.eu.int)
• Position paper: CPMP Position Paper on Terminology in Pharmacogenetics (EMEA/CPMP/3070/01) (November 21, 2002)
• Guideline: CHMP Guideline on Pharmacogenetics Briefing Meetings (EMEA/CHMP/20227/2004) (March 17, 2005)
• Supplement: Understanding the Terminology Used in Pharmacogenetics (EMEA/3842/04) (July 29, 2004)
• Concept paper: CHMP Concept Paper on the Development of a Guideline on Biobanks Issues Relevant to Pharmacogenetics (Draft) (EMEA/CHMP/6806/2005) (March 17, 2005)

Japan: Ministry of Health, Labour, and Welfare (http://www.mhlw.go.jp/english)
• Guidance: Clinical Pharmacokinetic Studies of Pharmaceuticals (Evaluation License Division Notification No. 796) (June 1, 2001)
• Guidance: Methods of Drug Interaction Studies (Evaluation License Division Notification No. 813) (June 4, 2001)
• Notification: Submission to government agencies of data and related matters concerning the drafting of guidelines for the application of pharmacogenomics in clinical studies of drugs (March 18, 2005)

FDA also lists possible reasons why a probable valid biomarker may not have reached the status of a known valid marker, including the following: "(i) the data elucidating its significance may have been generated within a single company and may not be available for public scientific scrutiny; (ii) the data elucidating its significance, although highly suggestive, may not be conclusive; and (iii) independent verification of the results may not have occurred" (FDA 2005a, pp. 17-18). Although FDA outlines clear steps for sponsors to follow with regard to regulatory expectations for each type of biomarker, these classifications are not officially recognized outside FDA.

In addition to this FDA guidance, a number of other regulatory policies and guidelines have been issued worldwide that cover topics related to pharmacogenomics and toxicogenomics (see Table 9-1). Future efforts to harmonize the use of and expectations for genomic data will provide value by reducing the challenge pharmaceutical companies currently face in addressing guidances for different countries.
EPA issued an Interim Policy on Genomics in 2002 to allow consideration of genomic data in regulatory decision making but stated that these data alone would be "insufficient as a basis for decisions" (EPA 2002, p. 2). The Interim Policy states that EPA "will consider genomics information on a case-by-case basis" and that "[b]efore such information can be accepted and used, agency review will be needed to determine adequacy regarding the quality, representativeness, and reproducibility of the data" (EPA 2002, pp. 2-3). EPA is also in the process of standardizing data-reporting elements for new in vitro and in silico test methods, including microarrays, using the Minimum Information About a Microarray Experiment criteria as a starting point.

At the interagency level, Congress established a permanent Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) in 2000 to require that new and revised test methods be validated to meet the needs of federal agencies. NIEHS, EPA, FDA, and OSHA are 4 of the 15 federal regulatory and research agencies participating in ICCVAM. Europe has created a similar validation organization, the European Centre for the Validation of Alternative Methods (ECVAM). At the international level, the Organisation for Economic Co-operation and Development has also adopted formal guidelines to validate test methods for use in regulatory decision making (OECD 2001).

The ICCVAM criteria are useful guides for regulatory validation of toxicogenomic data and methods, but toxicogenomic technologies will require more flexible approaches to validation, given their rapid pace of change and other unique characteristics (Corvi et al. 2006). To that end, ICCVAM and ECVAM are developing a more flexible approach to validating toxicogenomic test methods for regulatory use, and they have convened a series of workshops on the topic (Corvi et al. 2006). One approach being investigated is a "modular" model, in which different steps in the validation process are undertaken independently, in contrast to the traditional stepwise "linear" model, which may unduly delay the validation of rapidly evolving toxicogenomic technologies (Corvi et al. 2006). Agencies such as EPA are carefully tracking and participating in this initiative and anticipate applying the output of the ICCVAM process in their own regulatory programs (EPA 2004).
CONCLUSIONS

Toxicogenomics has reached the stage where many of the initial technical questions have been resolved, at least for the more mature approaches such as gene expression analysis with microarrays. The community has learned that careful experiments using genomic approaches can provide results that are comparable among laboratories and that reveal insight into the biology of the system under study (Bammler et al. 2005; Irizarry et al. 2005; Larkin et al. 2005). However, the need for standards for assessing the quality of particular experiments remains, and it will affect the utility of the datasets that are and will be generated. The work of the ERCC (and other groups) to develop RNA standards (Cronin et al. 2004) is a potentially important component of this effort and should be encouraged and continued, but additional work and development are necessary if truly useful quality assessment standards are to be created. Standard development efforts should not be limited to gene expression microarray analysis, as similar standards will be necessary if other toxicogenomic technologies are to be widely used and trusted to give reliable results.

Beyond quality control of individual experiments, more extensive validation is needed to use toxicogenomic data for the applications discussed in this report. Most toxicogenomic projects have focused on limited numbers of samples with technologies, such as DNA microarrays, that may not be practical for large-scale applications that go beyond the laboratory. Consequently, validation of toxicogenomic signatures should focus not only on the primary toxicogenomic technology (such as DNA microarrays) but also on assays that can be widely deployed (such as qRT-PCR) at relatively low cost. Many issues associated with validation of toxicogenomic signatures will rely on the availability of large, accessible, high-quality datasets to evaluate the specificity and sensitivity of the assays. Those datasets must include not only the primary data from the assays but also the ancillary data about treatments and other factors necessary for analysis. This argues for the creation and population of a public repository for toxicogenomic data.

A means of regulatory validation of toxicogenomic applications is also needed, for example, for toxicogenomic data accompanying new submissions of drug candidates for approval. Specifically, the development of new standards and guidelines that provide clear, dynamic, and flexible criteria for the approval and use of toxicogenomic technologies is needed at this point. Development of these standards and guidelines requires suitable datasets. For example, using toxicogenomics to classify new compounds for their potential to produce a specific deleterious phenotype requires a useful body of high-quality, well-annotated data.
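Evaluating the specificity and sensitivity of a classification assay against a reference dataset, as discussed above, reduces to simple confusion-matrix arithmetic. The following Python sketch is purely illustrative: the function name, labels, and example data are invented for this illustration and are not drawn from any real toxicogenomic study.

```python
# Illustrative sketch only: computing the sensitivity and specificity of a
# binary toxicity classifier from hypothetical labels. All names and data
# here are invented for illustration, not taken from any real dataset.

def sensitivity_specificity(true_labels, predicted_labels, positive="toxic"):
    """Return (sensitivity, specificity) for a binary classification."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # true positives detected / all positives
    specificity = tn / (tn + fp)  # true negatives cleared / all negatives
    return sensitivity, specificity

# Hypothetical evaluation: 10 compounds, known class vs. signature-based call.
true = ["toxic"] * 5 + ["nontoxic"] * 5
pred = ["toxic", "toxic", "toxic", "toxic", "nontoxic",
        "nontoxic", "nontoxic", "nontoxic", "toxic", "nontoxic"]
sens, spec = sensitivity_specificity(true, pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Large, well-annotated reference datasets of the kind called for here are what would allow such metrics to be estimated reliably, with confidence intervals, for each assay platform.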
The existing ICCVAM approaches do not provide guidance for the large-scale toxicogenomic approaches being developed, as evidenced by the different guidelines that FDA and EPA are developing, and toxicogenomic tools need not be subject to ICCVAM protocols before they are considered replacement technologies. Although multiagency initiatives, such as the one ICCVAM is spearheading, may serve as a basis for establishing the needed standards and criteria in the longer term, the overall ICCVAM approach does not seem well suited to validating new technologies and will need to be significantly revised to accommodate the tools of toxicogenomics. Consequently, regulatory agencies such as EPA and FDA should move forward expeditiously in continuing to develop and expand their validation criteria to encourage submission and use of toxicogenomic data in regulatory contexts.

In summary, the following are needed to move forward in validation:

• Objective standards for assessing quality and implementing quality control measures for the various toxicogenomic technologies;
• Guidelines for extending technologies from the laboratory to broader applications, including guidance for implementing related but more easily deployable technologies such as qRT-PCR and ELISAs;
• A clear and unified approach to regulatory validation of "-omic" technologies that aligns the potentially diverse standards being developed by various federal agencies, including EPA, FDA, NIEHS, and OSHA, as well as attempting to coordinate standards with the relevant European and Asian regulatory agencies;
• A well-annotated, freely accessible database providing access to high-quality "-omic" data.

RECOMMENDATIONS

The following specific actions are recommended to facilitate technical validation of toxicogenomic technologies:

1. Develop objective standards for assessing sample and data quality from different technology platforms, including standardized materials such as those developed by the ERCC.
2. Develop appropriate criteria for using toxicogenomic technologies for different applications, such as hazard screening and exposure assessment.
3. Regulatory agencies should establish clear, transparent, and flexible criteria for the regulatory validation of toxicogenomic technologies. Although the use of toxicogenomic data will be facilitated by harmonization of data and method validation criteria among U.S. regulatory agencies and at the international level, harmonization should be a long-term goal and should not prevent individual agencies from developing their own validation procedures and criteria in the shorter term.