tify differentially expressed genes under different treatment conditions. We will examine important sources of variation and discuss some exploratory and inferential analyses. We will provide some examples to show how different scientific questions lead to different experimental designs and statistical hypotheses.
A microarray measures the expression abundance of essentially all the genes in a genome. With thousands of measurements and relatively few subjects (tens, rarely over a hundred), any difference in the conditions of the tested subjects can cause a large number of genes to show expression changes. Therefore, it is very important to understand and control the different sources of variability in microarray experiments. Typically there are two sources of random variability: biologic variation and technical variation. Biologic variation exists among the tested subjects. Sometimes it is possible to reduce it, e.g., by using more homogeneous individuals. Technical variation arises from sample preparation and from the microarray technology itself, including tissue collection, RNA isolation, labeling, chip hybridization, etc. As the microarray technology has matured, chip-to-chip variation has decreased. Still, large variation is observed during sample preparation. For example, different labeling kits can lead to large differences in the expression signals. Even with the same labeling kit, samples processed on different days may yield very different signals.
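To make the distinction between the two sources of variability concrete, the following minimal sketch computes the variance of a treatment-group mean for a single gene. It assumes a simple nested model that is not taken from the text: each subject contributes an independent biologic deviation with variance `sigma2_bio`, and each chip adds an independent technical error with variance `sigma2_tech`. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
def variance_of_group_mean(sigma2_bio, sigma2_tech, n_subjects, n_tech_reps=1):
    """Variance of a group mean for one gene under an assumed nested model.

    Assumed model (illustrative only): y_ij = mu + b_i + e_ij, where
    b_i has variance sigma2_bio (subject i) and e_ij has variance
    sigma2_tech (technical replicate j of subject i), all independent.
    """
    if n_subjects < 1 or n_tech_reps < 1:
        raise ValueError("n_subjects and n_tech_reps must be at least 1")
    # Biologic variation is averaged over subjects only; technical
    # variation is averaged over every chip that was run.
    biologic_part = sigma2_bio / n_subjects
    technical_part = sigma2_tech / (n_subjects * n_tech_reps)
    return biologic_part + technical_part


# Hypothetical variance components on a log-expression scale.
baseline = variance_of_group_mean(0.04, 0.01, n_subjects=5)
more_chips = variance_of_group_mean(0.04, 0.01, n_subjects=5, n_tech_reps=2)
more_subjects = variance_of_group_mean(0.04, 0.01, n_subjects=10)
print(baseline, more_chips, more_subjects)
```

Under these assumed values, both alternatives use ten chips, yet doubling the number of subjects halves the variance of the mean, whereas running each subject twice removes only part of the technical term. When biologic variation dominates, extra chips are generally better spent on additional subjects than on technical replicates.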
A related issue is the pooling of RNA samples from animals. During the early discovery phase, when resources are more limited, pooling the samples from animals in the same group may be necessary. An example is establishing a surrogate assay to screen peroxisome proliferator-activated receptors (PPAR). While pooling allows researchers to use fewer chips, it sacrifices the ability to measure expression in individual animals and to estimate the biologic variation needed for proper statistical tests. Pooling may be advantageous when samples are very cheap and easy to obtain (as in cell cultures) and partial pooling is used, in which samples from the same treatment are combined into multiple independent pools, so that the biologic variation can still be properly estimated. In general, for follow-up evaluation of observed toxicity, RNA samples from animals are not pooled. For example, upon observing heart weight changes or skeletal muscle necrosis in rats, microarray technology is applied to help find biomarkers, and one genechip is used for each animal.
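The partial pooling design described above can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the function names, the animal labels, and the use of a fixed random seed are assumptions made for the example. The second function counts the error degrees of freedom available for a two-treatment comparison in which each pool is hybridized to one chip, which shows why a single pool per treatment leaves nothing with which to estimate variation.

```python
import random


def assign_to_pools(animal_ids, n_pools, seed=0):
    """Randomly split the animals of one treatment group into n_pools pools.

    Hypothetical helper: each animal contributes RNA to exactly one pool,
    so the pools are independent of one another.
    """
    if n_pools < 1 or n_pools > len(animal_ids):
        raise ValueError("n_pools must be between 1 and the number of animals")
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled animals out in turn so pool sizes differ by at most one.
    return [shuffled[i::n_pools] for i in range(n_pools)]


def error_df_two_groups(pools_per_group):
    """Error degrees of freedom for comparing two treatments, one chip per pool."""
    return 2 * (pools_per_group - 1)


rats = [f"rat{i}" for i in range(1, 13)]  # 12 hypothetical animals
pools = assign_to_pools(rats, n_pools=3, seed=1)
for number, pool in enumerate(pools, start=1):
    print(f"pool {number}: {pool}")
print("error df with 1 pool per treatment:", error_df_two_groups(1))
print("error df with 3 pools per treatment:", error_df_two_groups(3))
```

With one pool per treatment the error degrees of freedom are zero, so no variance estimate is possible; with three pools per treatment there are four. Note that the variation estimated between pools reflects pooled animals plus technical error, not variation between individual animals.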
Another important issue is avoiding potential bias through randomization and/or proper blocking. For example, the processing batch effect may be significant. Two samples can appear very different if processed on two