tistical hypotheses, and statistical analysis methods will be discussed in order. Visualizations of the data and of the analysis results will also be included. Conventional hypothesis testing, applied probe set by probe set to such a massive amount of data, leads to severe multiplicity problems: testing, say, 10,000 probe sets at the 0.05 level would be expected to produce about 500 false positives even if no gene were truly affected. Proper multiplicity adjustments of P values will therefore be discussed, to help distill this mass of results into useful information for each compound.
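The specific adjustments used in this presentation are described later. As a hedged illustration only, the sketch below implements two standard P value adjustments: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate. The function names `bonferroni` and `benjamini_hochberg` are our own and do not refer to any particular package's API.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg (FDR) adjusted P values, returned in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, taking a running minimum of
    # p * m / rank so the adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical P values for four probe sets
    p = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(p))
    print("BH (FDR):  ", benjamini_hochberg(p))
```

For these four hypothetical P values, Bonferroni gives roughly 0.04, 0.16, 0.12 and 0.02, whereas Benjamini-Hochberg gives roughly 0.02, 0.04, 0.04 and 0.02, showing how much less conservative FDR control is. In practice, a vetted implementation such as R's `p.adjust` or statsmodels' `multipletests` would normally be preferred over hand-written code.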
On an Affymetrix GeneChip, each gene (probe set) is represented by 11-20 pairs of short, 25-base oligonucleotide probes, called perfect match (PM) and mismatch (MM) oligos. PM oligos match the gene sequence exactly, so after hybridization with labeled sample RNA they reflect the expression signal. MMs have the same sequences as PMs except that the middle (13th) base is changed to its complementary nucleotide; they are designed to capture non-specific hybridization, or background, signals. There are dozens of algorithms for extracting a robust signal intensity from the 11-20 PM/MM pairs of each probe set (Cope et al. 2004). The three most commonly used are MAS 5 from Affymetrix (Affymetrix 2002), the robust multi-array average (RMA) by Irizarry et al. (2003), and the model-based expression index (MBEI) by Li and Wong (2001). There is still no consensus on which method is best, and the final choice often comes down to the researcher's preference. In this presentation, signals extracted using MAS 5 are statistically analyzed.
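To make the PM/MM summarization step concrete, here is a simplified sketch in the spirit of the MAS 5 signal algorithm. It is not Affymetrix's implementation. It follows our reading of the published Statistical Algorithms Description: a one-step Tukey biweight average, an "idealized mismatch" that replaces MMs that exceed their PMs, and an antilog of the robust average of log2(PM - IM). The constants (`BIWEIGHT_C`, `CONTRAST_TAU`, `SCALE_TAU`, `DELTA`) are assumptions to be checked against that document, and the zone-based background correction and global scaling steps of MAS 5 are omitted.

```python
import math
from statistics import median
from typing import Sequence

# Assumed constants (taken to match the Affymetrix algorithm description)
BIWEIGHT_C = 5.0       # tuning constant of the biweight
BIWEIGHT_EPS = 1e-4    # guards against division by a zero MAD
CONTRAST_TAU = 0.03    # contrast threshold for the specific background
SCALE_TAU = 10.0       # scale used when the specific background is small
DELTA = 2.0 ** -20     # floor for PM - IM before taking logs


def tukey_biweight(values: Sequence[float]) -> float:
    """One-step Tukey biweight estimate of location (robust to outliers)."""
    vals = list(values)
    m = median(vals)
    mad = median([abs(v - m) for v in vals])
    u = [(v - m) / (BIWEIGHT_C * mad + BIWEIGHT_EPS) for v in vals]
    # Points far from the median (|u| > 1) get zero weight
    w = [(1.0 - x * x) ** 2 if abs(x) <= 1.0 else 0.0 for x in u]
    return sum(wi * vi for wi, vi in zip(w, vals)) / sum(w)


def mas5_like_signal(pm: Sequence[float], mm: Sequence[float]) -> float:
    """Simplified MAS 5-style signal for one probe set (no scaling step)."""
    # Specific background: robust average of log2(PM / MM) over probe pairs
    sb = tukey_biweight([math.log2(p) - math.log2(q) for p, q in zip(pm, mm)])
    log_values = []
    for p, q in zip(pm, mm):
        if q < p:
            im = q  # MM is usable as the background estimate
        elif sb > CONTRAST_TAU:
            im = p / 2.0 ** sb  # MM too large: derive IM from the probe set's SB
        else:
            im = p / 2.0 ** (CONTRAST_TAU / (1.0 + (CONTRAST_TAU - sb) / SCALE_TAU))
        log_values.append(math.log2(max(p - im, DELTA)))
    # Signal is the antilog of the robust average of log2(PM - IM)
    return 2.0 ** tukey_biweight(log_values)


if __name__ == "__main__":
    # Hypothetical intensities for a probe set with three probe pairs
    print(mas5_like_signal([200.0, 300.0, 400.0], [100.0, 200.0, 300.0]))
```

The biweight is used twice because it downweights aberrant probe pairs (for example, a saturated or scratched probe) instead of letting them dominate the average, which is the kind of robustness that all of the summarization methods cited above aim for in different ways.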
Every experiment is designed to answer certain scientific questions. Before an experiment is conducted, it is important that the researchers define the scientific questions and that statisticians translate them into statistical hypotheses and determine the appropriate statistical analyses. It is equally important to ensure that, by the end of the experiment, the right kinds and amounts of data have been collected to carry out those analyses and answer the scientific questions. Given the high price of microarray chips and the substantial time and other resources these experiments require, it would be unwise to run a flawed design that cannot answer the scientific questions. In most microarray experiments, the primary goal is to iden-