Furthermore, in my opinion, previous studies had at least one of these problems: (1) Precision and accuracy were not properly assessed; that is, the sensitivity/specificity trade-off was not considered and, in general, assessments were based on the validation of only a few genes. (2) There was no a priori expectation of truth. In general, RT-PCR was considered the gold standard, and measurements from that technology were considered the “truth.” (3) The effect of preprocessing was not explored. (4) As mentioned, the lab effect was not explored.
Together with various labs from the District of Columbia/Baltimore area that volunteered their time and materials, I conducted a study comparing microarray technologies. To overcome the problems of previous studies, we followed a methodology that can be summarized by the following steps: (1) We included only platforms for which results from at least two labs were available. (2) To avoid a transportation effect, we considered only labs in the DC/Baltimore area. Of those we asked, five Affymetrix labs, three two-color cDNA labs, and two two-color oligo labs agreed. (3) We sent each lab technical replicates of two RNA samples. (4) Because the samples sent to each lab included technical replicates of each of the two samples, we were able to assess precision. (5) We designed the two RNA samples so that the differential expression of four genes was known a priori. This permitted us to assess accuracy. (6) Finally, to give the assessment of accuracy more power, we measured fold changes for 16 strategically chosen genes. Details are available in the Nature Methods publication (Irizarry et al. 2005).
In our study we evaluated what we consider to be the basic measurement obtained from microarrays: relative expression in the form of log ratios. Thus, for each lab we had two replicate measures of relative expression: M1 = log(B1/A1) and M2 = log(B2/A2), with A1, A2, B1, B2 representing the two pairs of technical replicates provided to each lab.
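To make the definition of M1 and M2 concrete, here is a minimal sketch of the computation for one lab. The intensity values are made up for illustration, the base-2 logarithm is an assumption (the text says only "log"), and the helper name `log_ratios` is hypothetical. The standard deviation of M1 - M2 across genes is shown as one possible precision summary; it is not necessarily the summary used in the study.

```python
import math
import statistics


def log_ratios(a, b):
    """Return log2(b / a) for each pair of intensities (base 2 is an assumption)."""
    return [math.log2(y / x) for x, y in zip(a, b)]


# Hypothetical intensities for three genes in one lab.
# A1, A2 are technical replicates of sample A; B1, B2 of sample B.
A1 = [100.0, 200.0, 400.0]
B1 = [200.0, 200.0, 100.0]
A2 = [100.0, 400.0, 200.0]
B2 = [400.0, 400.0, 100.0]

M1 = log_ratios(A1, B1)  # first replicate measure of relative expression
M2 = log_ratios(A2, B2)  # second replicate measure of relative expression

# With perfect precision M1 and M2 would agree gene by gene, so the spread
# of their differences is one simple (illustrative) measure of imprecision.
differences = [m1 - m2 for m1, m2 in zip(M1, M2)]
precision_sd = statistics.stdev(differences)

print("M1:", M1)
print("M2:", M2)
print("SD of M1 - M2:", round(precision_sd, 3))
```

A smaller standard deviation indicates better agreement between the replicate log ratios within a lab.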
The first important recommendation is that when comparing and/or combining measurements from different platforms one should look at relative as opposed to absolute measures of expression. This is because