We consider robust estimation of gene intensities from cDNA microarray data with replicates.
Several statistical methods for estimating gene intensities from microarrays have been
proposed, but there has been little work on robust estimation of the intensities. This is
particularly relevant for experiments with replicates, because even one outlying replicate
can have a disastrous effect on the estimated intensity for the gene concerned. Because of
the many steps involved in the experimental process from hybridization to image analysis,
cDNA microarray data often contain outliers. For example, an outlying data value could
occur because of scratches or dust on the surface, imperfections in the glass, or
imperfections in the array production. We develop a Bayesian hierarchical model for
robust estimation of cDNA microarray intensities. Outliers are modeled explicitly
using a t-distribution, and our model also addresses classical issues such as design
effects, normalization, transformation, and nonconstant variance. Parameter estimation
is carried out using Markov chain Monte Carlo (MCMC).
The method is illustrated using two publicly available gene expression data sets. The
between-replicate variability of the intensity estimates is reduced by 64% in one case and by
83% in the other, compared with raw log ratios. The method is also compared to the
ANOVA-normalized log ratio, the removal of outliers based on Dixon's test, and the
lowess-normalized log ratio; the between-replicate variation is reduced by more than
55% relative to the best of these methods for both data sets.
We also address the issue of whether the image background should be removed when
estimating intensities. It has been argued that one should not do so because it increases
variability, while the arguments for doing so are that there is a physical basis for the image
background, and that not doing so will bias the estimated log ratios of differentially
expressed genes downwards. We show that the arguments on both sides of this debate are
correct for our data, but that by using our model one can have the best of both worlds: one
can subtract the background without greatly increasing variability.
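
To illustrate why a single outlying replicate matters and how a t-distributed error model limits its influence, the following is a minimal sketch only. It is not the Bayesian hierarchical model or the MCMC procedure described above. It assumes a fixed number of degrees of freedom, a fixed robust scale, and made-up replicate values; the function name `t_location` and all parameters are hypothetical.

```python
import statistics


def t_location(values, df=4.0, n_iter=50):
    """Location estimate under a t error model with fixed df (illustrative).

    Uses the standard EM weights for a t-distribution,
    w_i = (df + 1) / (df + z_i**2), so replicates far from the current
    estimate receive small weights. The scale is held fixed at a
    MAD-based value; this is a simplifying assumption.
    """
    mu = statistics.median(values)
    # Median absolute deviation, rescaled to be consistent for normal data.
    mad = statistics.median([abs(v - mu) for v in values])
    scale = 1.4826 * mad if mad > 0 else 1.0
    for _ in range(n_iter):
        weights = [(df + 1.0) / (df + ((v - mu) / scale) ** 2) for v in values]
        mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mu


# Hypothetical log ratios for one gene: four consistent replicates, one outlier.
replicates = [1.02, 0.95, 1.10, 0.98, 4.50]
mean_estimate = statistics.mean(replicates)
robust_estimate = t_location(replicates)
print("mean:", mean_estimate)
print("t-based estimate:", robust_estimate)
```

In this toy example the plain mean is pulled well away from the four consistent replicates, while the t-based estimate stays within their range because the outlier is strongly downweighted.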
