Bioinformatics
The focus of the Bumgarner group’s bioinformatics efforts is to create tools and databases to better analyze and represent high-throughput functional genomics data, in particular microarray data. The common themes of our work have been:
 1) To evaluate tools and methods to select the "best-of-breed" approaches
 2) To bring statistical considerations into higher order analyses (e.g. clustering and classification)
 3) To provide "rules-of-thumb" insights to researchers that help them better interpret their results or plan their experiments.
Analysis Tools
Our primary activities in the creation of data analysis tools during the past few years have involved improving clustering and classification algorithms. In particular, we have focused on creating tools that take advantage of replicate measurements or error estimates.

Cluster analysis is a common approach to the discovery of patterns in complex data sets. As applied to microarray data, it is typically used in an exploratory mode to identify genes that share common expression patterns over multiple experiments, experiments (or samples) that share common expression patterns over multiple genes, or both. For a brief tutorial on cluster analysis see Dr. KaYee Yeung's presentation.

Regardless of the method used for cluster analysis, it is important to consider whether objects (genes or experiments) that cluster together do so by chance; that is, would the same cluster be created again with a replicate data set or a subset of the current data set? Prior to our work, a great deal of research focused on bootstrap or jackknife approaches to evaluating the robustness of a given clustering result. That is, a given number of experiments (or genes) are left out, the cluster analysis is repeated, and one checks whether the relationships between genes (or experiments) are maintained. If one repeats this a number of times, the frequency with which a given relationship is maintained indicates the confidence one should have in that relationship.

In most cases, even when replicate measurements were available, only the average values were provided to the cluster analysis. When average values are fed to a clustering analysis, valuable information, namely the variability of each measurement, is lost.
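The leave-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in any of the cited work: the function names, the choice of single-linkage clustering with Euclidean distance, and the toy expression values are all assumptions made for the example. It leaves out one experiment at a time, re-clusters the genes, and reports how often each pair of genes ends up in the same cluster.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(profiles, k):
    """Merge the two closest clusters until k remain.

    Returns a list of clusters, each a list of gene (row) indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(euclidean(profiles[i], profiles[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def jackknife_coclustering(profiles, k):
    """Leave out each experiment in turn and re-cluster the genes.

    Returns, for every gene pair, the fraction of leave-one-out runs
    in which the two genes fall in the same cluster.
    """
    n_genes, n_exps = len(profiles), len(profiles[0])
    counts = {pair: 0 for pair in combinations(range(n_genes), 2)}
    for left_out in range(n_exps):
        reduced = [[v for e, v in enumerate(row) if e != left_out]
                   for row in profiles]
        for cluster in single_linkage(reduced, k):
            for pair in combinations(sorted(cluster), 2):
                counts[pair] += 1
    return {pair: c / n_exps for pair, c in counts.items()}


# Toy data (invented for illustration): rows are genes, columns are experiments.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [3.9, 3.2, 2.1, 0.8],
]
frequencies = jackknife_coclustering(toy, k=2)
for pair, freq in sorted(frequencies.items()):
    print(pair, freq)
```

A pair with a frequency near 1 clusters together no matter which experiment is dropped, while a low frequency suggests the relationship depends on a single experiment. Note that this sketch still clusters on one value per gene per experiment, so it shares the limitation described above when those values are averages of replicates.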

Our work on cluster analysis takes advantage of repeated measurements either by using the variance as a weighting factor in the distance measure or, better yet, by specifically modeling the repeated measures in a Bayesian, model-based clustering approach. In addition, we tested a number of different clustering algorithms and distance measures on both real and synthetic data to carefully evaluate which methods are better able to recover known patterns in the data. We have also investigated how often genes that are co-expressed across a number of experiments are likely to be co-regulated by a common transcription factor. Finally, along similar lines, we have also developed classification tools that take advantage of repeated measures.
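To make the idea of variance weighting concrete, here is a small Python sketch of one possible variance-weighted distance. The specific formula (squared difference of replicate means divided by the sum of the replicate variances), the function names, the `floor` parameter, and the toy values are assumptions chosen for illustration; they are not taken from the papers listed below.

```python
import math
from statistics import mean, variance


def summarize(replicates):
    """Reduce each experiment's repeated measurements to (mean, variance).

    `replicates` is a list over experiments; each entry is a list of at
    least two repeated measurements (statistics.variance requires two).
    """
    return [(mean(r), variance(r)) for r in replicates]


def variance_weighted_distance(gene_a, gene_b, floor=1e-6):
    """Distance in which noisy experiments contribute less.

    Each squared difference of means is divided by the summed replicate
    variances; `floor` (a hypothetical safeguard) avoids division by zero.
    """
    total = 0.0
    for (ma, va), (mb, vb) in zip(summarize(gene_a), summarize(gene_b)):
        total += (ma - mb) ** 2 / (va + vb + floor)
    return math.sqrt(total)


# Toy data (invented): two experiments, three replicates each.
# Both pairs differ by 1.0 in every replicate mean, but the "noisy" pair
# has much larger replicate variance.
quiet_a = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]]
quiet_b = [[2.0, 2.1, 1.9], [3.0, 3.1, 2.9]]
noisy_a = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
noisy_b = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]

print(variance_weighted_distance(quiet_a, quiet_b))
print(variance_weighted_distance(noisy_a, noisy_b))
```

With plain averaging both pairs would look equally far apart. Under this weighting the tightly replicated pair is judged more clearly distinct than the noisy pair, which is the kind of information that is lost when only averages are clustered.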

Title of Paper | Citation | Source Code
Clustering Gene Expression Data with Repeated Measurements | Genome Biology 2003; 4(5):R34 | Website
Multiclass classification of microarray data with repeated measurements: application to cancer | Genome Biology 2003; 4:R83 | Website
Bayesian mixture model based clustering of replicated microarray data | Bioinformatics 2004 May 22; 20(8):1222-32 | Website
From co-expression to co-regulation: how many microarray experiments do we need? | Genome Biology 2004; 5(7):R48 | Website

In addition to the above stand-alone code, we have also adopted MeV as a vehicle for distributing our methods. At present, the classification program we have developed is being added to MeV to provide a good GUI for our code. We have also modified MeV to allow it to connect to our in-house and public databases.
Databases
Our primary activities in database creation have been the initial design and creation of a local database for making expression data public and the creation of the Host-Virus Expression Compendium database (HVEC). HVEC is still in the early stages of data collection, and we are currently focused on populating it with more host-virus expression data. HVEC was the model around which we developed MeV-to-database connectivity. Our current projects involve making this connectivity more generic so that the tool can be used to analyze data from other databases, including ArrayExpress and local databases in individual researchers' labs. We are also actively engaged in validating the statistical methods code in both public and commercial software (you'd be surprised at the errors we find) and in creating and validating tools to automatically upload MAGE-ML from our databases to ArrayExpress.