README: for multiple-tumor data
=======

We would like to thank Dr. Sridhar Ramaswamy (from the Whitehead Institute) for providing us with the raw CEL files for the multiple tumor data. We processed the CEL files with the RMA package in the BioConductor software to extract the RMA measures and their associated standard errors. Since only a subset of the original CEL files is available, we used a subset of the original data in our experiments.

*The following files are in the correct format for our Java code.

Training set: combined_multiple_tumor_7129_96.txt
- Consists of 7129 probe sets from Affy Hu 6800 chips and 96 experiments, spanning 11 different tumor types.
- The rows represent the probe sets.
- Each consecutive pair of columns represents one experiment and its associated standard errors obtained from the RMA package.

Test set: test_multiple_tumor_7129_27.txt
- Consists of the same 7129 probe sets from Affy Hu 6800 chips and 27 experiments.
- The rows represent the probe sets.
- Each consecutive pair of columns represents one experiment and its associated standard errors obtained from the RMA package.

Class file: multiple_11class.txt
- 11 classes
- This file specifies the name of each of the 11 classes, and is required by our ewusc software.

Label for each experiment in the training set: multiple_label96.txt
- Total of 96 experiments
- This file specifies the label for each experiment, and is required by our ewusc software.

Label for each experiment in the test set: multiple_label27.txt
- Total of 27 experiments
- This file specifies the label for each experiment, and is required by our ewusc software.

Gene Lists:
- The gene lists produced by EWUSC and USC at Delta = 5.6 and rho = 0.8 are provided.
- The first column is the unique gene number (i.e. the order of the gene in the input file).
- The second column is the name of the probe set.
- The 3rd column can be ignored.
- The 4th - 14th columns: shrunken centroids computed for each of the 11 classes (order specified in the class file "multiple_11class.txt") at Delta = 5.6. A high abs(shrunken centroid) value in class k means that the class centroid in class k is "significantly" different from the overall centroid.
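
The two column layouts described above (paired columns in the data files, and columns 4 to 14 of the gene lists) could be read along the following lines. This is only a minimal Python sketch and not part of our Java code. It assumes that fields are tab-delimited, that any row-label columns have already been removed from a data row, and that within each pair the RMA measure comes before its standard error. The function names are hypothetical, and the class names must be supplied in the order given in "multiple_11class.txt".

```python
from typing import List, Sequence, Tuple


def split_interleaved(fields: Sequence[str]) -> Tuple[List[float], List[float]]:
    """Split one data row of alternating (RMA measure, standard error) columns.

    Assumption: `fields` holds only the numeric columns of a row, with the
    measure first and its standard error second in every pair.
    """
    if len(fields) % 2 != 0:
        raise ValueError("expected an even number of columns (measure/SE pairs)")
    numbers = [float(x) for x in fields]
    # Even positions are the measures, odd positions the standard errors.
    return numbers[0::2], numbers[1::2]


def strongest_class(gene_list_line: str,
                    class_names: Sequence[str]) -> Tuple[str, str, float]:
    """Find the class with the largest abs(shrunken centroid) for one gene.

    Assumption: the gene-list line is tab-delimited. Column 2 is the probe
    set name, column 3 is ignored, and the centroids start at column 4.
    """
    cols = gene_list_line.rstrip("\n").split("\t")
    probe_set = cols[1]
    centroids = [float(x) for x in cols[3:3 + len(class_names)]]
    if len(centroids) != len(class_names):
        raise ValueError("fewer centroid columns than class names")
    best = max(range(len(centroids)), key=lambda k: abs(centroids[k]))
    return probe_set, class_names[best], centroids[best]
```

For a training-set row, split_interleaved would return two lists of 96 numbers each. For a gene-list line, strongest_class reports the class whose centroid differs most from the overall centroid for that probe set.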