README: for NCI 60 data
=======

We would like to thank Dr. Jane Fridlyand (from UCSF) for providing us with the processed NCI 60 data.
Please refer to Dudoit et al. 2002 (JASA 97: 77-87) for details of the pre-processing.
The paper is also available as a technical report at:
http://www.stat.berkeley.edu/~sandrine/tecrep/576.pdf

*The following files are in the correct format for our Java code.
We randomly partitioned the original data (61 samples) into a training set of 43 samples and a test set of 18 samples.

Training set: train_NCI60_5244_43_1.txt
- consists of 5244 genes and 43 experiments, spanning 8 different types of cancer cell lines.
- The rows represent the genes.
- Each column represents an experiment. The type of cell line is indicated in the header row.
- Unfortunately, the gene names are not available from the pre-processed data, so we made up unique identifiers for the names (of the form Gene_XX).
- Note that repeated measurements are, in general, not available for this data, so each entry represents a single log expression ratio.

Test set: test_NCI60_5244_18_1.txt
- consists of 5244 genes and 18 experiments, spanning 8 different types of cancer cell lines.
- in a similar format to the training set.

Classes: NCI60_8class.txt
- 8 classes
- This file specifies the name of each of the 8 classes, and is required by our ewusc software.

Labels for the experiments in the training set: NCI60_label43.txt
- 43 experiments in total
- This file specifies the label of each experiment, and is required by our ewusc software.

Labels for the experiments in the test set: NCI60_label18.txt
- 18 experiments in total
- This file specifies the label of each experiment, and is required by our ewusc software.
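The expression files above are gene-by-experiment matrices with a header row of cell line types. The following is a minimal sketch of a reader for that layout. It is NOT the authors' Java code or part of the ewusc software, and the class name `ExpressionMatrix` is hypothetical. It assumes (this README does not say) that fields are tab-separated, that the first header cell is a placeholder above the gene-name column, and that each data row is a gene identifier followed by one log ratio per experiment. The label files are not parsed because their format is not described here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical reader for a genes-by-experiments expression matrix.
 * ASSUMPTIONS: tab-separated fields; header = placeholder cell followed by
 * one cell line type per experiment; each data row = gene id + log ratios.
 */
class ExpressionMatrix {
    final List<String> cellLineTypes; // one entry per column (experiment)
    final List<String> geneNames;     // one entry per row (gene)
    final double[][] logRatios;       // indexed as [gene][experiment]

    ExpressionMatrix(List<String> types, List<String> genes, double[][] values) {
        this.cellLineTypes = types;
        this.geneNames = genes;
        this.logRatios = values;
    }

    static ExpressionMatrix read(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String header = in.readLine();
        if (header == null) {
            throw new IOException("empty file");
        }
        String[] head = header.split("\t");
        // Skip the first header cell; the remaining cells label each experiment.
        List<String> types = new ArrayList<>(Arrays.asList(head).subList(1, head.length));

        List<String> genes = new ArrayList<>();
        List<double[]> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isBlank()) {
                continue; // tolerate trailing empty lines
            }
            String[] cells = line.split("\t");
            // Every data row must carry exactly one value per experiment.
            if (cells.length != types.size() + 1) {
                throw new IOException("data row " + genes.size() + " has "
                        + (cells.length - 1) + " values, expected " + types.size());
            }
            genes.add(cells[0]);
            double[] values = new double[types.size()];
            for (int j = 0; j < values.length; j++) {
                values[j] = Double.parseDouble(cells[j + 1].trim());
            }
            rows.add(values);
        }
        return new ExpressionMatrix(types, genes, rows.toArray(new double[0][]));
    }
}
```

For example, if the assumptions hold, reading train_NCI60_5244_43_1.txt through a `java.io.FileReader` should give 5244 gene names and 43 cell line types. The per-row length check is there so that a wrong delimiter guess fails loudly instead of producing a silently misaligned matrix.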