Gene Expression Profiles of Breast Cancer

Published: 21-12-2017 | Version 1 | DOI: 10.17632/v3cc2p38hb.1
Haozhe Xie,
Jie Li,
Tim Jatkoe,
Christos Hatzis


The published dataset consists of four separate datasets:
- BC-TCGA consists of 17,814 genes and 590 samples (61 normal tissue samples and 529 breast cancer tissue samples).
- GSE2034 includes 12,634 genes and 286 breast cancer samples (107 recurrence tumor samples and 179 no-recurrence samples).
- GSE25066 includes 12,634 genes and 492 breast cancer samples (100 pathologic complete response (PCR) samples and 392 residual disease (RD) samples).
- Simulation Data (SData) includes 100 positive samples and 100 negative samples with 10,000 features. Each feature in SData follows a normal distribution: N(0, 0.1) for positive samples and N(0 ± r, 0.1) for negative samples, where r ∈ [−0.125, 0.125].

All four datasets are used in the experiments in the paper "Comparison among dimensionality reduction techniques based on Random Projection for cancer classification" (Xie et al., 2016).
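To make the Simulation Data description concrete, the following is a minimal sketch of how data with that shape could be generated. It is not the authors' generation script. The function name `simulate_sdata` and all parameter names are hypothetical, and the sketch rests on three assumptions that the description does not settle: that 0.1 is the standard deviation rather than the variance, that one offset r is drawn per feature, and that r is drawn uniformly from [−0.125, 0.125]. It uses only the Python standard library.

```python
import random


def simulate_sdata(n_pos=100, n_neg=100, n_features=10_000,
                   spread=0.1, r_max=0.125, seed=0):
    """Generate a two-class dataset shaped like the described Simulation Data.

    Returns (samples, labels, offsets), where samples is a list of rows,
    labels is 1 for positive and 0 for negative samples, and offsets holds
    the per-feature mean shift applied to the negative class.
    """
    rng = random.Random(seed)

    # Assumption: one mean offset r per feature, uniform on [-r_max, r_max].
    offsets = [rng.uniform(-r_max, r_max) for _ in range(n_features)]

    # Assumption: `spread` (0.1) is treated as the standard deviation.
    # Positive samples: every feature is drawn from N(0, spread).
    positives = [[rng.gauss(0.0, spread) for _ in range(n_features)]
                 for _ in range(n_pos)]

    # Negative samples: feature j is drawn from N(offsets[j], spread).
    negatives = [[rng.gauss(r, spread) for r in offsets]
                 for _ in range(n_neg)]

    samples = positives + negatives
    labels = [1] * n_pos + [0] * n_neg
    return samples, labels, offsets
```

Called with its defaults, `simulate_sdata()` returns 200 rows of 10,000 values each, matching the sample and feature counts listed above. Features whose offset is close to zero are nearly uninformative, while features with a larger |r| separate the two classes more clearly.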


