Gene Expression Profiles of Breast Cancer

Published: 21-12-2017 | Version 1 | DOI: 10.17632/v3cc2p38hb.1
Haozhe Xie,
Jie Li,
Tim Jatkoe,
Christos Hatzis


The published dataset consists of four separate datasets:

- BC-TCGA consists of 17,814 genes and 590 samples (61 normal tissue samples and 529 breast cancer tissue samples).
- GSE2034 includes 12,634 genes and 286 breast cancer samples (107 recurrence tumor samples and 179 no-recurrence samples).
- GSE25066 includes 12,634 genes and 492 breast cancer samples (100 pathologic complete response (PCR) samples and 392 residual disease (RD) samples).
- Simulation Data (SData) includes 100 positive samples and 100 negative samples with 10,000 features. Each feature follows a normal distribution: N(0, 0.1) for positive samples and N(0 ± r, 0.1) for negative samples, where r ∈ [−0.125, 0.125].

All of the datasets are used in the experiments in the paper "Comparison among dimensionality reduction techniques based on Random Projection for cancer classification" (Xie et al., 2016).
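
The Simulation Data description can be illustrated with a minimal sketch that draws a dataset of the same shape. This is not the authors' generator. The description does not say whether 0.1 is a variance or a standard deviation, nor how r is chosen, so the sketch assumes 0.1 is the standard deviation and that one shift r per feature is drawn uniformly from [−0.125, 0.125]. The function name and its parameters are hypothetical.

```python
import random


def make_simulation_data(n_pos=100, n_neg=100, n_features=10_000,
                         sigma=0.1, r_max=0.125, seed=0):
    """Draw a toy dataset shaped like the Simulation Data description.

    Assumptions (not stated in the dataset description):
    - sigma=0.1 is used as the standard deviation of each feature;
    - each feature gets one shift r, drawn uniformly from [-r_max, r_max].
    """
    rng = random.Random(seed)

    # One mean shift per feature, applied to the negative samples only.
    shifts = [rng.uniform(-r_max, r_max) for _ in range(n_features)]

    # Positive samples: every feature is centered at 0.
    positives = [[rng.gauss(0.0, sigma) for _ in range(n_features)]
                 for _ in range(n_pos)]

    # Negative samples: feature j is centered at shifts[j].
    negatives = [[rng.gauss(r, sigma) for r in shifts]
                 for _ in range(n_neg)]

    samples = positives + negatives
    labels = [1] * n_pos + [0] * n_neg  # 1 = positive, 0 = negative
    return samples, labels, shifts


if __name__ == "__main__":
    # Small sizes keep the demonstration fast; the defaults match the description.
    X, y, r = make_simulation_data(n_pos=5, n_neg=5, n_features=8, seed=42)
    print(len(X), "samples with", len(X[0]), "features each")
```

Because the shifts are small relative to the per-feature spread, a single feature separates the two classes only weakly under these assumptions.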


Steps to reproduce

You can get the source code from the following URL: