Gene Expression Profiles of Breast Cancer

Published: 21 Dec 2017 | Version 1 | DOI: 10.17632/v3cc2p38hb.1
  • Haozhe Xie,
    Computer Science
    Harbin Institute of Technology
  • Jie Li,
    Harbin Institute of Technology
  • Tim Jatkoe,
    The submitter of the dataset GSE2034.
  • Christos Hatzis,
    Nuvera Biosciences
    The submitter of the dataset GSE25066.

Description of this data

The published dataset consists of four separate datasets:

  • BC-TCGA consists of 17,814 genes and 590 samples (including 61 normal tissue samples and 529 breast cancer tissue samples).
  • GSE2034 includes 12,634 genes and 286 breast cancer samples (including 107 tumor samples with recurrence and 179 samples without recurrence).
  • GSE25066 includes 12,634 genes and 492 breast cancer samples (including 100 pathologic complete response (PCR) samples and 392 residual disease (RD) samples).
  • Simulation Data (SData) includes 100 positive samples and 100 negative samples with 10,000 features. Each feature follows a normal distribution: N(0, 0.1) for positive samples and N(0 ± r, 0.1) for negative samples, where r ∈ [−0.125, 0.125].
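
The description of the Simulation Data can be made concrete with a small generator. The following is only a minimal sketch and not the authors' code. It assumes that the second parameter of N(0, 0.1) is a variance, and that one shift r is drawn uniformly from [−0.125, 0.125] for each feature; the page states neither. The function name make_simulation_data and all of its parameters are hypothetical.

```python
import math
import random


def make_simulation_data(n_pos=100, n_neg=100, n_features=10_000,
                         variance=0.1, r_max=0.125, seed=0):
    """Generate synthetic two-class data in the spirit of the SData description.

    Assumptions (not confirmed by the dataset page):
      * `variance` is the variance of the normal distribution, so the
        standard deviation is sqrt(variance);
      * each feature gets its own shift r, drawn uniformly from
        [-r_max, r_max], and only negative samples are shifted.
    """
    rng = random.Random(seed)
    std = math.sqrt(variance)

    # One mean shift per feature, shared by all negative samples.
    shifts = [rng.uniform(-r_max, r_max) for _ in range(n_features)]

    # Positive samples: every feature is drawn from N(0, variance).
    positives = [[rng.gauss(0.0, std) for _ in range(n_features)]
                 for _ in range(n_pos)]

    # Negative samples: feature j is drawn from N(shifts[j], variance).
    negatives = [[rng.gauss(mu, std) for mu in shifts]
                 for _ in range(n_neg)]

    samples = positives + negatives
    labels = [1] * n_pos + [0] * n_neg  # 1 = positive, 0 = negative
    return samples, labels, shifts
```

Calling make_simulation_data() with the defaults yields 200 rows of 10,000 features each, matching the sizes given above. Because the shifts are small relative to the spread of each feature, most individual features carry very little class signal under these assumptions.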

All of the datasets are used in the experiments of the paper "Comparison among dimensionality reduction techniques based on Random Projection for cancer classification" (Xie et al., 2016).

Steps to reproduce

You can get the source code from the following URL:

Cite this dataset

Xie, Haozhe; Li, Jie; Jatkoe, Tim; Hatzis, Christos (2017), “Gene Expression Profiles of Breast Cancer”, Mendeley Data, v1



Institutions: Nuvera Biosciences Inc, Harbin Institute of Technology


Categories: Molecular Biology, Breast Cancer, Gene Expression


CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?

You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.