Data for: Bioinformatics analysis of the genes involved in the extension of prostate cancer to adjacent lymph nodes by supervised and unsupervised machine learning methods: the role of SPAG1 and PLEKHF2

Published: 04-07-2020 | Version 1 | DOI: 10.17632/fdb8f5hjyd.1
Jamal Shamsara,
Elham Shamsara


The present study aimed to identify genes associated with the involvement of adjacent lymph nodes in patients with prostate cancer (PCa) and to provide information useful for identifying potential diagnostic biomarkers and pathological genes in PCa metastasis. The most important candidate genes were identified through several machine learning approaches, including K-means clustering, neural networks, naïve Bayesian classification and principal component analysis (PCA), with or without downsampling. In total, 21 genes positively associated with lymph node involvement were identified. Among them, nine genes have been identified in metastatic prostate cancer, six have been found in other metastatic cancers and four in other local cancers. The amplification of the candidate genes was evaluated in other PCa data sets. In addition, we identified a validated set of genes involved in PCa metastasis. Amplification of the SPAG1 and PLEKHF2 genes was associated with decreased survival in patients with PCa.

A TCGA dataset of Prostate Adenocarcinoma (TCGA, PanCancer Atlas) was retrieved from cBioPortal [7, 8]. RNA expression values had been standardized against each gene's expression distribution in a reference population and were reported as log2 values. CNA data were reported as +2, +1, 0, -1 or -2. We initially performed the analyses on the RNA data and then used the CNA data for further validation. The samples were assigned to either the N1 or the N0 group (Figure 2). The N1 group included samples from patients with PCa with lymph node involvement, whereas the N0 group included samples from patients with PCa without involvement of any lymph nodes. Samples with NA nodal status were removed from the study.
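The sample preparation described above (dropping NA samples, splitting into N0 and N1 groups, and optional downsampling) can be illustrated with a minimal Python sketch. It is not the authors' code. The sample records, the function names and the random seed are hypothetical, and treating a CNA value of +2 as "amplified" is an assumption based on the coding scheme listed above.

```python
import random
from collections import defaultdict

# Hypothetical toy records: (sample_id, nodal_status, CNA value per gene).
# CNA values follow the coding described above: +2, +1, 0, -1 or -2.
SAMPLES = [
    ("S1", "N0", {"SPAG1": 0, "PLEKHF2": 0}),
    ("S2", "N1", {"SPAG1": 2, "PLEKHF2": 2}),
    ("S3", "NA", {"SPAG1": 1, "PLEKHF2": 0}),
    ("S4", "N0", {"SPAG1": 1, "PLEKHF2": -1}),
    ("S5", "N0", {"SPAG1": 0, "PLEKHF2": 2}),
    ("S6", "N1", {"SPAG1": 2, "PLEKHF2": 0}),
]


def group_by_nodal_status(samples):
    """Drop samples whose status is not N0 or N1, then group the rest."""
    groups = defaultdict(list)
    for sample_id, status, cna in samples:
        if status in ("N0", "N1"):
            groups[status].append((sample_id, cna))
    return dict(groups)


def downsample(groups, seed=0):
    """Randomly shrink every group to the size of the smallest one."""
    rng = random.Random(seed)
    size = min(len(members) for members in groups.values())
    return {label: rng.sample(members, size) for label, members in groups.items()}


def amplified_fraction(members, gene, threshold=2):
    """Fraction of samples whose CNA value for `gene` is >= threshold.

    Assumption: a CNA value of +2 is counted as amplification.
    """
    if not members:
        return 0.0
    hits = sum(1 for _, cna in members if cna.get(gene, 0) >= threshold)
    return hits / len(members)


groups = group_by_nodal_status(SAMPLES)
balanced = downsample(groups)

for label in ("N0", "N1"):
    print(label, len(groups[label]), amplified_fraction(groups[label], "SPAG1"))
```

Downsampling the majority group is only one way to handle the class imbalance between N0 and N1; the description does not state which scheme was used, so the random undersampling here is a stand-in.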