The landscape of R-loop binding proteins implicates therapeutic strategies in colorectal cancer.

Published: 14 June 2022 | Version 1 | DOI: 10.17632/3gm3752rrw.1
Rong Tan


The original data and supplementary tables.


Steps to reproduce

Data availability
A list of 204 DNA/RNA hybrid genes was obtained from published studies (Wang et al., 2018; Cristini et al., 2018). Source proteomic, mRNA, mutation, and CNA raw data for the three colon cancer datasets used in this study can be accessed through the CPTAC data portal and the Firehose website (version 20130523). Source proteomic data for clear cell renal cell carcinoma, head and neck squamous cell carcinoma, lung adenocarcinoma, and hepatocellular carcinoma can be obtained from the CPTAC data portal. Gene fitness scores, proteomic data, mRNA expression, and copy number alteration data for the cell lines are available from the Project Score web portal. Drug response data for the cancer cell lines were downloaded from the GDSC database.

Differential protein analysis
A paired Wilcoxon signed-rank test was used to detect differences in protein expression between tumor tissues and paired paracancerous tissues.

Correlations of protein to mRNA, copy number alteration (CNA), or DNA methylation in RLBPs
For a total of 204 RLBP genes, pairwise Pearson correlation coefficients were calculated in the CRC cell lines or tumors between protein abundance and mRNA expression, CNA, or CpG promoter DNA methylation.

Survival analysis
Kaplan-Meier survival curves (log-rank test) were used to compare the overall survival (OS) of CRC patients between proteomic subtypes, or between high and low expression of RLBPs. For the OS curves, high and low expression groups were defined by the median cutoff.

NMF clustering of CRC proteomic data
Non-negative matrix factorization (NMF) with the R package CancerSubtypes (version 1.14.0) was used to identify CRC sample clusters.

GSEA analysis
Gene Set Enrichment Analysis was performed using the GSEA software.

Comparison of DDR gene expression between two clusters
The expression of 276 DDR genes in Cluster II relative to Cluster I was calculated. Genes with log2(fold change (Cluster II/Cluster I)) > 0.5 or < -0.5 and p < 0.05 were considered statistically significant.
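
As an illustration of the DDR significance criterion, the following minimal Python sketch applies the log2 fold change and p-value thresholds to a few genes. The gene names, expression means, and p-values are hypothetical placeholders, not values from this dataset. The sketch assumes positive, linear-scale mean expression per cluster, and that p-values were computed beforehand by whichever test the analysis used (the test is not named here).

```python
import math

def log2_fold_change(mean_cluster2, mean_cluster1):
    """log2 of the Cluster II / Cluster I expression ratio (assumes positive means)."""
    return math.log2(mean_cluster2 / mean_cluster1)

def is_significant(log2_fc, p_value, fc_cutoff=0.5, p_cutoff=0.05):
    """Apply the stated thresholds: |log2 FC| > 0.5 and p < 0.05 (both strict)."""
    return abs(log2_fc) > fc_cutoff and p_value < p_cutoff

# Hypothetical input: gene -> (mean in Cluster II, mean in Cluster I, p-value)
genes = {
    "GENE_A": (2.0, 1.0, 0.01),
    "GENE_B": (1.1, 1.0, 0.001),
    "GENE_C": (0.5, 1.0, 0.20),
    "GENE_D": (0.5, 1.0, 0.03),
}

# Keep only genes that pass both thresholds, with their rounded log2 fold change
significant = {}
for name, (c2, c1, p) in genes.items():
    fc = log2_fold_change(c2, c1)
    if is_significant(fc, p):
        significant[name] = round(fc, 3)

print(significant)
```

With these placeholder values, GENE_B fails the fold change threshold and GENE_C fails the p-value threshold, so only GENE_A and GENE_D are retained.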
Immune score and stromal score
Stromal and immune scores were estimated with the ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data) algorithm and xCell.

Comparison of immune cell abundance between two clusters
The fractions of immune cell types in a mixed cell population within the leukocyte compartment were estimated using xCell and CIBERSORT. A list of 78 immunomodulator (IM) genes was obtained from a published study (Thorsson et al., 2018). Cell line drug sensitivity data were downloaded from the Genomics of Drug Sensitivity in Cancer Project website (GDSC1 and GDSC2).
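
For the protein-to-mRNA correlation step in the procedure above, a per-gene Pearson correlation coefficient can be sketched as follows. This is a plain-Python illustration only: the function name `pearson_r` and the sample values are hypothetical, and the original analysis may have used a statistics package instead.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for one gene across four matched samples
protein_abundance = [1.0, 2.0, 3.0, 4.0]
mrna_expression = [2.0, 4.0, 6.0, 8.0]

r = pearson_r(protein_abundance, mrna_expression)
print(r)
```

Samples must be matched between the two sequences, so that position i in each list refers to the same cell line or tumor.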


Xiangya Hospital Central South University


Descriptive Table, Statistical Table, Input-Output Table