EBV-associated gastric carcinoma: four-gene immunoscore

Published: 7 January 2026 | Version 2 | DOI: 10.17632/tb673mfw47.2
Contributor:
Ruizhen Huo

Description

Title: EBV-associated gastric carcinoma: four-gene immunoscore (gene expression data, immunoscore coefficients, and clinical data for EBVaGC and EBVnGC samples).

Overview: This dataset contains the minimal, reproducible materials required to verify the analyses reported in the manuscript “Immune Hub Genes and a Proof-of-Concept Prognostic Signature in EBV-Associated Gastric Carcinoma (EBVaGC).” We provide (i) sample ID lists used to subset publicly available cohorts (TCGA-STAD; GEO: GSE51575, GSE62254), (ii) the four-gene risk-score coefficients, (iii) derived per-sample immunoscores and group labels for the discovery and validation cohorts, and (iv) minimal R scripts to reproduce the key figures and statistics (Figure 4 IHC plot, Kaplan–Meier curves, time-dependent ROC, and C-index).

Data source policy: Raw transcriptomic data are not redistributed here; they remain available from the public repositories cited above (TCGA-STAD; GEO: GSE51575, GSE62254). The provided sample indices allow users to retrieve the identical subsets. Alternatively, users may rely on the included derived immunoscore tables to reproduce all figures without re-downloading the raw matrices.

Contact (lead): Jianning Chen, MD, PhD (chjning@mail.sysu.edu.cn). Issues and suggestions are welcome via dataset comments.
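For readers who want to see how a per-sample immunoscore could be derived from the coefficient table, the following is a minimal Python sketch. It assumes the risk score is the usual Cox linear predictor (the sum of coefficient × expression over the four genes), which this description does not state explicitly. The gene names `GENE_A` to `GENE_D` and all numeric values are hypothetical placeholders, not the published coefficients, and the dataset's own scripts are written in R.

```python
from typing import Dict, Mapping

# Hypothetical gene names and coefficients, for illustration only.
# The real four-gene coefficients are in the dataset's coefficient table.
COEFFICIENTS: Dict[str, float] = {
    "GENE_A": 0.50,
    "GENE_B": -0.25,
    "GENE_C": 0.10,
    "GENE_D": -0.40,
}


def immunoscore(expression: Mapping[str, float],
                coefficients: Mapping[str, float] = COEFFICIENTS) -> float:
    """Return the linear predictor sum(coef * expression) over the model genes.

    Assumes the score is a plain weighted sum, as in a standard Cox model.
    """
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        raise KeyError(f"expression values missing for: {missing}")
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


# Toy expression profile for one sample (made-up values).
sample = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0, "GENE_D": 0.5}
score = immunoscore(sample)
print(round(score, 3))
```

Raising an error on missing genes, rather than silently treating them as zero, avoids producing scores that look valid but are based on an incomplete expression profile.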

Steps to reproduce

DATA SOURCES
- GSE51575 (n=52): Discovery cohort for WGCNA analysis
- TCGA-STAD (n=25 EBVaGC): Model development and internal validation
- GSE62254 (n=18 EBVaGC): External validation
- In-house IHC cohort (n=40): Immunohistochemical validation

ANALYSIS WORKFLOW (6 R scripts)
1. WGCNA Analysis (01_WGCNA.R)
   - Soft-thresholding power: 9 (R²=0.86)
   - Identified immune-related brown module
2. Model Construction (02_Immunoscore Model Construction.R)
   - MSigDB C7 immune gene filtering
   - Co-expression analysis with PTPRC/ITGB2 (|r|>0.30)
   - Univariate Cox screening (P<0.10)
   - LASSO-Cox regression (10-fold CV, lambda.min)
   - Two models: 2-gene (data-driven) and 4-gene (biologically-informed)
3. Cox Analysis (03_Cox Analysis.R)
   - Univariate and multivariable Cox regression
   - Forest plot generation
4. PH Assumption Test (04_Proportional Hazards Test.R)
   - Schoenfeld residuals test
5. Risk Grouping (05_Risk_Grouping_Comparison.R)
   - Cutpoint strategies: median, tertiles, quartiles, optimal
6. External Validation (06_External_Validation.R)
   - Time-dependent ROC (1-year, 3-year AUC)
   - C-index calculation

IHC QUANTIFICATION
- Markers: CD18, CD45, CD68
- 5 HPF (400×) per case per marker
- QuPath software for cell density quantification
- Raw CD18/CD45 images included in this repository
- Raw CD68 images available at: https://pan.baidu.com/s/1rJKRxqiDPdPpbtqc7VwjWw?pwd=pdyb (Extraction code: pdyb)

TRANSWELL MIGRATION ASSAY
- THP-1 monocytes with tumor conditioned medium
- CD18 blocking antibody treatment
- 3 independent replicates

SOFTWARE
- R (≥4.3.0) with packages: survival, survminer, glmnet, WGCNA, clusterProfiler, timeROC, rms

REPRODUCIBILITY
- Random seed: set.seed(2025)
- 5×5 nested cross-validation
- Bootstrap: B=1000

EXECUTION
- Run: Rscript scripts/01_WGCNA.R through 06_External_Validation.R sequentially.
- Outputs appear in results/ folder.
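The risk-grouping and C-index steps listed above can be illustrated with a small, self-contained Python sketch. This is not the authors' R code. It assumes Harrell's concordance definition, ignores tied survival times, and uses made-up scores, follow-up times, and event flags. The "optimal" cutpoint strategy is not covered.

```python
from statistics import median, quantiles
from typing import List, Sequence


def group_by_cutpoints(scores: Sequence[float],
                       cutpoints: Sequence[float]) -> List[int]:
    """Assign each score a group index: the number of cutpoints it exceeds."""
    cuts = sorted(cutpoints)
    return [sum(s > c for c in cuts) for s in scores]


def harrell_c_index(times: Sequence[float],
                    events: Sequence[bool],
                    scores: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs in which the subject with
    the shorter observed time has the higher risk score (score ties = 0.5).

    A pair is comparable when the subject with the shorter time had an event.
    Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): risk scores, follow-up in months, event flags.
scores = [0.2, 1.5, -0.3, 0.4, 2.1, -1.0, 0.9, 1.1]
times = [30, 8, 40, 15, 5, 48, 26, 12]
events = [False, True, False, True, True, False, True, True]

median_groups = group_by_cutpoints(scores, [median(scores)])
tertile_groups = group_by_cutpoints(scores, quantiles(scores, n=3))
quartile_groups = group_by_cutpoints(scores, quantiles(scores, n=4))
c_index = harrell_c_index(times, events, scores)

print(median_groups)
print(round(c_index, 2))
```

Expressing every strategy as a list of cutpoints keeps the median, tertile, and quartile comparisons on one code path, so only the cutpoint list changes between strategies.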

Institutions

  • Third Affiliated Hospital of Sun Yat-Sen University

Categories

Gastroenterology, Oncology, Pathology, Genomics, Systems Biology, Tumor Immunology, Prognostic Marker
