Gut Microbiome-Associated SNPs and Their Impact on the Immune Microenvironment in Head and Neck Squamous Cell Carcinoma: A Mendelian Randomization Study

Published: 3 April 2024 | Version 1 | DOI: 10.17632/sj85ws8zsn.1
Gengming Cai


Various studies and trials have indicated a potential link between cancer and the gut microbiome. However, research on the connection between the gut microbiome and Head and Neck Squamous Cell Carcinoma (HNSCC) remains limited. Genetic variation data for the gut microbiome were obtained from the MiBioGen repository, with gut microbiota classified at the phylum, class, order, family, and genus levels. UK Biobank provided the HNSCC data, while the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) database (GSE181919) supplied the HNSCC-related data for single-cell correlation analysis. We then performed Mendelian Randomization (MR), immune cell infiltration analysis, Gene Set Variation Analysis (GSVA) and Gene Set Enrichment Analysis (GSEA), regulatory network analysis of key genes, single-cell analysis, and statistical analysis. Seven causal associations were identified between the gut microbiome and HNSCC, and their validity was confirmed by tests of heterogeneity and genetic diversity. The single-nucleotide polymorphism (SNP) genes were strongly associated with levels of immune cell infiltration and play an important role in shaping the immune microenvironment (IME). Further analysis revealed distinct signaling pathways and transcriptional regulatory networks associated with the seven SNP genes. Using MR, we were able to establish a direct connection between the gut microbiome and HNSCC. These findings may provide new insights into the mechanisms of, and clinical research on, gut microbiota-mediated HNSCC.


Steps to reproduce

Genetic variation data for the gut microbiota were obtained from the MiBioGen database. The analysis targets the V4, V3-V4, and V1-V2 variable regions of the 16S rRNA gene to profile microbiome composition, using direct taxonomic classification. Organisms are classified by phylum, class, order, family, and genus, giving 9 phyla, 16 classes, 20 orders, 32 families, and 119 genera with an average abundance exceeding 1%. Genome-wide Association Study (GWAS) summary data for the outcome were collected from UK Biobank's shared data. We downloaded processed mRNA expression data for HNSCC, including normal samples (n = 44) and tumor samples (n = 522). In addition, HNSCC data were obtained from the publicly available Gene Expression Omnibus (GEO) database (GSE181919) for single-cell correlation analysis, comprising 29 samples in total (20 HNSCC samples and 9 control samples).
MR analysis used the MR-Base database. SNPs associated with each taxonomic group at a significance threshold of P < 1.0 × 10⁻⁵ were selected as candidate Instrumental Variables (IVs). We calculated Linkage Disequilibrium (LD) among SNPs and retained those with R² < 0.001 (using a clumping window size of 10,000 kb). Four statistical approaches (inverse variance weighted (IVW), MR-Egger, weighted median, and weighted mode) were then used to assess the reliability of the causal associations and to derive an overall evaluation of the influence of the gut microbiota on HNSCC. The identified causal associations were further checked with heterogeneity tests, such as Cochran's Q test for the IVW estimate, and with gene diversity tests.
Immune cell infiltration was analyzed using the CIBERSORT algorithm, with a P-value < 0.05 considered statistically significant. Gene sets for the enrichment analyses were taken from the Molecular Signatures Database (version 7.0). The number of permutations was set to 1000, with the permutation type set to phenotype.
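As a rough illustration of the inverse variance weighted estimate and Cochran's Q statistic named above, the sketch below works from per-SNP summary statistics. The input numbers and the function names (`ivw_estimate`, `cochran_q`) are hypothetical and are not taken from this dataset; it shows only the fixed-effect form of the calculation, not the dedicated MR software that the analysis itself would have used.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) causal estimate.

    Each SNP j contributes a Wald ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error, ratios, weights


def cochran_q(estimate, ratios, weights):
    """Cochran's Q: weighted squared deviation of Wald ratios from the IVW estimate."""
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    dof = len(ratios) - 1  # compared against a chi-square with this many df
    return q, dof


# Hypothetical summary statistics for three instruments (illustrative only).
beta_exposure = [0.10, 0.20, 0.15]   # SNP effect on the microbial taxon
beta_outcome = [0.020, 0.050, 0.030]  # SNP effect on the outcome
se_outcome = [0.010, 0.020, 0.015]    # standard error of the outcome effect

estimate, std_error, ratios, weights = ivw_estimate(
    beta_exposure, beta_outcome, se_outcome
)
q_stat, dof = cochran_q(estimate, ratios, weights)
print(f"IVW estimate = {estimate:.4f} (SE {std_error:.4f})")
print(f"Cochran's Q = {q_stat:.4f} on {dof} degrees of freedom")
```

A large Q relative to its degrees of freedom would suggest that the instruments give inconsistent ratio estimates, which is the heterogeneity the test is meant to flag.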
Cistrome DB was used to investigate the regulatory relationships between transcription factors and the key genes. The genome build was set to hg38 and the transcription start site distance to 10 kb, and the resulting network was visualized in Cytoscape.
For the single-cell analysis, expression profiles were loaded with the Seurat package and low-quality cells were filtered out (nFeature_RNA > 500 & nFeature_RNA < 6000 & percent.mt < 5). The data were then normalized and scaled, and principal component analysis (PCA) was performed. The optimal number of Principal Components (PCs) was chosen from an ElbowPlot (18 PCs selected). t-SNE was used to visualize the spatial relationships between clusters. Clusters were annotated using the celldex package, linking them to cell types that are important in disease development.
Statistical analyses were performed in R (version 4.3); all tests were two-sided, and p < 0.05 indicated statistical significance.
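The quality-control thresholds quoted for the single-cell step (nFeature_RNA between 500 and 6000, percent.mt below 5) are per-cell metrics in Seurat. The sketch below applies the same thresholds in plain Python to a few made-up cells, purely to show the filtering logic; the `CellQC` class, the `passes_qc` function, and the barcodes are hypothetical, and the analysis itself was run with Seurat in R.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_feature_rna: int  # number of genes detected in the cell
    percent_mt: float   # percentage of counts from mitochondrial genes


def passes_qc(cell, min_features=500, max_features=6000, max_percent_mt=5.0):
    """Mirror the filter nFeature_RNA > 500 & nFeature_RNA < 6000 & percent.mt < 5."""
    return (
        min_features < cell.n_feature_rna < max_features
        and cell.percent_mt < max_percent_mt
    )


# Made-up cells for illustration only.
cells = [
    CellQC("cell_A", 1200, 2.1),  # passes all three thresholds
    CellQC("cell_B", 350, 1.0),   # too few detected genes
    CellQC("cell_C", 7200, 3.0),  # too many detected genes (possible doublet)
    CellQC("cell_D", 2500, 8.5),  # high mitochondrial fraction
    CellQC("cell_E", 500, 0.5),   # exactly 500 fails the strict ">" bound
]

kept = [cell.barcode for cell in cells if passes_qc(cell)]
print(f"kept {len(kept)} of {len(cells)} cells: {kept}")
```

The bounds are strict inequalities, matching the expression in the text, so a cell with exactly 500 detected genes is excluded.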


Polymorphism, Gut Microbiome, Head and Neck Cancer, Mendelian Randomization, Tumor Immune Microenvironment