Sensitivity of genes, molecular pathways and disease-related categories to chemical exposures
The goal of this project is to identify molecular mechanisms sensitive to chemical exposures in an unbiased way. The results of this project are published on preprints.org (doi: 10.20944/preprints202006.0261.v1). The data files described below represent the major steps of our analysis:

1. Annotated chemical-gene interactions.xlsx
Data on chemical-gene interactions obtained from high-throughput toxicological genomic experiments with human, mouse, or rat cells and tissues were extracted from the Comparative Toxicogenomics Database (CTD, http://ctdbase.org/) on August 24, 2018. Genes not present in the genomes of all three species were filtered out. Chemical compounds were annotated for major uses with information from Wikipedia, PubChem, and PubMed. Based on the textual annotation, every compound was assigned one to three annotation terms from the following list: pharmaceutical, recreational drug, research, warfare, endobiotic, agricultural, cosmetics, environment, food components, industrial, and pollutant. All contributors annotated an equal number of chemicals, and AS checked every annotation to ensure a consistent approach. The resulting dataset includes 591,084 individual chemical-gene interactions.

2. Number of chemical-gene interactions per gene.xlsx
The dataset created at the previous step was used to determine the number of chemical-gene interactions for every gene: the total number, as well as the numbers of activating and suppressive interactions. We hypothesize that the number of chemical-gene interactions can be used as a measure of a gene's sensitivity to chemical exposures.

3. Enrichment of molecular pathways with genes sensitive to chemical exposures.xlsx
The list of genes with the total number of chemical-gene interactions for every gene was used as input for Gene Set Enrichment Analysis (GSEA, https://www.gsea-msigdb.org/gsea/index.jsp) against the Hallmark, KEGG, and Reactome datasets to identify molecular pathways highly enriched with genes sensitive to chemical exposures. We suggest that the normalized enrichment score (NES) of every enriched pathway is a measure of the pathway's sensitivity to chemical exposures.

4. Diseases vs. chemically sensitive KEGG pathways matrix.xlsx and
5. Diseases vs. chemically sensitive Reactome pathways matrix.xlsx
To identify disease categories that are sensitive to chemical exposures, the lists of significantly enriched KEGG and Reactome pathways (false discovery rate (FDR) q < 0.01 and NES > 1.9) were submitted to the CTD to run a pathway-disease association analysis. This analysis produced matrices of the numbers of genes shared between chemically sensitive pathways and disease states. Two numeric values indicate the sensitivity of a disease state to chemical exposures: the number of inferred pathways associated with the disease state, and the sum of the genes from every pathway that overlap with the disease (number of inference genes).
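The pathway selection and the two disease-level values described for files 4 and 5 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the column names (pathway, nes, fdr_q), the pathway and disease labels, and all numbers are made up for illustration, and the filter assumes that significantly enriched pathways are those with FDR q below 0.01 and NES above 1.9.

```python
import pandas as pd


def select_sensitive_pathways(gsea, fdr_max=0.01, nes_min=1.9):
    """Return names of pathways passing the assumed FDR and NES cut-offs."""
    keep = (gsea["fdr_q"] < fdr_max) & (gsea["nes"] > nes_min)
    return gsea.loc[keep, "pathway"].tolist()


def summarize_diseases(matrix):
    """Summarize a disease-by-pathway matrix of shared gene counts.

    Rows are diseases, columns are pathways, cells are numbers of shared genes.
    """
    return pd.DataFrame({
        # pathways sharing at least one gene with the disease
        "n_pathways": (matrix > 0).sum(axis=1),
        # shared genes summed over pathways (a gene is counted once per pathway)
        "n_inference_genes": matrix.sum(axis=1),
    })


# Toy GSEA result table (hypothetical values, not taken from the dataset).
gsea = pd.DataFrame({
    "pathway": ["P1", "P2", "P3", "P4"],
    "nes": [2.4, 1.5, 2.0, 2.1],
    "fdr_q": [0.001, 0.0005, 0.2, 0.005],
})
selected = select_sensitive_pathways(gsea)

# Toy disease-by-pathway matrix (hypothetical values).
matrix = pd.DataFrame(
    {"P1": [5, 0], "P4": [3, 2]},
    index=["disease_x", "disease_y"],
)
summary = summarize_diseases(matrix[selected])
print(summary)
```

In this toy case only P1 and P4 pass both cut-offs, so the summary is computed over those two columns. Because shared genes are summed per pathway, a gene present in several pathways contributes once for each of them.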
Steps to reproduce
The data can be reproduced by following the steps outlined in our data description.
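As one illustration of such a step, the per-gene counting that yields the second data file could be sketched in Python as follows. This is a hedged sketch under stated assumptions: the column names (chemical, gene, effect), the effect labels "increases" and "decreases", and the toy rows are hypothetical and may not match the layout of the actual spreadsheet.

```python
import pandas as pd


def count_interactions_per_gene(interactions):
    """Count total, activating, and suppressive interactions for every gene.

    Assumes hypothetical columns 'gene' and 'effect', with 'effect' equal to
    'increases' (activating) or 'decreases' (suppressive).
    """
    total = interactions.groupby("gene").size().rename("total")
    activating = (
        interactions[interactions["effect"] == "increases"]
        .groupby("gene").size().rename("activating")
    )
    suppressive = (
        interactions[interactions["effect"] == "decreases"]
        .groupby("gene").size().rename("suppressive")
    )
    # Outer-align the three counts; genes lacking one kind of interaction get 0.
    counts = pd.concat([total, activating, suppressive], axis=1)
    counts = counts.fillna(0).astype(int)
    return counts.sort_values("total", ascending=False)


# Toy interaction table (made-up rows, not taken from the dataset).
toy = pd.DataFrame({
    "chemical": ["A", "B", "C", "A", "B", "C"],
    "gene": ["GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"],
    "effect": ["increases", "increases", "decreases",
               "decreases", "increases", "increases"],
})
counts = count_interactions_per_gene(toy)
print(counts)
```

The "total" column, sorted in descending order, corresponds to the kind of ranked gene list that a preranked enrichment analysis takes as input.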