Expanding the Landscape of Drug Targets for 112 Chronic Diseases Using a Machine Learning-Assisted Genetic Priority Score

Published: 29 April 2024 | Version 3 | DOI: 10.17632/nxbbhbwnm3.3
Contributor:
Robert Chen

Description

Identifying genetic drivers of chronic diseases is crucial for drug discovery. We developed a Machine Learning-assisted Genetic Priority Score (ML-GPS) that incorporates genetic associations with predicted disease phenotypes to enhance target discovery.

Dependencies:
- Python 3.11.6
- scikit-learn 1.4.1
- LightGBM 4.0.0
- scipy 1.12.0
- statsmodels 0.14.1

Jupyter notebooks:
1. Phecode diagnosis prediction models - code to train phecode diagnosis prediction models among UK Biobank participants.
2. ML-GPS models - code to train ML-GPS in Open Targets and externally test it in SIDER.

Cleaned Open Targets and SIDER datasets are in the "Datasets" folder. Models take approximately 10 minutes to train on 24 threads.
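
Before running the notebooks, it may help to confirm that the local environment matches the listed dependency versions. The following is a minimal sketch, not part of the published code: the helper name `check_environment` is hypothetical, and the PyPI distribution names (for example `lightgbm` for LightGBM) are assumptions. A prefix match is used so that post-releases such as `1.4.1.post1` count as `1.4.1`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the dependency list in the description.
# The PyPI distribution names used as keys are assumptions.
PINNED = {
    "scikit-learn": "1.4.1",
    "lightgbm": "4.0.0",
    "scipy": "1.12.0",
    "statsmodels": "0.14.1",
}
PINNED_PYTHON = (3, 11, 6)


def check_environment(pinned=PINNED, lookup=version):
    """Return {package: (expected, installed_or_None, matches)}.

    `lookup` defaults to importlib.metadata.version and can be swapped
    for a stub when testing.
    """
    report = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed
        # Accept an exact match or a suffixed release (e.g. "1.4.1.post1").
        matches = installed is not None and (
            installed == expected or installed.startswith(expected + ".")
        )
        report[name] = (expected, installed, matches)
    return report


if __name__ == "__main__":
    python_ok = sys.version_info[:3] == PINNED_PYTHON
    print(f"Python {sys.version.split()[0]} (listed: 3.11.6, match: {python_ok})")
    for name, (expected, installed, ok) in check_environment().items():
        print(f"{name}: listed {expected}, installed {installed}, match: {ok}")
```

A mismatch does not necessarily mean the notebooks will fail; it only flags where the environment differs from the one described.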

Institutions

Icahn School of Medicine at Mount Sinai

Categories

Machine Learning