Expanding the Landscape of Drug Targets for 112 Chronic Diseases Using a Machine Learning-Assisted Genetic Priority Score
Published: 29 April 2024 | Version 3 | DOI: 10.17632/nxbbhbwnm3.3
Contributor:
Robert Chen
Description
Identifying genetic drivers of chronic diseases is crucial for drug discovery. We developed a Machine Learning-assisted Genetic Priority Score (ML-GPS) that incorporates genetic associations with predicted disease phenotypes to enhance target discovery.

Dependencies:
- Python 3.11.6
- scikit-learn 1.4.1
- LightGBM 4.0.0
- scipy 1.12.0
- statsmodels 0.14.1

Jupyter notebooks:
1. Phecode diagnosis prediction models - code to train phecode diagnosis prediction models among UK Biobank participants.
2. ML-GPS models - code to train ML-GPS in Open Targets and externally test it in SIDER.

Cleaned Open Targets and SIDER datasets are in the "Datasets" folder. Models take approximately 10 minutes to train on 24 threads.
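Before running the notebooks, it may help to confirm that the local environment matches the dependency versions listed above. The following is a minimal sketch and not part of the deposited code: the function name `check_environment` is hypothetical, and the distribution names (`scikit-learn`, `lightgbm`, `scipy`, `statsmodels`) are assumed to be the names under which the packages were installed.

```python
import sys
from importlib import metadata

# Versions copied from the dependency list in the description.
# The keys are assumed installed-distribution names, not taken from the dataset.
PINNED = {
    "scikit-learn": "1.4.1",
    "lightgbm": "4.0.0",
    "scipy": "1.12.0",
    "statsmodels": "0.14.1",
}
PINNED_PYTHON = (3, 11, 6)


def check_environment(pinned=PINNED, python_version=PINNED_PYTHON):
    """Return a list of mismatch messages; the list is empty if all versions match."""
    problems = []

    # Compare the running interpreter against the listed Python version.
    current = tuple(sys.version_info[:3])
    if current != tuple(python_version):
        expected_str = ".".join(map(str, python_version))
        found_str = ".".join(map(str, current))
        problems.append(f"Python: expected {expected_str}, found {found_str}")

    # Compare each installed package against its listed version.
    for name, expected in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: expected {expected}, not installed")
            continue
        # Accept an exact match or a suffixed build such as "<expected>.post1".
        if found != expected and not found.startswith(expected + "."):
            problems.append(f"{name}: expected {expected}, found {found}")

    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        print("Environment differs from the listed dependencies:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("Environment matches the listed dependencies.")
```

A mismatch does not necessarily mean the notebooks will fail; the check only reports where the environment differs from the versions stated in the description.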
Institutions
Icahn School of Medicine at Mount Sinai
Categories
Machine Learning