A Curated Dataset for Drug Class Prediction and Repositioning
Description
This curated dataset supports deep learning applications in drug discovery and drug repositioning. It contains 5,350 high-resolution images organized into two groups: pharmacological classes and molecular targets. The pharmacological classes are antifungals, antivirals, corticosteroids, diuretics, and non-steroidal anti-inflammatory drugs (NSAIDs). The molecular targets are enzymes related to Alzheimer's disease: acetylcholinesterase, butyrylcholinesterase, and beta-secretase 1.

The compounds were compiled from three established resources: DrugBank, ChEMBL, and DUD-E. Active compounds (positives) were sourced from DrugBank and ChEMBL, while decoy compounds (presumed negatives) were generated using the DUD-E protocol. Decoys are designed to match the physicochemical properties of the active compounds while being presumed not to bind the target, which makes them a demanding benchmark for machine learning evaluation. The dataset is balanced, with equal representation of active and decoy compounds, which suits it to both binary and multi-class classification. Intended tasks include pharmacological class prediction, virtual screening, and molecular target identification.
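As an illustration of how the active/decoy structure described above might be used, the following minimal sketch indexes the images and checks the balance per group. The description does not specify the file layout, so the folder structure `<root>/<group>/<active|decoy>/<image>`, the folder names, and the function names `index_images` and `balance_by_group` are all assumptions made for this example and may need adjusting to the actual files.

```python
from collections import Counter
from pathlib import Path

# Hypothetical layout (not confirmed by the dataset description):
#   <root>/<group>/<active|decoy>/<image file>
# where <group> is a pharmacological class or a molecular target.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_images(root):
    """Return sorted (path, group, is_active) records for images under root."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        parts = path.relative_to(root).parts
        if len(parts) != 3 or parts[1] not in ("active", "decoy"):
            continue  # skip anything outside the assumed layout
        records.append((path, parts[0], parts[1] == "active"))
    return records


def balance_by_group(records):
    """Count images per group as {group: (n_active, n_decoy)}."""
    active = Counter(group for _, group, is_active in records if is_active)
    decoy = Counter(group for _, group, is_active in records if not is_active)
    groups = sorted(set(active) | set(decoy))
    return {group: (active[group], decoy[group]) for group in groups}
```

The boolean `is_active` flag can serve directly as the label for binary classification, while the group name can serve as the label for multi-class classification. Comparing the two counts returned by `balance_by_group` is a quick way to confirm the stated balance after download.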
Funding
National Council for Scientific and Technological Development: CNPq-308161/2023-8
Fundação de Amparo à Pesquisa do Estado de Minas Gerais: APQ-03224-24, APQ-04559-22, APQ-02742-17