Protein and Ligand Dataset for Drug Repositioning in Childhood Acute Lymphoblastic Leukemia (ALL)

Published: 21 April 2025 | Version 1 | DOI: 10.17632/r5ftnf4j9f.1
Contributors:

Description

This dataset has two components, proteins and ligands, intended for computational research on drug repositioning in Childhood Acute Lymphoblastic Leukemia (ALL).

1. Protein Sequences (proteins.txt): Amino acid sequences of the proteins selected for a study that aims to identify novel therapeutic candidates for ALL through drug repositioning. The sequences were extracted from the UniProt database and belong to proteins known or predicted to be associated with ALL pathogenesis, treatment procedures, or immunological relevance. The file is structured in a JSON-like format with UniProt IDs as keys and amino acid sequences as values; each entry corresponds to one protein.

Additional Metadata:
- Data Type: Amino acid sequence data (FASTA-like JSON format)
- Unique Proteins: 8479
- Average Sequence Length: 529.81
- Maximum Sequence Length: 14507

2. Ligand Data (ligands.txt): A collection of drug-like small molecules represented in SMILES or similar formats. The ligands were selected for their therapeutic potential and relevance to ALL-related targets. The data was sourced from databases such as ChEMBL and DrugBank, and some entries for FDA-approved drugs were added manually.

Additional Metadata:
- Data Type: SMILES strings in JSON-like format (key-value pairs)
- Number of ligands: ~220,000

The combined dataset supports research in bioinformatics, drug discovery, and leukemia-specific therapeutic targeting. It is intended for computational biology, bioinformatics, and artificial intelligence research, particularly work on leukemia biology, drug-target interaction modeling, and systems pharmacology.
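Because both files are described as JSON-like key-value text, a quick sanity check is to load one and recompute the summary statistics listed in the metadata. The sketch below is illustrative only and is not the authors' code. It assumes the file parses either as strict JSON or as a Python-style dictionary literal; the exact on-disk syntax is not specified in this description, so the parser may need adjusting. The function names and the sample IDs are hypothetical.

```python
import ast
import json
from statistics import mean


def parse_mapping(text):
    """Parse JSON-like key-value text into a dict of str -> str."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: if the text is not strict JSON, it may be a
        # Python-style dict literal (for example, single-quoted strings).
        data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise ValueError("expected a key-value mapping at the top level")
    return {str(key): str(value) for key, value in data.items()}


def load_mapping(path):
    """Read a file such as proteins.txt or ligands.txt and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_mapping(handle.read())


def sequence_stats(mapping):
    """Return entry count, mean value length, and maximum value length."""
    lengths = [len(value) for value in mapping.values()]
    if not lengths:
        raise ValueError("mapping is empty")
    return {
        "count": len(lengths),
        "mean_length": round(mean(lengths), 2),
        "max_length": max(lengths),
    }


if __name__ == "__main__":
    # Hypothetical two-entry sample; the IDs and sequences are made up.
    sample = '{"P00001": "MKTAY", "P00002": "MK"}'
    print(sequence_stats(parse_mapping(sample)))
```

If proteins.txt matches the description, `sequence_stats(load_mapping("proteins.txt"))` should report the figures listed under Additional Metadata (8479 entries, mean length 529.81, maximum length 14507). The same loader can be pointed at ligands.txt, where the value lengths are SMILES string lengths rather than residue counts.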

Files

Institutions

Suleyman Demirel Universitesi

Categories

Pharmacology, Artificial Intelligence, Bioinformatics, Protein, Ligand, Deep Learning, Drug Repositioning, Childhood Acute Lymphocytic Leukemia

Funding

Scientific and Technological Research Council of Turkey

123E383

Licence