Protein and Ligand Dataset for Drug Repositioning in Childhood Acute Lymphoblastic Leukemia (ALL)
Description
This dataset has two components, proteins and ligands, intended for computational research on drug repositioning in childhood acute lymphoblastic leukemia (ALL).

1. Protein Sequences (proteins.txt): This file contains the amino acid sequences of selected proteins used in a study aiming to identify novel therapeutic candidates for childhood ALL through drug repositioning. The sequences were extracted from the UniProt database and belong to proteins that are known or predicted to be associated with ALL pathogenesis, treatment procedures, or immunological relevance. The file is structured in a JSON-like format, with UniProt IDs as keys and amino acid sequences as values; each entry corresponds to one protein.
Additional Metadata:
- Data Type: Amino acid sequence data (FASTA-like JSON format)
- Unique Proteins: 8479
- Average Sequence Length: 529.81 residues
- Maximum Sequence Length: 14507 residues

2. Ligand Data (ligands.txt): A collection of drug-like small molecules represented in SMILES or similar formats. The ligands were selected for their therapeutic potential and relevance to ALL-related targets. The data was sourced from databases such as ChEMBL and DrugBank, and some entries for FDA-approved drugs were added manually.
Additional Metadata:
- Data Type: SMILES strings in JSON-like format (key-value pairs)
- Number of Ligands: approximately 220,000

The combined dataset is designed to support computational biology, bioinformatics, and artificial intelligence research, particularly in leukemia biology, drug-target interaction modeling, drug discovery, and systems pharmacology.
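The key-value layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' own loading code: the function names `parse_mapping` and `sequence_stats` are hypothetical, the sample entries are made up for illustration, and the fallback to `ast.literal_eval` rests on the assumption that "JSON-like" may mean a Python-style dictionary literal rather than strict JSON.

```python
import ast
import json
from statistics import mean


def parse_mapping(text):
    """Parse a JSON-like key-value mapping into a dict of str -> str."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: the file may be a Python-style dict literal
        # (for example, single-quoted strings), which strict JSON rejects.
        data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise ValueError("expected a mapping of IDs to strings")
    return {str(key): str(value) for key, value in data.items()}


def sequence_stats(proteins):
    """Summarize sequence counts and lengths for an ID -> sequence dict."""
    lengths = [len(seq) for seq in proteins.values()]
    return {
        "unique_proteins": len(proteins),
        "average_length": round(mean(lengths), 2),
        "maximum_length": max(lengths),
    }


# Made-up sample standing in for the contents of proteins.txt.
sample_text = "{'ID_A': 'MKTAYIAK', 'ID_B': 'MSTN'}"
sample = parse_mapping(sample_text)
stats = sequence_stats(sample)
print(stats)
```

To try this on the real data, read the file contents (for example with `open("proteins.txt", encoding="utf-8").read()`) and pass the text to `parse_mapping`. If the file matches the description, `sequence_stats` should give figures comparable to the metadata listed above. The same parser should also work for ligands.txt, assuming it uses the same key-value layout with SMILES strings as values.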
Funding
Scientific and Technological Research Council of Turkey
123E383