Data for Descriptor-Free QSAR: Effectiveness and Screening for Putative Inhibitors of FGFR1
Description
The effectiveness of descriptor-based quantitative structure-activity relationship (QSAR) models in drug design remains limited by the quality of the descriptors used in training. This raises the question: can QSAR models be trained directly on compound SMILES? The long short-term memory (LSTM) algorithm has been employed to address this question; however, direct applications remain scarce. This study investigated the effectiveness of a descriptor-free QSAR model (LSTM-SM) in modeling an FGFR1 inhibitor dataset, comparing it with two conventional descriptor-based QSAR models (one using 126-bit Morgan fingerprints and one using 2D descriptors). The validated descriptor-free model was then used to screen the ChemDiv database for active FGFR1 inhibitors. The hits were subjected to molecular docking, induced-fit docking, and QM-MM optimization to filter for compounds with high binding affinity and to suggest a putative mechanism of inhibition and specificity. The LSTM-SM model outperformed the conventional QSAR models, with accuracy, specificity, and sensitivity of 0.92, a model loss of 0.025, and an AUC of 0.95. Fifteen thousand compounds from the ChemDiv database were predicted to be active, and four compounds were finally selected. Of these four, three showed putatively effective binding interactions with key active-site residues and were also effective against acquired resistance due to gatekeeper residue mutations. The advent of self-feature-extracting machine learning algorithms therefore makes it possible to build predictive models whose quality is not necessarily limited by compound descriptors. We applied this approach to discover putatively active FGFR1 inhibitors and to elucidate the putative mechanism of inhibition and specificity of the obtained compounds.
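The description does not state how SMILES strings were converted into LSTM input. A common approach for "descriptor-free" sequence models is to split each SMILES into tokens and map them to integer ids that an embedding layer can consume. The sketch below assumes a hypothetical regex-based tokenizer (bracket atoms, the two-letter halogens `Br` and `Cl`, `%nn` ring closures, otherwise single characters). The names `tokenize`, `build_vocab`, and `encode` are illustrative only and are not taken from the study.

```python
import re
from typing import Dict, List

# Hypothetical tokenization scheme (an assumption, not the study's documented method):
# bracket atoms like [NH3+], two-letter halogens, %nn ring closures, else one character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for tokens not seen when the vocabulary was built


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens for a sequence model."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Assign each distinct token an integer id, starting after PAD_ID and UNK_ID."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 2 for i, tok in enumerate(tokens)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Integer-encode a SMILES string, truncating or right-padding to max_len."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokenize(smiles)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    training_smiles = ["CCO", "c1ccccc1Cl", "C[NH3+]"]
    vocab = build_vocab(training_smiles)
    print(tokenize("c1ccccc1Cl"))  # → ['c', '1', 'c', 'c', 'c', 'c', 'c', '1', 'Cl']
    print(encode("CCO", vocab, 5))  # → [3, 3, 5, 0, 0]
```

Reserving 0 for padding matches the convention of embedding layers that can mask padded positions (for example, Keras `Embedding(mask_zero=True)`). The padded id sequences would then be fed to an LSTM classifier trained on active/inactive labels.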