Code for Rapid and non-invasive early detection of lung cancer by integration of machine learning and salivary metabolic fingerprints using MS LOC platform

Published: 22 November 2024 | Version 2 | DOI: 10.17632/5thzsdjc5f.2
Contributor:
Sheng Cao

Description

Most lung cancer (LC) patients are diagnosed at an advanced stage because effective screening methods are lacking. A non-invasive method for LC screening and early detection that is suitable for large-scale clinical use is therefore needed. Here, a total of 1043 saliva samples were collected from 334 LC patients and 709 non-LC volunteers at six hospitals, and their metabolomic data were acquired using a mass spectrometry Lab-on-a-Chip (MS LOC) platform. This approach offers high speed and high throughput (96 samples per batch) for the stable acquisition of salivary metabolic fingerprints. Using machine learning-based feature screening, we identified 35 metabolic features associated with LC, indicating that salivary metabolism is disturbed in LC patients. A classification model named SalivaMLD was then developed using an ensemble voting strategy over multiple machine learning algorithms; combining the predictions of the individual models through voting improved classification accuracy and robustness. In the validation set, SalivaMLD achieved an area under the curve (AUC) of 0.850, a sensitivity of 83.33%, and a specificity of 74.39%. In the test set, the model performed comparably, with an AUC, sensitivity, and specificity of 0.849, 81.69%, and 74.23%, respectively, outperforming conventional tumor markers such as carcinoembryonic antigen (CEA) and carbohydrate antigen 125 (CA125). Notably, SalivaMLD identified early-stage LC with an accuracy of 77.42%-81.97% and effectively differentiated LC of different pathological types in both the validation and test sets. Hence, this method for LC screening, which integrates machine learning with MS LOC-based salivary metabolic fingerprints, may be widely applicable in clinical practice for rapid, non-invasive detection.
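The description does not state which base algorithms SalivaMLD combines, whether it uses soft (probability-averaging) or hard (majority) voting, or which decision threshold it applies. The following is therefore only a minimal sketch, assuming soft voting with a 0.5 threshold, of how an ensemble vote and the reported metrics (sensitivity, specificity, AUC) can be computed. The function names (`soft_vote`, `sensitivity_specificity`, `auc`) and the toy probabilities are hypothetical illustrations, not taken from the dataset or the authors' code.

```python
from typing import Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical soft voting: average each sample's LC probability across base models."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    return [sum(probs[i] for probs in prob_lists) / n_models for i in range(n_samples)]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). Label 1 = LC, 0 = non-LC."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (LC, non-LC) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with invented probabilities from three hypothetical base models.
y_true = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.3, 0.5, 0.1]
model_b = [0.8, 0.7, 0.3, 0.2, 0.6, 0.2]
model_c = [0.7, 0.5, 0.6, 0.4, 0.3, 0.1]

ensemble = soft_vote([model_a, model_b, model_c])
preds = [1 if s >= 0.5 else 0 for s in ensemble]  # assumed 0.5 decision threshold
sens, spec = sensitivity_specificity(y_true, preds)
auc_value = auc(y_true, ensemble)
print(round(sens, 3), round(spec, 3), round(auc_value, 3))  # prints: 0.667 1.0 0.889
```

In practice, scikit-learn offers the same idea through `sklearn.ensemble.VotingClassifier` (with `voting="soft"` or `voting="hard"`) and `sklearn.metrics.roc_auc_score`; the hand-written version above only makes the arithmetic behind the reported figures explicit.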

Categories

Bioinformatics, Metabolomics, Salivary Research

Licence