Dataset for an Interpretable Classification Model for Systemic Lupus Erythematosus
The dataset used in this study was collected at the Rheumatology clinic of Sultan Qaboos University Hospital by Al Sawafi S. (2021). It includes 219 Omani patient records from 2006 to 2019 that met the entry criteria set by EULAR. The entry criteria require a positive antinuclear antibodies (ANA) test, after which the remaining additive classification criteria are applied. Of the 219 patients, 138 were diagnosed with SLE, and each diagnosis was confirmed by a physician on a case-by-case basis. The remaining 81 patients have other diseases and serve as controls. The dataset includes demographic, clinical, and laboratory data.
We developed an early detection model for SLE with an interpretation component that helps establish trust in its predictions. The SHAP interpretation tool was used to explain and justify individual predictions and thereby reduce the risk of misclassification. A minimal set of 13 early predictors achieved the highest scores: an AUC of 0.95 and an F1 score of 0.89. With these scores, the model can predict the presence or absence of SLE with reasonable certainty. The features comprise demographic data and clinical symptoms available to physicians at early stages. Four clinical features, in addition to the patient’s age, had the highest influence on the prediction: alopecia, renal involvement, ACL, and hemolytic anemia. All four are indicators of lupus activity at varying rates; combined with the patient’s age and age at onset, they allowed the model to establish a profile of the disease specific to Omanis. These four critical features are also more frequent in other Arab cohorts, so the model has the potential to be extended to include those ethnicities. The model's predictions can alert physicians to investigate further with the help of immunological tests such as the antinuclear antibodies test and the anti-dsDNA test.
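To make the two reported metrics concrete, the following is a minimal sketch of how an F1 score and an AUC can be computed from labels and predicted scores. It is not the study's evaluation code: the function names `f1_score` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class (1 = SLE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

F1 depends on the chosen decision threshold, whereas AUC depends only on how the scores rank SLE cases relative to controls, which is why the two are usually reported together.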
Future work includes training the model to categorize non-SLE patients into “possible-SLE” and “highly unlikely” groups, which can help monitor patients suspected of developing SLE in the future. Symptom frequencies also suggest that some features carry more weight in classification than others, so class-weighted classification can be compared against standard classifiers to test whether it improves performance.
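As one possible reading of the class-weighting idea, the sketch below derives inverse-frequency weights from the class counts reported for this dataset (138 SLE, 81 controls). The formula, n_samples / (n_classes * class_count), is the common "balanced" heuristic; whether the authors intend this scheme is an assumption, and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Class counts taken from the dataset description: 138 SLE, 81 controls.
labels = ["SLE"] * 138 + ["control"] * 81
weights = balanced_class_weights(labels)
```

Under this heuristic the minority control class receives the larger weight (about 1.35 versus about 0.79 for SLE), so errors on controls are penalized more heavily during training and both classes contribute equally to the total weighted loss.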