Dataset Machine Learning Pogostemon cablin Budok Disease

Published: 24 March 2026 | Version 1 | DOI: 10.17632/8pbh75tb4k.1
Contributor:
Elly Sufriadi

Description

Title: FT-NIR Spectral Dataset for Machine Learning-Based Classification of Synchytrium pogostemonis (Budok Disease) Infection in Pogostemon cablin Leaves.

This dataset contains Fourier Transform Near-Infrared (FT-NIR) spectra of Pogostemon cablin (patchouli) leaves, collected to develop and evaluate machine learning models that classify budok disease caused by Synchytrium pogostemonis. It supports discrimination among three physiological conditions: healthy (control), infected (susceptible), and resistant (exposed but asymptomatic) plants.

Spectra were acquired in diffuse reflectance mode over the wavenumber range 10,000–4,000 cm⁻¹ using an FT-NIR spectrometer. Each sample was measured in triplicate to ensure reproducibility, and the averaged spectra were used for further analysis. The samples come from three Indonesian patchouli varieties (Lhokseumawe, Tapak Tuan, and Sidikalang), which allows model robustness to be assessed across genetic variability.

Preprocessing techniques such as Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), and the Savitzky–Golay second derivative may be applied to reduce scattering effects and enhance spectral features. The dataset is suitable for supervised classification with algorithms such as Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), AdaBoost, Gradient Boosting (GB), and k-Nearest Neighbors (kNN).

Beyond classification, the dataset can be used for chemometric analysis, including Principal Component Analysis (PCA), feature selection, and interpretation of diagnostically relevant spectral regions associated with biochemical changes in plant tissue, such as variations in water content, carbohydrates, and terpenoid compounds.
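As a rough illustration of the triplicate averaging and SNV correction mentioned in the description, the following minimal sketch uses only the Python standard library. It is not the contributor's workflow: the function names `average_replicates` and `snv` are hypothetical, the absorbance values are made up, and the use of the sample standard deviation (n - 1) is an assumption, since the description does not state which convention was used.

```python
from statistics import fmean, stdev


def average_replicates(replicates):
    """Average replicate spectra point by point (hypothetical helper)."""
    n_points = len(replicates[0])
    if any(len(r) != n_points for r in replicates):
        raise ValueError("all replicate spectra must have the same length")
    # zip(*replicates) groups the values measured at the same wavenumber
    return [fmean(values) for values in zip(*replicates)]


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own
    mean and standard deviation (hypothetical helper)."""
    mean = fmean(spectrum)
    sd = stdev(spectrum)  # assumption: sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("a constant spectrum cannot be SNV-scaled")
    return [(x - mean) / sd for x in spectrum]


# Made-up absorbance values for illustration only, not taken from the dataset
replicates = [
    [0.50, 0.62, 0.71, 0.65],
    [0.52, 0.64, 0.73, 0.67],
    [0.54, 0.66, 0.75, 0.69],
]

averaged = average_replicates(replicates)  # one spectrum per sample
corrected = snv(averaged)                  # zero mean, unit standard deviation
```

Because SNV is applied to each spectrum independently, additive offsets and multiplicative scaling of a whole spectrum cancel out, which is why it is commonly used to reduce scattering effects. For real data with thousands of wavenumber points, a vectorised array library would be the more practical choice.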
This dataset is intended to support reproducible research in analytical chemistry, spectroscopy, plant pathology, and precision agriculture. It provides a benchmark for the development of rapid, non-destructive diagnostic models for plant disease detection using spectral data and machine learning approaches.

Keywords: FT-NIR spectroscopy; Patchouli; Pogostemon cablin; Budok disease; Synchytrium pogostemonis; Machine learning; Chemometrics; Spectral data; Plant disease detection; Precision agriculture

Steps to reproduce

The detailed step-by-step procedure for reproducing the SNV preprocessing and machine learning workflow is provided in a separate supplementary file, which can be accessed via the link below:

Categories

Machine Learning