Mid-infrared spectral dataset and machine learning tools for defect detection in green and roasted coffee
Description
This repository provides a comprehensive mid-infrared (FTIR) spectral dataset of defect-free and defective coffee beans at both green and roasted processing stages, together with spectral preprocessing workflows and machine learning tools for defect classification. The dataset includes spectra acquired in the wavenumber range of 4000–650 cm⁻¹ using ATR-FTIR spectroscopy, representing Control samples (defect-free) and five industry-relevant defect categories: bitten, discolored, insect-damaged (drill bit), sour (vinegar), and black defects.
To support robust spectral analysis and facilitate reproducible modeling, the repository also includes spectra preprocessed using commonly applied chemometric techniques, including baseline correction, Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), and Savitzky–Golay first and second derivative transformations. These preprocessing methods allow users to evaluate the influence of spectral correction strategies on classification performance and feature extraction.
In addition to the spectral datasets, this repository provides R-based machine learning workflows for the simultaneous classification of defective and non-defective coffee samples in both green and roasted states. The computational tools include Support Vector Machine (SVM) and Random Forest (RF) algorithms, together with scripts for data preprocessing, model calibration, validation, and performance evaluation. These tools enable reproducible development and benchmarking of classification models for spectroscopy-based food quality assessment.
This dataset may be valuable for researchers in food science, spectroscopy, chemometrics, and machine learning, as well as for coffee industry stakeholders interested in developing rapid, non-destructive quality control systems.
Furthermore, the availability of both spectral data and computational tools facilitates reuse in applications such as food authentication, defect detection, and the development of data-driven quality monitoring strategies in agri-food systems.
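As an illustration of two of the scatter-correction steps named above, the following is a minimal Python sketch of SNV and MSC applied to a matrix with one spectrum per row. It is not the repository's code (the provided workflows are written in R); the function names `snv` and `msc`, the synthetic band positions, and the choice of the mean spectrum as the MSC reference are assumptions made only for this example.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum.

    Each spectrum is regressed on the reference (row = slope * ref + intercept)
    and then corrected as (row - intercept) / slope.
    The reference defaults to the mean spectrum (an assumption of this sketch).
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        ref = spectra.mean(axis=0)
    else:
        ref = np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Synthetic stand-in for ATR-FTIR spectra over 4000-650 cm^-1 (not real data):
# one underlying band shape, distorted by a different gain and offset per sample.
wavenumbers = np.linspace(4000, 650, 200)
base = (np.exp(-((wavenumbers - 1740) / 60.0) ** 2)
        + 0.5 * np.exp(-((wavenumbers - 2920) / 80.0) ** 2))
gains_offsets = [(1.0, 0.0), (1.3, 0.05), (0.8, -0.02)]
spectra = np.vstack([gain * base + offset for gain, offset in gains_offsets])

snv_spectra = snv(spectra)
msc_spectra = msc(spectra)
```

Because the three synthetic spectra differ only by gain and offset, both corrections collapse them onto a single curve; real spectra would retain their chemical differences after correction.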
Files
Institutions
- Universidad Surcolombiana, Huila Department, Neiva