A dataset of infrared (ATR-FTIR) spectra for textile fibres of natural and man-made origin
Description
This dataset contains attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectra of natural and man-made textile fibres, assembled to support research in forensic, analytical, and environmental science. The dataset was developed under the hypothesis that textile fibres exhibit characteristic and reproducible infrared signatures that enable differentiation between fibre classes. By covering a broad range of commonly encountered textile materials, it provides a reference framework for fibre identification, spectral comparison, and the development and evaluation of classification approaches. A total of 160 spectra were obtained from 137 verified textile samples sourced from industry reference collections and academic textile archives. Only pure (non-blended) fibres were included. Samples were provided as fibre tufts, yarns, or fabric swatches, and natural variation in manufacturing treatments was retained to reflect the diversity typically encountered in real-world materials. The dataset currently covers 26 fibre subtypes, spanning both natural and man-made categories. The data are organised into six components: raw instrument files; tabular matrices of raw transmittance spectra; baseline-corrected spectra; averaged spectra grouped by fibre subtype; three alternative fully pre-processed feature matrices, each prepared with a different spectral preprocessing pipeline; and a metadata file describing sample-level attributes. Together, these components allow users to explore spectral variability within and between fibre categories, build custom reference libraries, benchmark spectral matching algorithms, examine discriminating spectral regions, and train or evaluate chemometric or machine-learning models on a consistent, well-structured foundation.
Files
Steps to reproduce
Fibre samples were obtained from verified sources, including the Forensic Fiber Reference Collection and the Arbidar Natural Fiber Collection (Microtrace LLC, Elgin, USA), as well as the internal textile collections of the UNUSUWUL group at the University of the Arts London (London, UK) and the Bio-Couture group at Northumbria University (Newcastle, UK). Only pure, non-blended fibres were included. Samples were supplied in various physical forms, such as tufts, yarns, and woven or knitted swatches, and surface finishing treatments were not standardised.

Spectral measurements were collected using a PerkinElmer Frontier FT-IR spectrometer equipped with a single-reflection diamond ATR accessory. Each sample was placed on the ATR crystal and held against it under pressure using the built-in clamp. Spectra were recorded as transmittance from 4000 to 550 cm⁻¹ at 4 cm⁻¹ resolution, with four co-added scans per acquisition. A background spectrum was acquired before each measurement, and the crystal surface was cleaned with acetone between samples. Multiple spectra were acquired for each fibre, and one or more high-quality spectra per fibre were retained on the basis of baseline stability, absence of saturation, and an appropriate signal range. All spectra were recorded using PerkinElmer Spectrum software v10.4.00.

To reproduce the tabular feature matrices, export the raw .sp files to transmittance format and align them to a common wavenumber axis (4000–550 cm⁻¹ at 1 cm⁻¹ intervals). The raw matrix (folder 02) was generated by combining these spectra with the metadata fields Spectrum_ID, Source_ID, Replica, Origin, Type, and Subtype. Baseline-corrected spectra (folder 03) were produced using asymmetric least-squares (ALS) correction, and averaged spectra (folder 04) were computed by grouping the ALS-corrected spectra by fibre subtype and averaging within each group.
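The alignment, ALS baseline-correction, and subtype-averaging steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the script distributed with the dataset: alignment uses plain linear interpolation (`np.interp`); the ALS routine is the common penalised-smoothing formulation with asymmetric weights, and its parameters (`lam=1e5`, `p=0.01`, 10 iterations) are hypothetical defaults because the values used for the dataset are not stated; and the averaging helper assumes a pandas DataFrame with one row per spectrum that shares its index with a metadata table containing a `Subtype` column. The helper names `align_to_axis`, `als_baseline`, and `average_by_subtype` are hypothetical. Note that this ALS variant fits a lower envelope, so it assumes bands point upwards (as in absorbance); transmittance spectra, whose bands point downwards, would need to be inverted first, and the description does not say how this was handled.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from scipy.sparse.linalg import spsolve

# Common axis from the description: 4000 to 550 cm^-1 in 1 cm^-1 steps (3451 points)
COMMON_AXIS = np.arange(4000, 549, -1).astype(float)


def align_to_axis(wavenumbers, values, axis=COMMON_AXIS):
    """Linearly interpolate one exported spectrum onto the common axis."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(wavenumbers)  # np.interp needs increasing x values
    return np.interp(axis, wavenumbers[order], values[order])


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least-squares baseline for a spectrum with upward bands.

    lam (smoothness), p (asymmetry) and n_iter are hypothetical defaults.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n - 2, n)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        W = sparse.diags(w)
        # Weighted, smoothness-penalised fit: (W + lam * D'D) z = W y
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the fit (bands) get small weight p, points below get 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z


def average_by_subtype(corrected, metadata):
    """Mean baseline-corrected spectrum per fibre subtype.

    corrected: DataFrame, one row per spectrum, one column per wavenumber.
    metadata: DataFrame sharing corrected's index, with a 'Subtype' column.
    """
    return corrected.groupby(metadata["Subtype"]).mean()
```

In this sketch the corrected spectrum would be `y - als_baseline(y)`, and the rows of those corrected spectra form the DataFrame passed to `average_by_subtype`.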
The fully pre-processed datasets (folder 05) were generated using three alternative preprocessing pipelines, all of which begin with conversion from transmittance to absorbance, ALS baseline correction, and scatter correction via Standard Normal Variate (SNV): (1) these three steps only; (2) these steps followed by Savitzky–Golay smoothing with a 15-point window and first derivative; and (3) these steps followed by Savitzky–Golay smoothing with a 15-point window and second derivative. All processing was implemented in Python using NumPy, pandas, and SciPy. A Python script included in the dataset allows users to apply the same workflow and regenerate all processed datasets.
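The three folder 05 pipelines share their first steps, so they can be sketched as one function with a pipeline switch. This is a hedged sketch, not the dataset's own script, and several details are assumptions not given in the description: transmittance is taken to be in percent and converted with A = -log10(T/100); SNV uses the sample standard deviation (`ddof=1`); the Savitzky–Golay polynomial order is a hypothetical default of 2 (the lowest order that supports a second derivative); and the baseline estimator is passed in as a callable (for example an ALS fit) rather than fixed. The names `transmittance_to_absorbance`, `snv`, and `preprocess` are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def transmittance_to_absorbance(t, percent=True):
    """Convert transmittance to absorbance, A = -log10(T).

    Assumes %T input by default (hypothetical; the source does not say).
    """
    t = np.asarray(t, dtype=float)
    if percent:
        t = t / 100.0
    # Clip to avoid log10 of zero or negative values in noisy regions
    t = np.clip(t, 1e-6, None)
    return -np.log10(t)


def snv(spectra):
    """Standard Normal Variate: centre each spectrum (row) and scale to unit std."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def preprocess(transmittance, pipeline, baseline_fn, polyorder=2):
    """Apply pipeline 1, 2 or 3 to one spectrum or an (n_spectra, n_points) array.

    baseline_fn: callable returning a baseline estimate for one 1D spectrum
    (e.g. an ALS fit). polyorder=2 is a hypothetical choice.
    """
    absorbance = np.atleast_2d(transmittance_to_absorbance(transmittance))
    # Baseline correction, row by row
    corrected = np.vstack([row - baseline_fn(row) for row in absorbance])
    # Scatter correction shared by all three pipelines
    out = snv(corrected)
    if pipeline == 1:
        return out
    if pipeline in (2, 3):
        # 15-point Savitzky-Golay window; deriv=1 for pipeline 2, deriv=2 for pipeline 3
        return savgol_filter(out, window_length=15, polyorder=polyorder,
                             deriv=pipeline - 1, axis=1)
    raise ValueError("pipeline must be 1, 2 or 3")
```

Because `savgol_filter` is called with its default `delta=1.0`, derivatives here are taken per data point; on the 1 cm⁻¹ grid that equals one wavenumber step, with the sign reversed for an axis running from 4000 down to 550 cm⁻¹.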