Training and test set data used for the determination of total carbon in biosolids using MID-infrared spectroscopy

Published: 06-04-2020 | Version 1 | DOI: 10.17632/kbd2g7j7yd.1
Nihal Albuquerque,
Barry Meehan,
Jeff Hughes,
Aravind Surapaneni


The data were collected as described in ( and The raw mid-infrared (MID-IR) data ( were used to determine total carbon (total C) in biosolids using two approaches: a full spectral method and a selected wavelength method. For the full spectral method, the raw data were mean-centred and principal component analysis (PCA) was carried out to identify and remove abnormal samples. After the removal of abnormal samples, the full spectral data (located in the full spectral folder) were divided into two data sets, a training set (approximately 2/3) and a test set (approximately 1/3). The full spectra training set and test set data were saved in files named ‘Full spectra training set.xlsx’ and ‘Full spectra test set.xlsx’, respectively. For the selected wavelength method, wavenumbers that had a correlation coefficient > 0.5 with total C were selected, and the remainder of the spectral data was discarded. The reduced data were then analysed by PCA to remove any abnormal samples and divided into training and test sets in the same way as for the full spectral method. The files containing both the total C and MID-IR data were stored in the selected wavelength folder under the filenames ‘Selected wavelength training set.xlsx’ and ‘Selected wavelengths test set.xlsx’. A full description of how the data were used is provided in (
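The preprocessing steps described above (mean-centring, selection of wavenumbers by their correlation with total C, and the approximate 2/3 to 1/3 split) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the random shuffle with a fixed seed, and the small made-up data are all assumptions introduced for illustration; the description does not state how samples were assigned to each set. The PCA-based removal of abnormal samples is omitted here, and the threshold is applied to the signed correlation coefficient exactly as worded above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def mean_centre(spectra):
    """Subtract each column mean so every wavenumber has mean zero."""
    n = len(spectra)
    col_means = [sum(col) / n for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, col_means)] for row in spectra]


def select_wavenumbers(spectra, total_c, threshold=0.5):
    """Indices of columns whose correlation with total C exceeds the threshold."""
    columns = list(zip(*spectra))
    return [j for j, col in enumerate(columns) if pearson(col, total_c) > threshold]


def split_train_test(n_samples, train_fraction=2 / 3, seed=0):
    """Shuffle sample indices (assumed random split) into training and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Made-up example: 6 samples x 3 wavenumbers (not real biosolids data).
spectra = [
    [0.10, 0.90, 0.50],
    [0.20, 0.80, 0.40],
    [0.30, 0.70, 0.60],
    [0.40, 0.60, 0.50],
    [0.50, 0.50, 0.40],
    [0.60, 0.40, 0.60],
]
total_c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

centred = mean_centre(spectra)                      # full spectral method input
selected = select_wavenumbers(spectra, total_c)     # selected wavelength method
reduced = [[row[j] for j in selected] for row in spectra]
train_idx, test_idx = split_train_test(len(reduced))
```

In this toy example only the first column is retained, because the second is negatively correlated with total C and the third is only weakly correlated. With the real files, the spectra and total C values would be read from the .xlsx files named above instead of being typed in.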