Predictive Modeling of Empty Puparia Age Using Cuticular Hydrocarbon Concentrations: A Machine Learning Approach

Published: 24 May 2024 | Version 1 | DOI: 10.17632/m68y9brvfw.1
Swaima Sharif


This dataset contains concentration measurements of four cuticular hydrocarbons (pentacosane, n-C25; heptacosane, n-C27; octacosane, n-C28; and nonacosane, n-C29) extracted from empty puparia of Calliphora vicina. The puparia were stored in two pupation media, paper towel and soil, under controlled laboratory conditions, and measurements were taken at various ages of the empty puparia. Each row in the dataset represents one observation: the age of the empty puparia in days and the concentrations of the four hydrocarbons in nanograms per microliter (ng/µL), that is, the mass of each hydrocarbon per microliter of extraction solution. For analysis, two machine learning models, Support Vector Machine (SVM) and eXtreme Gradient Boosting (XGBoost), were employed to accommodate the characteristics of the dataset. Both models use the concentrations of n-C25, n-C27, n-C28, and n-C29 to predict the age of the empty puparia. The dataset comprises three files: one Excel sheet containing the concentration measurements of empty puparia investigated over 180 days under laboratory conditions, and two Word files containing the R scripts that implement the SVM and XGBoost algorithms for estimating the age of the empty puparia.
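The row layout described above (one age plus four concentrations per observation) maps directly onto a feature matrix and a target vector for regression. The following is a minimal Python sketch of that mapping only; the author's own scripts are in R and are not reproduced here. The CSV export, the column names (`age_days`, `C25`, `C27`, `C28`, `C29`), and the numeric values are all hypothetical placeholders, not taken from the Excel sheet.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Observation:
    """One dataset row: puparia age (days) and four concentrations (ng/µL)."""
    age_days: float
    c25: float
    c27: float
    c28: float
    c29: float

    def features(self) -> list:
        # Predictor order is an arbitrary choice made for this sketch.
        return [self.c25, self.c27, self.c28, self.c29]


# Hypothetical CSV export of the Excel sheet. Header names and values are
# placeholders for illustration, NOT measurements from the dataset.
SAMPLE_CSV = """age_days,C25,C27,C28,C29
10,1.0,2.0,3.0,4.0
20,1.5,2.5,3.5,4.5
"""


def load_observations(text: str) -> list:
    """Parse CSV text into Observation records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Observation(
            age_days=float(row["age_days"]),
            c25=float(row["C25"]),
            c27=float(row["C27"]),
            c28=float(row["C28"]),
            c29=float(row["C29"]),
        )
        for row in reader
    ]


def to_xy(observations: list) -> tuple:
    """Split records into predictors X (concentrations) and target y (age)."""
    X = [obs.features() for obs in observations]
    y = [obs.age_days for obs in observations]
    return X, y


observations = load_observations(SAMPLE_CSV)
X, y = to_xy(observations)
```

Any regression model, including SVM and XGBoost implementations, would then be fitted on `X` and `y`; the real column names should be checked against the Excel sheet before adapting this.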


Steps to reproduce

Sample preparation

Each sample was prepared using two empty puparial cases (n=5 from one duplicate, so n=15 from three replicates) immersed in hexane (500 µL) in a 10 mL headspace vial for 10 minutes in an ultrasonicator. The hexane extract was collected in a sterile autosampler vial and left to dry under nitrogen. All samples were stored at room temperature until required for analysis. For GC-MS analysis, the dried extract was reconstituted in 100 µL hexane and vortexed for 30 seconds. A hexane blank was run after each sample to clear the column.

Chemicals and reagents

Reference standards n-Pentacosane, n-Hexacosane, n-Heptacosane, n-Octacosane, and n-Nonacosane were purchased from Sigma-Aldrich (Taufkirchen, Germany). n-Hexane was purchased from Merck (Darmstadt, Germany).

Instrument and settings

An Agilent (Waldbronn, Germany) GC-MS system (7693 GC, 7890 B MSD) equipped with an Agilent VG-1 ms capillary column (30 m × 250 µm I.D. × 0.25 µm film thickness) was used for the chemical analysis. The following GC conditions were used, with helium as the carrier gas at a flow rate of 1.2 mL/min. Temperature program: 100°C for 2 minutes, increasing by 25°C/min to 200°C, then by 3°C/min to 260°C, then by 20°C/min to 320°C, and holding for 2 minutes; injection temperature: 250°C; splitless mode. MS: 5977 B MSD (Agilent Technologies), quadrupole; EI: 70 eV, positive ion mode. SIM and scan modes were used simultaneously. Scan: m/z 45 to 600, starting after 4 min.

Quantification of compounds

To quantify the n-C25 to n-C29 compounds, a single-ion monitoring (SIM) approach was employed. The target ions were m/z 303 for methadone-d9 (the internal standard), 352 for n-C25, 366 for n-C26, 380 for n-C27, 394 for n-C28, and 408 for n-C29. A calibration curve was established using the Agilent ChemStation data analysis software.
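As a quick consistency check on the SIM target ions listed above, the values given for the five n-alkanes equal the nominal molecular mass of CnH2n+2 (with C = 12 and H = 1), that is, 14n + 2. The short Python sketch below verifies this arithmetic. It is an observation drawn from the listed numbers, not a statement from the source about why those ions were selected, and the internal standard ion (m/z 303) is left out because it does not follow the alkane formula.

```python
def alkane_nominal_mass(n_carbons: int) -> int:
    """Nominal mass of an n-alkane CnH(2n+2), using C = 12 and H = 1."""
    return 12 * n_carbons + 1 * (2 * n_carbons + 2)


# Target ions (m/z) for the n-alkanes as listed in the quantification step,
# keyed by carbon number.
TARGET_IONS = {25: 352, 26: 366, 27: 380, 28: 394, 29: 408}

# Compare each listed target ion with the computed nominal molecular mass.
matches = {
    n: alkane_nominal_mass(n) == ion for n, ion in TARGET_IONS.items()
}

for n, ok in matches.items():
    print(
        f"n-C{n}: listed {TARGET_IONS[n]}, "
        f"computed {alkane_nominal_mass(n)}, match={ok}"
    )
```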
To establish a reliable linear relationship, ten calibration concentrations spanning 0.4 ng/µL to 50 ng/µL were used (0.4, 0.6, 2, 4, 5, 10, 15, 20, 25, and 50 ng/µL). Compounds were identified by matching their retention times against those of the reference standards and were confirmed by searching their mass spectra against the NIST17 database library using MSD ChemStation F.01.03.2357.
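The calibration step can be illustrated with an ordinary least-squares line fitted through the ten calibration levels and then inverted to read off an unknown concentration. This is a minimal sketch, not the ChemStation procedure: it assumes the response is the ratio of analyte peak area to internal-standard peak area, and the slope, intercept, and response values are placeholders generated from an assumed line rather than measured data.

```python
# Calibration levels in ng/µL, as listed in the text.
LEVELS = [0.4, 0.6, 2, 4, 5, 10, 15, 20, 25, 50]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination for the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def concentration_from_ratio(ratio, slope, intercept):
    """Invert the calibration line to get a concentration in ng/µL."""
    return (ratio - intercept) / slope


# ASSUMPTION: placeholder response ratios (analyte area / internal standard
# area) generated from an assumed line. These are not measured values.
ASSUMED_SLOPE = 0.08
ASSUMED_INTERCEPT = 0.01
ratios = [ASSUMED_SLOPE * c + ASSUMED_INTERCEPT for c in LEVELS]

slope, intercept = fit_line(LEVELS, ratios)
r2 = r_squared(LEVELS, ratios, slope, intercept)

# Hypothetical response ratio for an unknown sample.
unknown_conc = concentration_from_ratio(0.81, slope, intercept)
```

With real data, the fitted concentration is only trustworthy when the sample response falls inside the calibrated range of 0.4 to 50 ng/µL.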


Aligarh Muslim University, Goethe-Universität Frankfurt am Main


Forensic Science, Forensic Entomology, Applied Machine Learning