Provenance Modelling of Fossil Dinosaur Bones Using Geochemistry and Machine Learning: Source Data
Description
The data presented here support the research paper “Provenance Modelling of Fossil Dinosaur Bones Using Geochemistry and Machine Learning”, intended for submission to "Paleoworld". The dataset contains trace element concentrations in fossil dinosaur bones from the Upper Cretaceous Nemegt and Djadokhta formations. For analytical purposes, the dataset was divided into two subsets: the first consisting of long bones (tibiae, femora, radii and humeri) and the second comprising trabecular bones (ribs and vertebrae) and metatarsals. Locality labels were used to train and evaluate several machine learning classifiers (logistic regression, random forest, AdaBoost, XGBoost) to assess the potential of bone geochemistry for provenance prediction. Feature selection was conducted on the best-performing models to identify the elements contributing most to model performance. These results were compared with those obtained using Linear Discriminant Analysis.

The data are provided in CSV format in the “Data” folder. The folder “Plots and figures” contains the figures used in the manuscript, including the plots. The folder “Supplementary files” contains the following additional files:
- an interactive HTML plot ("Element profiles.html") showing all the concentration profiles for each analysed sample, including those measured along several profiles
- the concentration profiles in a PDF file ("All profiles.pdf")
- an XLSX file with the statistical summary of the data ("Data description WK.xlsx")
- the LDA scalings in a CSV file ("LDA scalings.csv")
- tables comparing the predictions and performance of the algorithms on the test part of each subset ("Predictions and accuracy.xlsx")

In addition, the Jupyter Notebook with the data analysis is provided ("Bones from Gobi - loc prediction.ipynb").
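The division into the two subsets described above can be illustrated with a minimal sketch. It is not taken from the accompanying notebook: the column names (`sample_id`, `bone_element`, `locality`), the singular spellings of the bone elements, and the inline rows are all assumptions made for illustration, and the actual CSV files in the “Data” folder may be organised differently.

```python
import csv
import io

# Bone elements assigned to each subset, following the description above.
# The singular, lowercase spellings are an assumption about how they are coded.
LONG_BONES = {"tibia", "femur", "radius", "humerus"}
OTHER_BONES = {"rib", "vertebra", "metatarsal"}

# Made-up rows for illustration only; the column names and values are
# hypothetical and do not come from the files in the "Data" folder.
SAMPLE_CSV = """sample_id,bone_element,locality
S1,femur,A
S2,rib,B
S3,metatarsal,A
S4,tibia,B
"""


def split_by_bone_type(rows, element_column="bone_element"):
    """Split rows into long bones, other bones, and unrecognised elements."""
    long_bones, other_bones, unassigned = [], [], []
    for row in rows:
        element = row[element_column].strip().lower()
        if element in LONG_BONES:
            long_bones.append(row)
        elif element in OTHER_BONES:
            other_bones.append(row)
        else:
            # Keep unrecognised elements visible instead of dropping them.
            unassigned.append(row)
    return long_bones, other_bones, unassigned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
long_bones, other_bones, unassigned = split_by_bone_type(rows)

print("Long bones:", [r["sample_id"] for r in long_bones])
print("Trabecular bones and metatarsals:", [r["sample_id"] for r in other_bones])
```

To try this on a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an opened CSV file and `element_column` set to whichever column actually records the bone element. Rows with an unrecognised element are returned separately so that they can be checked by hand.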