Provenance Modelling of Fossil Dinosaur Bones Using Geochemistry and Machine Learning: Source Data

Published: 3 July 2025 | Version 3 | DOI: 10.17632/25r6txd45n.3
Contributors:
Michał Surowski, David Chew, Foteini Drakou

Description

The data presented here support the research paper “Geochemistry-Based Provenance Modelling in Fossil Bones: A Machine Learning Study from the Upper Cretaceous Gobi Localities”, intended for submission to "Cretaceous Research". The dataset contains trace element concentrations measured in fossil dinosaur bones from the Upper Cretaceous Nemegt and Djadokhta formations. For analytical purposes, the dataset was divided into two subsets: the first consists of long bones (tibiae, femora, radii and humeri), and the second includes trabecular bones (ribs and vertebrae) and metatarsals. Locality labels were used to train and evaluate several machine learning classifiers (logistic regression, random forest, AdaBoost, XGBoost) to assess the potential of bone geochemistry for provenance prediction. Feature selection was conducted on the best-performing models to identify the elements that contribute most to model performance. These results were compared with those obtained using Linear Discriminant Analysis. The data are provided in CSV format in the “Data” folder. The folder “Figures” contains the figures used in the manuscript. The folder “Supplementary files” contains an interactive HTML plot ("Element profiles.html") showing all concentration profiles for each analysed sample, including samples measured along several profiles, together with the same concentration profiles in a PDF file ("All profiles.pdf").
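The two-subset division described above can be illustrated with a minimal Python sketch. It is not the authors' code. The column names ("sample", "bone", "locality", "La_ppm"), the bone-type spellings and the sample rows are hypothetical stand-ins, since the layout of the CSV files in the “Data” folder is not specified here. Only the grouping itself (long bones versus trabecular bones and metatarsals) is taken from the description.

```python
import csv
import io

# Bone-type groupings taken from the description; the exact spellings and
# the column names used below ("bone", "locality", ...) are assumptions.
LONG_BONES = {"tibia", "femur", "radius", "humerus"}
OTHER_BONES = {"rib", "vertebra", "metatarsal"}


def split_by_bone_type(rows, bone_column="bone"):
    """Return (long_bones, other_bones) lists of rows.

    Rows whose bone type is in neither group are skipped.
    """
    long_bones, other_bones = [], []
    for row in rows:
        bone = row[bone_column].strip().lower()
        if bone in LONG_BONES:
            long_bones.append(row)
        elif bone in OTHER_BONES:
            other_bones.append(row)
    return long_bones, other_bones


# Made-up stand-in for one of the CSV files; the values are illustrative only
# and are not taken from the dataset.
SAMPLE_CSV = """sample,bone,locality,La_ppm
A1,Femur,Locality X,12.5
A2,Rib,Locality X,8.1
A3,Metatarsal,Locality Y,3.3
A4,Tibia,Locality Y,20.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
long_subset, other_subset = split_by_bone_type(rows)
print(len(long_subset), len(other_subset))
```

To apply the same split to a real file, the in-memory string would be replaced by an opened CSV file from the “Data” folder, with the column name and bone-type labels adjusted to match the actual headers. Each subset could then be passed, with its locality labels, to a classifier of the kind listed in the description.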

Files

Institutions

Uniwersytet Warszawski Wydzialu Geologii

Categories

Natural Sciences, Fossil Geochemistry, Dinosaur, Fossil Bones

Funding

National Science Center

2017/27/N/ST10/00984

Licence