Data used in Predicting fertility from sperm motility landscapes

Published: 2 September 2022 | Version 4 | DOI: 10.17632/jd38jhxpg6.4


Data to permit reproduction of the results in "Predicting fertility from sperm motility landscapes" (Fernández-López et al., 2022). Check the related GitHub repository for instructions on how to reproduce the results from the paper. For more details, see the methods section of the paper. For any issues regarding data or code, please open an issue in the corresponding GitHub repository.

Raw data files (in the RAW_DATA folder): fertility_data.csv, sperm_data.csv and capacitated_dataset.csv.

fertility_data.csv contains the information about insemination outcomes. The columns are, from left to right: boar unique ID, boar's age, sow unique ID, sow parity (number of times the sow has completed a cycle of pregnancy, from insemination to delivery), sow's age, insemination date (in DD/MM/YYYY format), number of piglets born from that insemination, number of stillborn piglets, number of piglets born alive, seminal dose extraction date (in DD/MM/YYYY format), whether the insemination succeeded (pregnancy, 1) or not (0), and the ejaculate unique ID.

sperm_data.csv contains the motility features of the seminal doses described in the fertility dataset. It includes the unique identifiers of the boars and their ejaculates, as well as several motility features of the sperm. The columns are, from left to right: area of the head of the spermatozoon, curvilinear velocity (VCL), straight-line velocity (VSL), average path velocity (VAP), linearity, straightness, wobble, amplitude of lateral head displacement (ALH), beat-cross frequency (BCF), boar unique ID and ejaculate unique ID. The last 3 columns are the cluster labels, the merged cluster labels and the effect on motility of those clusters.

capacitated_dataset.csv contains the motility features of the combined fresh and capacitated sperm that were used to generate Figure 5 in "Predicting fertility from sperm motility landscapes" (Fernández-López et al., 2022). This dataset contains only the main variables (VCL, VSL, ALH and BCF), as well as a label indicating whether the spermatozoon was incubated in capacitating conditions (Capacitated) or not (Fresh). The x and y columns are the corresponding coordinates in the landscape.

The other data files are related to the figures of the paper. Check the GitHub repository for details.
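The column layout described above can be turned into a small loading sketch. This is not the authors' code (that lives in the GitHub repository); it is a minimal illustration using pandas. Several things here are assumptions: the column names are hypothetical labels assigned by position from the description, the files are assumed to be comma-separated with a single header row, and averaging motility features per ejaculate is only one possible way to link the two tables, not the method used in the paper.

```python
import pandas as pd

# Hypothetical column labels, assigned by position from the dataset description.
# The real header names in the CSV files may differ.
FERTILITY_COLUMNS = [
    "boar_id", "boar_age", "sow_id", "sow_parity", "sow_age",
    "insemination_date", "born_total", "born_dead", "born_alive",
    "extraction_date", "pregnant", "ejaculate_id",
]
SPERM_COLUMNS = [
    "head_area", "VCL", "VSL", "VAP", "LIN", "STR", "WOB", "ALH", "BCF",
    "boar_id", "ejaculate_id", "cluster", "merged_cluster", "cluster_effect",
]
DATE_COLUMNS = ["insemination_date", "extraction_date"]


def load_fertility(source):
    """Read fertility_data.csv (path or file-like object).

    Assumes a comma-separated file with one header row, which is replaced
    by the hypothetical labels above.
    """
    df = pd.read_csv(source, header=0, names=FERTILITY_COLUMNS)
    for col in DATE_COLUMNS:
        # Dates are documented as DD/MM/YYYY, so parse them day-first explicitly.
        df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")
    return df


def load_sperm(source):
    """Read sperm_data.csv (one row per spermatozoon), same assumptions as above."""
    return pd.read_csv(source, header=0, names=SPERM_COLUMNS)


def attach_mean_motility(fertility, sperm, features=("VCL", "VSL", "ALH", "BCF")):
    """Average the chosen motility features per ejaculate and join them
    onto the insemination records via the ejaculate unique ID.

    Illustrative only: a per-ejaculate mean is a simplification chosen for
    this sketch, not the analysis described in the paper.
    """
    per_ejaculate = (
        sperm.groupby("ejaculate_id", as_index=False)[list(features)].mean()
    )
    return fertility.merge(per_ejaculate, on="ejaculate_id", how="left")
```

With the files in place, usage would look like `fertility = load_fertility("RAW_DATA/fertility_data.csv")`, `sperm = load_sperm("RAW_DATA/sperm_data.csv")` and `merged = attach_mean_motility(fertility, sperm)`. Parsing the dates with an explicit format avoids the common pitfall of DD/MM/YYYY values being silently read as MM/DD/YYYY.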


Steps to reproduce

To reproduce the results in Fernández-López et al. (2022), follow the instructions in the related GitHub repository and use the code provided there.


Consejo Superior de Investigaciones Cientificas


Cell Biology, Animal Fertility, Sperm Evaluation