BRAGAN: a GAN-augmented dataset of Brazilian roadkill animals for object detection

Published: 20 August 2025 | Version 2 | DOI: 10.17632/ck88dwffgd.2
Contributors:
Henrique Souza de Abreu Martins

Description

BRAGAN is a dataset of Brazilian wildlife developed for object detection tasks, combining real images with synthetic samples generated by Generative Adversarial Networks (GANs). It focuses on five medium- and large-sized mammal species frequently involved in roadkill incidents on Brazilian highways: lowland tapir (Tapirus terrestris), jaguarundi (Herpailurus yagouaroundi), maned wolf (Chrysocyon brachyurus), puma (Puma concolor), and giant anteater (Myrmecophaga tridactyla). Its primary goal is to provide a standardized and expanded resource for biodiversity conservation research, wildlife monitoring technologies, and computer vision applications, with an emphasis on automated wildlife detection.

The dataset builds upon the original BRA-Dataset by Ferrante et al. (2022), which was constructed from structured internet searches and manually curated with bounding box annotations. Because the BRA-Dataset was limited in size and variability, BRAGAN adds a stage of dataset expansion through GAN-based synthetic image generation, increasing both the quantity and the diversity of samples.

In its final version, BRAGAN comprises approximately 9,238 images, divided into three main groups:

Real images: original photographs from the BRA-Dataset. Total: 1,823.
Classically augmented images: transformations applied to real samples, including rotations (RT), horizontal flips (HF), vertical flips (VF), horizontal shifts (HS), and vertical shifts (VS). Total: 7,300.
GAN-generated images: synthetic samples created using WGAN-GP models trained separately for each species on preprocessed subsets of the original data. All generated images underwent visual inspection to ensure morphological fidelity and proper framing before inclusion. Total: 115.

The dataset is organized into images/ and labels/ folders, each divided into train/ and val/ subsets following an 80/20 split.
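As a minimal sketch of how the images/ and labels/ layout could be checked after download, the following Python snippet counts the .jpg images in each split and lists images that have no matching .txt label. The root folder name `BRAGAN` and the helper `summarize_split` are assumptions made for illustration; the actual archive may unpack under a different name.

```python
from pathlib import Path


def summarize_split(root, split):
    """Count images in one split and list images lacking a label file.

    Assumes the layout <root>/images/<split>/*.jpg and
    <root>/labels/<split>/*.txt.
    """
    image_dir = Path(root) / "images" / split
    label_dir = Path(root) / "labels" / split
    images = sorted(image_dir.glob("*.jpg"))
    # An image counts as unlabeled if no .txt file shares its stem.
    missing = [p.name for p in images
               if not (label_dir / f"{p.stem}.txt").exists()]
    return len(images), missing


if __name__ == "__main__":
    root = "BRAGAN"  # hypothetical extraction folder; adjust to the real path
    counts = {}
    for split in ("train", "val"):
        if not (Path(root) / "images" / split).is_dir():
            print(f"{split}: folder not found under {root}")
            continue
        counts[split], missing = summarize_split(root, split)
        print(f"{split}: {counts[split]} images, {len(missing)} without labels")
    total = sum(counts.values())
    if total:
        # With an 80/20 split this fraction should be close to 0.80.
        print(f"train fraction: {counts.get('train', 0) / total:.2f}")
```

A train fraction far from 0.80, or a non-empty list of unlabeled images, would suggest an incomplete download or a different folder layout than the one assumed here.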
Images are provided in .jpg format, while annotations follow the YOLO standard in .txt files (class_id x_center y_center width height, with normalized coordinates). The file naming convention explicitly encodes the species and the augmentation type for reproducibility. Designed to be compatible with multiple object detection architectures, BRAGAN has been evaluated on YOLOv5, YOLOv8, and YOLOv11 (variants n, s, and m), enabling the effect of dataset expansion to be assessed across different computational settings and performance requirements. By combining real data, classical augmentations, and visually inspected synthetic samples, BRAGAN provides a resource for wildlife detection, environmental monitoring, and conservation research, especially in contexts where image availability for rare or threatened species is limited.
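To illustrate the annotation format, here is a minimal sketch that parses one YOLO label line and converts the normalized box to pixel corner coordinates. The function name `yolo_to_pixel_box`, the sample line, and the 640x480 image size are made up for illustration. The mapping from class_id to species is not stated in this description, so the example leaves the id uninterpreted.

```python
def yolo_to_pixel_box(line, img_width, img_height):
    """Convert 'class_id x_center y_center width height' (normalized to 0..1)
    into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:])
    # Scale the normalized center and size to pixels, then derive the corners.
    x_min = (x_center - width / 2) * img_width
    y_min = (y_center - height / 2) * img_height
    x_max = (x_center + width / 2) * img_width
    y_max = (y_center + height / 2) * img_height
    return class_id, x_min, y_min, x_max, y_max


# Hypothetical label line for a 640x480 image (not taken from the dataset).
example = "2 0.5 0.5 0.25 0.5"
print(yolo_to_pixel_box(example, 640, 480))
# (2, 240.0, 120.0, 400.0, 360.0)
```

Because the coordinates are normalized, the same label file stays valid if an image is resized without cropping; only the width and height passed to the conversion change.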

Institutions

Universidade de Sao Paulo

Categories

Animal, Object Detection, Wildlife Conservation, Deep Learning, Generative Adversarial Network, Data Augmentation

Funding

National Council for Scientific and Technological Development

147019/2024-9

Licence