Augmented dataset of Brazilian road’s animals using GANs for object detection tasks
Description
The BRA-Dataset is an expanded dataset of Brazilian wildlife developed for object detection tasks, combining real images with synthetic samples generated by Generative Adversarial Networks (GANs). It covers five medium- and large-sized mammal species frequently involved in roadkill incidents on Brazilian highways: lowland tapir (Tapirus terrestris), jaguarundi (Herpailurus yagouaroundi), maned wolf (Chrysocyon brachyurus), puma (Puma concolor), and giant anteater (Myrmecophaga tridactyla). The primary goal is to provide a standardized resource for biodiversity conservation research, wildlife monitoring technologies, and computer vision applications, with an emphasis on automated wildlife detection.
The original dataset by Ferrante et al. (2022) was built from wildlife images captured by camera traps and field cameras or found through structured internet searches, followed by manual curation and bounding box annotation. In this work, the dataset was expanded to 9,238 images, divided into three main groups:
1. Real images: original photographs collected from the aforementioned sources. Total: 1,823.
2. Images augmented by classical techniques: generated from real images using transformations such as rotations (RT), horizontal flips (HF), vertical flips (VF), horizontal shifts (HS), and vertical shifts (VS). Total: 7,300.
3. Synthetic images generated by GANs: produced with WGAN-GP models trained individually for each species, using pre-processed image subsets. All generated samples underwent qualitative assessment to ensure morphological consistency, proper framing, and visual fidelity before inclusion. Total: 115.
The directory structure is organized into images/ and labels/, each subdivided into train/ and val/, following an 80% training and 20% validation split. Images are provided in .jpg format and annotations in .txt files following the YOLO standard (class_id x_center y_center width height, with normalized coordinates).
The file naming convention indicates the species and the type of data augmentation applied. The dataset is compatible with various object detection architectures and was evaluated using YOLOv5, YOLOv8, and YOLOv11 in their n, s, and m variants to assess the impact of dataset expansion under different computational budgets and performance requirements. By combining real data, classical augmentations, and high-quality synthetic samples, the BRA-Dataset provides a resource for wildlife detection, environmental monitoring, and conservation research, especially in contexts where few images of rare or threatened species are available.
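As an illustration of the annotation format and directory layout described above, the following minimal Python sketch converts one YOLO label line (class_id x_center y_center width height, normalized) into pixel corner coordinates and maps an image path under images/ to its matching label path under labels/. The names Box, parse_yolo_line, and label_path_for, and the file name example.jpg, are hypothetical and are not part of the dataset. The mapping from class_id to species is not stated here, so the sketch leaves class_id as a plain integer.

```python
from pathlib import Path
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box in pixel coordinates (hypothetical helper type)."""
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one 'class_id x_center y_center width height' line to pixel corners."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(p) for p in parts[1:])
    # Normalized centre/size -> absolute corner coordinates.
    return Box(
        class_id,
        (xc - w / 2) * img_w,
        (yc - h / 2) * img_h,
        (xc + w / 2) * img_w,
        (yc + h / 2) * img_h,
    )


def label_path_for(image_path: Path) -> Path:
    """Map images/<split>/<name>.jpg to labels/<split>/<name>.txt."""
    parts = list(image_path.parts)
    # Replace the last 'images' path component (raises ValueError if absent).
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")
```

For example, parse_yolo_line("2 0.5 0.5 0.25 0.5", 640, 480) describes a box centred in a 640x480 image, and label_path_for(Path("images/train/example.jpg")) yields labels/train/example.txt. The image width and height must be read from the image file itself, since YOLO label files do not store them.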
Funding
National Council for Scientific and Technological Development
147019/2024-9