A Multi-Class UAV Military Object Detection Dataset: Tank, Drone, Soldier, and People Images from Aerial Perspectives with Synthetic Data Augmentation
Description
This dataset is a custom multi-class image dataset developed for deep learning-based object detection in UAV military surveillance and reconnaissance applications. It consists of 7,985 annotated images containing 14,018 object instances across four operationally relevant classes: tank, drone, people, and soldier. The dataset includes 3,000 tank images with 4,990 instances, 1,359 drone images with 1,296 instances, 2,644 people images with 4,492 instances, and 982 soldier images with 3,240 instances. Images were collected from publicly available sources, mainly Roboflow and Kaggle, with an emphasis on aerial and bird’s-eye-view perspectives. The dataset covers diverse real-world conditions, including varying lighting, weather, occlusion, altitude, and terrain. Because few open-source images of soldiers captured by operational UAV platforms are available, the soldier class was enhanced with photorealistic synthetic images generated in the GTA5 simulation environment using Director Mode, which allowed control over scene conditions such as lighting. All images were manually annotated in the Computer Vision Annotation Tool (CVAT) with axis-aligned bounding boxes and then converted to YOLO format. The preprocessing pipeline included duplicate removal, resizing, normalization, and data augmentation. Eight augmentation techniques were applied: scaling, rotation, translation, cropping, horizontal and vertical flipping, noise addition, cutout, and padding. The dataset was iteratively balanced and divided into training (70%), validation (20%), and test (10%) subsets. A YOLOv9c model trained on the dataset achieved a mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) of 89.6% and an inference speed of 12.7 ms per image. This dataset addresses the scarcity of publicly available multi-class military imagery and provides a useful benchmark for future research in UAV-based object detection, military surveillance, and reconnaissance systems.
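The Description names three label-level preprocessing steps (converting CVAT boxes to YOLO format, flipping as an augmentation, and a 70/20/10 split) without showing how they were done. The following is a minimal sketch of one plausible way to implement each, not the authors' code. The helpers `box_to_yolo`, `flip_yolo_line`, and `split_dataset`, as well as the `CLASS_NAMES` index order, are hypothetical. The sketch also assumes boxes arrive as pixel-space top-left/bottom-right corners, which is how CVAT stores rectangles. A YOLO label file holds one line per object: a class index followed by the box center x, center y, width, and height, each normalized by the image size.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical class-index order; the dataset's own class list may differ.
CLASS_NAMES = ["tank", "drone", "people", "soldier"]


def box_to_yolo(class_id: int, xtl: float, ytl: float, xbr: float, ybr: float,
                img_w: int, img_h: int) -> str:
    """Pixel corners (top-left, bottom-right) -> one YOLO label line:
    'class x_center y_center width height', coordinates normalized to [0, 1]."""
    # Clamp corners to the image so the normalized values stay in range.
    xtl, xbr = max(0.0, xtl), min(float(img_w), xbr)
    ytl, ybr = max(0.0, ytl), min(float(img_h), ybr)
    x_c = (xtl + xbr) / 2 / img_w
    y_c = (ytl + ybr) / 2 / img_h
    w = (xbr - xtl) / img_w
    h = (ybr - ytl) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def flip_yolo_line(line: str, horizontal: bool = True) -> str:
    """Update one YOLO label line for a flipped image. Width and height are
    unchanged; only the center coordinate along the flip axis is mirrored."""
    cls, x_c, y_c, w, h = line.split()
    x, y = float(x_c), float(y_c)
    if horizontal:
        x = 1.0 - x
    else:
        y = 1.0 - y
    return f"{cls} {x:.6f} {y:.6f} {w} {h}"


def split_dataset(items: Sequence[str], ratios: Tuple[float, float] = (0.7, 0.2),
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once with a fixed seed, then cut into train/val/test.
    Train and val sizes come from `ratios`; test takes the remainder
    (about 10% with the defaults), so rounding never drops an item."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


if __name__ == "__main__":
    tank = CLASS_NAMES.index("tank")
    line = box_to_yolo(tank, 40, 50, 120, 150, img_w=400, img_h=200)
    print(line)                  # 0 0.200000 0.500000 0.200000 0.500000
    print(flip_yolo_line(line))  # 0 0.800000 0.500000 0.200000 0.500000
```

Whether to split before or after augmenting is a design choice. Splitting the source images first keeps augmented copies of one image from landing in different subsets, where they would inflate validation and test scores.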
Steps to reproduce
1. Collect candidate tank images from two publicly accessible dataset collections on Roboflow Universe (Roboflow, “Roboflow Universe,” [Online]. Available: [https://universe.roboflow.com/](https://universe.roboflow.com/). Accessed: 17 July 2024).
2. Manually screen the collected tank images and retain only those showing tanks from aerial or near-aerial perspectives. During screening, ensure variation in tank type, terrain, weather conditions, occlusion level, and image quality.
3. Retain the final screened tank subset of 3,000 images.
4. Collect candidate drone images from Roboflow Universe (Roboflow, “Roboflow Universe,” [Online]. Available: [https://universe.roboflow.com/](https://universe.roboflow.com/). Accessed: 17 July 2024) and Kaggle (Kaggle, “Kaggle: Your Machine Learning and Data Science Community,” [Online]. Available: [https://www.kaggle.com/](https://www.kaggle.com/). Accessed: 17 July 2024).
5. Screen the drone images to emphasize multi-rotor UAV diversity, including rotor count, frame size, color, altitude, top-down or above-the-drone viewpoints, and backgrounds such as open sky, urban areas, and vegetation.
6. Retain the final drone subset of 1,359 images.
7. Collect civilian pedestrian images from Roboflow Universe (Roboflow, “Roboflow Universe,” [Online]. Available: [https://universe.roboflow.com/](https://universe.roboflow.com/). Accessed: 17 July 2024).
8. Screen the people images to prioritize bird’s-eye-view perspectives, diverse clothing, and varied crowd densities, ensuring that the civilian class remains visually distinct from the soldier class.
9. Retain the final people subset of 2,644 images.
10. Generate the soldier image subset synthetically using Grand Theft Auto V (GTA5) (Rockstar Games, “Grand Theft Auto V (GTA5),” [Video game], Rockstar Games, 2013), as described in Section 5.3.
11. Use the generated GTA5 soldier images as the base soldier subset before augmentation, resulting in 982 base images.
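Steps 1 and 4 merge images from more than one collection, so the same file can be collected twice, and the Description lists duplicate removal as part of preprocessing without naming a method. What follows is a minimal sketch that assumes byte-identical copies are the target. It hashes each file with SHA-256 and keeps the first occurrence. The function name `dedupe_by_content` and the `raw/tank` folder are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Set


def dedupe_by_content(paths: Iterable[Path]) -> List[Path]:
    """Return the paths to keep: the first file seen for each SHA-256 digest.
    Only byte-identical copies are detected; a re-encoded or resized copy of
    the same photo hashes differently and would need a perceptual hash."""
    seen: Set[str] = set()
    kept: List[Path] = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept


if __name__ == "__main__":
    # Hypothetical folder holding the merged candidate images for one class.
    candidates = sorted(Path("raw/tank").glob("*.jpg"))
    kept = dedupe_by_content(candidates)
    print(f"{len(candidates) - len(kept)} exact duplicates removed")
```

Sorting the candidate paths first makes the choice of which copy survives deterministic across runs.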
Institutions
- Jordan University of Science and Technology, Irbid, Irbid