HuBot Dataset: Annotated Data for Non-Disruptive Bird Behavior Study
Description
This dataset was created as part of the HuBot project, which focuses on developing a biomimetic mobile robot for non-invasive ecological research and wildlife monitoring. It contains a curated collection of 5,000 high-quality images of Houbara bustards, annotated for object detection tasks. The dataset aims to support research in ecological conservation and robotic applications, enabling accurate and reliable detection of Houbara bustards in diverse environments.

Dataset Details:
- Size: 5,000 images, divided into:
  - Training set: 4,000 images
  - Validation set: 500 images
  - Testing set: 500 images
- Annotations: each image includes bounding box annotations for detecting and localizing Houbara bustards.
- Sources: images were collected from in-house footage, online repositories, and globally deployed camera traps.

Features:
- Bias mitigation: diverse lighting conditions, backgrounds, and challenging scenarios to avoid overfitting, with balanced representation of Houbara behaviors and habitats to ensure unbiased performance.
- Data augmentation: techniques such as horizontal flips, cropping, blurring, and noise addition were applied to enhance model generalization.
- Diversity and representation: the dataset captures a range of environmental conditions to support real-world applications and improve model robustness.
- Future scaling: plans are in place to expand the dataset to 50,000 images, covering rare Houbara behaviors, multiple subspecies, and more geographical locations.

Applications:
- Object detection in biomimetic robotics
- Ecological research on endangered species
- Non-invasive data collection for long-term wildlife monitoring

This dataset has been instrumental in optimizing HuBot's object detection algorithms, allowing reliable performance in diverse conditions while minimizing disturbance to wildlife. It serves as a valuable resource for researchers and developers working on ecological robotics, conservation, and artificial intelligence.
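The augmentations listed above must keep the YOLO-format boxes consistent with the transformed pixels. As a minimal sketch (not the pipeline actually used to build the dataset), a horizontal flip only mirrors the normalized x-center, and additive noise leaves the boxes untouched:

```python
import numpy as np

def hflip(image, boxes):
    """Horizontally flip an image and its YOLO boxes.
    boxes: array of rows [class, x_center, y_center, w, h], normalized to [0, 1]."""
    flipped = image[:, ::-1].copy()
    boxes = boxes.copy()
    boxes[:, 1] = 1.0 - boxes[:, 1]  # mirror x_center; y_center, w, h are unchanged
    return flipped, boxes

def add_gaussian_noise(image, sigma=10.0, seed=None):
    """Add zero-mean Gaussian noise, clipping back to the valid [0, 255] range."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Example: a dummy 4x4 grayscale image and one box for class 0 (Houbara)
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
box = np.array([[0, 0.25, 0.5, 0.2, 0.4]])
flipped_img, flipped_box = hflip(img, box)
print(flipped_box[0, 1])  # 0.75
```

Cropping is the one listed augmentation that also rescales and may drop boxes, which is why off-the-shelf augmentation libraries are usually preferable for it.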
License: This dataset is shared under the CC BY 4.0 license, permitting use, sharing, and adaptation with appropriate attribution.
Files
Steps to reproduce
1. Download the Dataset
Download the complete dataset, which includes:
- Images folder: annotated images in .jpg format
- Labels folder: bounding box annotations in YOLO format
- Metadata: information on environmental conditions and data sources
- README file: instructions and detailed explanations of the dataset structure

2. Set Up the Environment
Hardware requirements: a GPU-enabled machine with at least 8 GB of VRAM (e.g., NVIDIA RTX 3070 or higher).
Software requirements:
- Python 3.8 or higher
- PyTorch or TensorFlow (depending on the deep learning framework used)
- YOLOv9 or a compatible object detection implementation

3. Prepare the Dataset
Unzip the downloaded dataset and ensure the structure follows the YOLO format:

Dataset/
├── train/
│   ├── images/
│   └── labels/
├── valid/
│   ├── images/
│   └── labels/
└── test/
    ├── images/
    └── labels/

Verify that each labels folder contains a .txt file corresponding to each image in the matching images folder.

4. Set Up Training Configuration
Create or modify the data.yaml file for YOLOv9, specifying:

train: path/to/Dataset/train/images
val: path/to/Dataset/valid/images
nc: 1  # number of classes
names: ['Houbara']  # class name

Resize images to 640x640 pixels if needed.

5. Train the Model
Use a YOLOv9 implementation to train on the dataset:

python train.py --img 640 --batch 16 --epochs 50 --data path/to/data.yaml --weights yolov9.pt

6. Evaluate the Model
Evaluate model performance on the test set:

python val.py --data path/to/data.yaml --weights path/to/best.pt --img 640

Expected outputs: mean Average Precision (mAP) scores and detection accuracy metrics.

7. Use the Dataset for Research
For custom tasks or further analysis:
- Load the images and annotations programmatically using libraries such as OpenCV or PyTorch Datasets.
- Modify or expand the dataset by applying custom augmentations.

8. Expand or Modify the Dataset
To expand the dataset, follow the same annotation format for additional images.
Use GroundingDINO or other automated tools for initial labeling, then refine the labels manually. Update the dataset structure and re-train models as needed.

9. Cite the Dataset
When using this dataset, cite the original publication and dataset link:
Saad Saoud, L., et al. (2024). "HuBot Dataset: Annotated Data for Non-Disruptive Bird Behavior Study", Mendeley Data, V1, doi: 10.17632/tx3vrvsrgv.1
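The label parsing and image-label pairing check suggested in steps 3 and 7 can be sketched in a few lines of Python. This is an illustrative helper under the assumption that labels use the standard YOLO text format (one "class x_center y_center width height" row per object, normalized coordinates); the file names below are made up for the demo:

```python
import tempfile
from pathlib import Path

def load_yolo_labels(label_path):
    """Parse one YOLO-format .txt file into (class_id, x_c, y_c, w, h) tuples."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip malformed lines rather than fail
        cls, *coords = parts
        boxes.append((int(cls), *map(float, coords)))
    return boxes

def find_unlabeled_images(split_dir):
    """Return stems of images under <split>/images that lack a matching
    <split>/labels/<stem>.txt (the verification suggested in step 3)."""
    split_dir = Path(split_dir)
    labeled = {p.stem for p in (split_dir / "labels").glob("*.txt")}
    return sorted(p.stem
                  for p in (split_dir / "images").glob("*.jpg")
                  if p.stem not in labeled)

# Demo on a throwaway split directory (the real dataset paths will differ).
with tempfile.TemporaryDirectory() as tmp:
    split = Path(tmp) / "train"
    (split / "images").mkdir(parents=True)
    (split / "labels").mkdir()
    (split / "images" / "bird_001.jpg").touch()
    (split / "images" / "bird_002.jpg").touch()
    (split / "labels" / "bird_001.txt").write_text("0 0.5 0.5 0.2 0.3\n")
    print(load_yolo_labels(split / "labels" / "bird_001.txt"))  # [(0, 0.5, 0.5, 0.2, 0.3)]
    print(find_unlabeled_images(split))  # ['bird_002']
```

The same two functions can back a custom PyTorch Dataset: `__getitem__` would read the image (e.g., with OpenCV) and call `load_yolo_labels` on the matching .txt file.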