HeteroTraffic: Annotated Dataset for Multi-Class Vehicle Detection in Varied Illumination and Road Conditions

Published: 7 November 2025 | Version 2 | DOI: 10.17632/tvkfk56s2k.2
Contributors:
Md Jahidul Alam Sagar,

Description

The HeteroTraffic dataset is a large-scale, multi-class dataset developed for heterogeneous vehicle detection and intelligent transportation research. It contains 17,310 high-resolution images, each annotated in YOLO format to facilitate training and evaluation of object detection models such as YOLOv8, YOLOv11, and EfficientDet. The images were collected from real-world highway and roadside environments under diverse traffic densities, weather conditions, and illumination variations. Data were acquired using DSLR cameras and smartphones, giving a mix of perspectives and resolutions that reflects realistic driving and surveillance scenarios.

An additional version of the dataset has been uploaded in which all faces and vehicle license plates are blurred to support privacy compliance and ethical data sharing.

Each image in the dataset is paired with an annotation file containing bounding boxes and class identifiers, created in LabelMe and later converted to YOLO format. The dataset includes 17 heterogeneous classes that represent a wide variety of vehicles and road users commonly found in mixed-traffic environments.

Vehicle Classes: Motorbike, MPV, Pedestrian, Pickup, PowerTiller, Rickshaw, Bicycle, Bus, Bhotbhoti, Car, CNG, Easybike, Leguna, ShoppingVan, Truck, Van, and Wheelbarrow.

This diversity makes HeteroTraffic particularly suitable for developing models that must distinguish between conventional and region-specific vehicle types, which improves real-world generalization in computer vision applications.
Dataset Structure:

    HeteroTraffic/
    │
    ├── images/
    │   ├── *.jpg
    │
    ├── labels/
    │   ├── *.txt
    │
    └── data.yaml

Each .txt file in the labels directory contains YOLO-formatted annotations:

    <class_id> <x_center> <y_center> <width> <height>

Key Features:

  • Total Images: 16,289
  • Annotation Format: YOLO (converted from LabelMe JSON)
  • Number of Classes: 17
  • Data Type: RGB road and highway scenes
  • Image Sources: DSLR and smartphone cameras
  • Annotation Verification: Manually reviewed for accuracy
  • Use Cases: Vehicle detection, traffic monitoring, intelligent transportation, and autonomous driving

Highlights

  • Provides diverse, heterogeneous vehicle categories including low-frequency and region-specific types.
  • Enables benchmarking of deep learning models under real-world highway conditions.
  • Supports transfer learning, domain adaptation, and object detection studies.
  • Serves as a high-quality open resource for both academic and industrial research.
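As an illustration of the label line format given under Dataset Structure, the following minimal Python sketch converts one annotation line into pixel corner coordinates. It assumes the coordinates are normalized to the image width and height, as is conventional for YOLO labels. The names PixelBox and yolo_line_to_pixel_box are hypothetical and are not part of the dataset.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    """Axis-aligned box in pixel units (hypothetical helper type)."""
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def yolo_line_to_pixel_box(line: str, img_w: int, img_h: int) -> PixelBox:
    """Convert '<class_id> <x_center> <y_center> <width> <height>' to pixel corners.

    Assumes the four box values are normalized to the range [0, 1].
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    return PixelBox(
        class_id,
        (xc - w / 2) * img_w,  # left edge
        (yc - h / 2) * img_h,  # top edge
        (xc + w / 2) * img_w,  # right edge
        (yc + h / 2) * img_h,  # bottom edge
    )
```

The image size has to be supplied by the caller, since the label files store only normalized values. The class name for a given class_id should be looked up in data.yaml.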

Files

Steps to reproduce

The following steps describe the complete process used to create the HeteroTraffic dataset, from data collection to annotation and formatting:

1. Image Acquisition
Images were captured using DSLR cameras and smartphone devices from various highway and roadside environments. Data collection was conducted under different lighting, weather, and traffic conditions to ensure diversity and represent real-world complexity.

2. Annotation Process
Each image was manually annotated using the LabelMe tool. Bounding boxes were drawn tightly around visible objects corresponding to 17 target classes: Motorbike, MPV, Pedestrian, Pickup, PowerTiller, Rickshaw, Bicycle, Bus, Bhotbhoti, Car, CNG, Easybike, Leguna, ShoppingVan, Truck, Van, and Wheelbarrow. The annotations were exported in JSON format from LabelMe.

3. Conversion to YOLO Format
The LabelMe JSON annotations were converted to YOLO text format using a Python conversion script. Each image has a corresponding .txt file containing normalized bounding box coordinates in the format:

    <class_id> <x_center> <y_center> <width> <height>

4. Data Organization
All image files were stored in the /images directory, and the corresponding label files were placed in the /labels directory. A configuration file named data.yaml was created, listing all class names and directory paths to ensure direct compatibility with YOLO-based training frameworks.

5. Quality Verification
A random subset of annotations was visually verified to confirm the precision of bounding boxes and class labels. Misaligned or ambiguous bounding boxes were corrected before finalizing the dataset.

6. Data Preparation for Model Training
The dataset can be directly used for YOLO-based object detection tasks by updating the dataset path in the training configuration file. Example command for YOLOv8 or YOLOv11 training:

    yolo detect train data=data.yaml model=yolov8n.pt epochs=100 imgsz=640
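The conversion in step 3 can be sketched as follows. This is not the authors' conversion script, which is not reproduced on this page. It is a minimal illustration that assumes standard LabelMe JSON fields (imageWidth, imageHeight, and shapes with label and points) and rectangle-like shapes. The function name labelme_to_yolo_lines and the class_ids mapping are hypothetical; the real class order should be taken from data.yaml.

```python
import json


def labelme_to_yolo_lines(labelme_json: str, class_ids: dict) -> list:
    """Turn one LabelMe JSON document into a list of YOLO label lines.

    class_ids maps a label name to its integer id (hypothetical mapping;
    use the order given in the dataset's data.yaml).
    """
    data = json.loads(labelme_json)
    img_w, img_h = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        # Take the bounding extent of the points so corner order does not matter.
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x_min, x_max = min(xs), max(xs)
        y_min, y_max = min(ys), max(ys)
        # Normalize center and size by the image dimensions.
        xc = (x_min + x_max) / 2 / img_w
        yc = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        class_id = class_ids[shape["label"]]
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each returned line would be written to the .txt file that shares its base name with the image, so that the files in /images and /labels pair up as described in step 4.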

Institutions

  • Daffodil International University

Categories

Computer Vision, Object Detection, Autonomous Driving, Autonomous Vehicle, YOLOv7

Licence