AutoNaVIT: Vision-Based Path and Obstacle Segmentation Dataset for Autonomous Driving - JSON Compatible
Description
AutoNaVIT is a curated segmentation label dataset developed to support research in autonomous navigation, scene understanding, and deep learning-based object segmentation. This release contains only the annotation labels corresponding to high-resolution frames extracted from a recorded driving sequence at Vellore Institute of Technology – Chennai Campus (VIT-C). The corresponding images will be released in Version 2 of the dataset shortly.
The dataset offers manually annotated, pixel-accurate segmentation masks for three key classes relevant to autonomous vehicle navigation:
- Kerb: 1,377 instances
- Obstacle: 258 instances
- Path: 532 instances
All annotations were generated using Roboflow, ensuring high fidelity and consistency for real-world autonomous driving applications in urban and semi-urban environments.
Data Capture Specifications: Source imagery was recorded using a Sony IMX890 sensor with the following specifications:
- Sensor Size: 1/1.56", 50 MP
- Lens: 6P, ƒ/1.8, 24mm equivalent, 1.0 µm pixels
- Features: OIS, PDAF autofocus
- Duration: 4 min 11 sec video
- Frame Rate: 2 FPS (rate at which frames were extracted from the video)
- Total Annotated Frames: 504
Format Compatibility and Model Support: AutoNaVIT annotations are provided in standard JSON format, enabling direct compatibility with the following seven models and formats:
- Florence-2 OD
- Paligemma
- COCO
- COCO – Segmentation
- CreateML
- COCO – MMDetection
- SAM2
As the format adheres to common JSON standards, the labels can be adapted for use with other models or frameworks that support JSON-based annotations.
Benchmark Results: To validate the dataset’s effectiveness, a YOLOv8-based segmentation model was trained using the full dataset (images + annotations). The model achieved:
- Mean Average Precision (mAP): 96.5%
- Precision: 92.2%
- Recall: 94.4%
These results indicate that the dataset is reliable for training and evaluating segmentation models in autonomous vehicle systems.
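The per-class instance counts listed above can be checked against a COCO-style export with a few lines of Python. The sketch below assumes the standard COCO layout (top-level `images`, `categories`, and `annotations` keys). The `sample` dictionary, its file name, and its coordinates are made up for illustration and are not taken from the dataset.

```python
import json
from collections import Counter


def count_instances(coco: dict) -> Counter:
    """Count annotated instances per category name in a COCO-style dict."""
    # Map each category id to its human-readable name.
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    # Each entry in "annotations" is one labeled instance.
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


# Tiny hand-made sample standing in for a real export (all values illustrative).
sample = {
    "images": [{"id": 1, "file_name": "frame_0001.jpg", "width": 1920, "height": 1080}],
    "categories": [
        {"id": 0, "name": "Kerb"},
        {"id": 1, "name": "Obstacle"},
        {"id": 2, "name": "Path"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 0, "segmentation": [[0, 0, 10, 0, 10, 5]]},
        {"id": 2, "image_id": 1, "category_id": 0, "segmentation": [[20, 0, 30, 0, 30, 5]]},
        {"id": 3, "image_id": 1, "category_id": 2, "segmentation": [[0, 10, 50, 10, 50, 40]]},
    ],
}

counts = count_instances(sample)
print(dict(counts))  # per-class instance counts for the sample
```

For a real export, the dictionary would come from `json.load` on the downloaded annotation file. If that file is the COCO segmentation export, the counts would be expected to match the figures reported above.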
Disclaimer and Attribution Requirement: By accessing or using this dataset, users agree to the following terms under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0):
- Usage is permitted only for non-commercial academic and research purposes.
- Proper attribution must state: “Dataset courtesy of Vellore Institute of Technology – Chennai Campus.” This must be included in all publications, presentations, or other dissemination formats.
- Redistribution, commercial use, modification, or public hosting of the dataset in any form is prohibited without explicit written permission from VIT-C.
Use of this dataset implies acceptance of these terms. All rights not expressly granted are reserved by VIT-C.
Files
Steps to reproduce
1. Data Collection: To replicate the dataset, begin by recording a video along a controlled or semi-urban driving path using a high-resolution camera. It is recommended to use a sensor similar to the Sony IMX890, which features:
   - 50MP resolution
   - 24mm focal length
   - ƒ/1.8 aperture
   - Optical Image Stabilization (OIS)
   Capture footage under daylight conditions at a standard frame rate (ideally 30 FPS) to ensure consistent and high-quality imagery.
2. Frame Extraction: Extract video frames at a rate of 2 frames per second (FPS). This frequency provides a balanced dataset by offering diverse scene changes without introducing excessive redundancy. Ensure the extracted frames maintain their original high resolution to support accurate segmentation.
3. Annotation Using Roboflow: Upload the extracted frames to Roboflow or a similar annotation tool. Manually annotate each frame using polygon-based segmentation to define the following classes:
   - Kerb
   - Obstacle
   - Path
   Ensure pixel-level accuracy across all labels. Upon completion, export the annotations in JSON format, which aligns with the current version of the AutoNaVIT dataset.
4. Model Training and Performance Evaluation: To validate the dataset's utility, train a segmentation model such as YOLOv8 using the labeled data (if images are available). In the official benchmark, training on the full dataset yielded:
   - Mean Average Precision (mAP): 96.5%
   - Precision: 92.2%
   - Recall: 94.4%
   These metrics highlight the dataset’s suitability for real-world autonomous vehicle navigation tasks.
5. JSON Format Compatibility: The dataset’s JSON structure has been designed for direct compatibility with seven deep learning models, including:
   - Florence-2 OD
   - Paligemma
   - COCO
   - COCO – Segmentation
   - CreateML
   - COCO – MMDetection
   - SAM2
   Since the labels follow standard JSON schemas, they can be easily modified for compatibility with other JSON-based frameworks or pipelines, making the dataset highly adaptable.
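The frame extraction in step 2 amounts to keeping every Nth frame of the source video. The sketch below shows only the index arithmetic, assuming a 30 FPS source as recommended in step 1. The function name `sampled_frame_indices` is hypothetical, and decoding the video (for example with OpenCV or ffmpeg) is left out.

```python
def sampled_frame_indices(duration_s: float, source_fps: float, target_fps: float) -> list[int]:
    """Return indices of source frames to keep when downsampling to target_fps."""
    if target_fps <= 0 or source_fps < target_fps:
        raise ValueError("target_fps must be positive and no greater than source_fps")
    total_frames = int(duration_s * source_fps)
    step = source_fps / target_fps  # source frames between consecutive kept frames
    indices = []
    k = 0
    while round(k * step) < total_frames:
        indices.append(round(k * step))
        k += 1
    return indices


# 4 min 11 s of video, assumed 30 FPS source, sampled at 2 FPS.
indices = sampled_frame_indices(duration_s=4 * 60 + 11, source_fps=30, target_fps=2)
print(len(indices), indices[:4])  # 502 [0, 15, 30, 45]
```

Under these assumptions the sampling keeps every 15th frame and yields 502 frames, close to the 504 annotated frames reported for the dataset.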
By following these steps, researchers and developers can effectively replicate and expand the AutoNaVIT dataset, enabling advanced experimentation and benchmarking for autonomous vehicle perception systems.
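As one example of adapting the JSON labels to another pipeline, YOLOv8 segmentation training reads plain-text labels with one line per instance: a class index followed by polygon coordinates normalized to the image size. The sketch below converts a single COCO-style polygon to that form. The function name, the class index, and the sample polygon are illustrative assumptions rather than values taken from the dataset.

```python
def coco_polygon_to_yolo(polygon, class_index, img_w, img_h):
    """Convert one flat COCO polygon [x1, y1, x2, y2, ...] in pixels
    to a YOLO segmentation label line with coordinates normalized to [0, 1]."""
    if len(polygon) < 6 or len(polygon) % 2:
        raise ValueError("polygon needs at least 3 (x, y) points")
    coords = []
    # Walk the flat list as (x, y) pairs and scale by image width and height.
    for x, y in zip(polygon[0::2], polygon[1::2]):
        coords.append(f"{x / img_w:.6f}")
        coords.append(f"{y / img_h:.6f}")
    return f"{class_index} " + " ".join(coords)


# Illustrative triangle on a 1920x1080 frame; class index 2 is an assumed id for "Path".
line = coco_polygon_to_yolo([0, 0, 960, 0, 960, 540], class_index=2, img_w=1920, img_h=1080)
print(line)  # 2 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000
```

In a full conversion, each image would get one `.txt` file containing one such line per annotation, with the class index derived from the `categories` list of the JSON export.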
Institutions
- VIT University - Chennai Campus