AutoNaVIT-C: Vision-Based Path and Obstacle Segmentation Dataset for Autonomous Driving - XML Compatible
Description
AutoNaVIT is a meticulously developed dataset designed to accelerate research in autonomous navigation, semantic scene understanding, and object segmentation through deep learning. This release includes only the annotation labels in XML format, aligned with high-resolution frames extracted from a controlled driving sequence at Vellore Institute of Technology – Chennai Campus (VIT-C). The corresponding images will be included in Version 2 of the dataset.

Class Annotations
The dataset features carefully annotated bounding boxes for three classes essential to real-time navigation and path planning in autonomous vehicles:
- Kerb – 1,377 instances
- Obstacle – 258 instances
- Path – 532 instances
All annotations were produced in Roboflow with human verification, ensuring consistent, high-quality labels that support robust model development for urban and semi-urban scenarios.

Data Capture Specifications
The source video was captured with a Sony IMX890 sensor under stable daylight lighting. Capture parameters:
- Sensor Size: 1/1.56", 50 MP
- Lens: 6P optical configuration
- Aperture: ƒ/1.8
- Focal Length: 24 mm equivalent
- Pixel Size: 1.0 µm
- Features: Optical Image Stabilization (OIS), PDAF autofocus
- Video Duration: 4 minutes 11 seconds
- Frame Extraction Rate: 2 FPS
- Total Annotated Frames: 504

Format Compatibility and Model Support
AutoNaVIT annotations are provided in Pascal VOC-compatible XML format, making them directly usable with models and tools that support the Pascal VOC standard. Because XML is a structured, extensible format, the annotations can also be adapted for other object detection frameworks that support XML-based label schemas.

Benchmark Results
To assess dataset utility, a YOLOv8 segmentation model was trained on the full dataset (including images) and achieved the following results:
- Mean Average Precision (mAP): 96.5%
- Precision: 92.2%
- Recall: 94.4%
These metrics demonstrate the dataset's effectiveness for training models for autonomous vehicle perception and obstacle detection.

Disclaimer and Attribution Requirement
By downloading or using this dataset, users agree to the terms outlined in the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0):
- This dataset is available solely for academic and non-commercial research purposes.
- Proper attribution must be provided as follows: “Dataset courtesy of Vellore Institute of Technology – Chennai Campus.” This citation must appear in all research papers, presentations, or any work derived from this dataset.
- Redistribution, public hosting, commercial use, or modification is prohibited without prior written permission from VIT-C.
Use of this dataset implies acceptance of these terms. All rights not explicitly granted are retained by VIT-C.
Files
Steps to reproduce
1. Data Collection
To replicate the dataset, begin by capturing a video along a controlled or semi-urban driving path using a high-resolution camera. For best results, use a sensor similar to the Sony IMX890, which provides:
- 50 MP resolution
- 24 mm equivalent focal length
- ƒ/1.8 aperture
- Optical Image Stabilization (OIS)
Record the footage under consistent daylight conditions at a standard frame rate (ideally 30 FPS) to ensure uniform exposure and scene clarity suitable for object detection tasks.

2. Frame Extraction
Extract frames from the recorded video at 2 frames per second (FPS). This rate preserves scene variability without introducing excessive redundancy. Keep each frame at its original resolution to support precise bounding-box annotation. A minimal extraction sketch is provided after these steps.

3. Annotation Using Roboflow
Upload the extracted frames to Roboflow or any annotation tool that supports the Pascal VOC (XML) format. Manually annotate each frame with bounding boxes for the following classes:
- Kerb
- Obstacle
- Path
Ensure that the bounding boxes fit tightly around the objects and that each annotation is correctly labeled. Once annotation is complete, export the dataset in XML format (Pascal VOC compatible).

4. Model Training and Performance Evaluation
To evaluate the dataset, train an object detection model that supports Pascal VOC-style XML annotations. For reference, a benchmark was conducted using a YOLOv8-based segmentation model (images + labels), which achieved:
- Mean Average Precision (mAP): 96.5%
- Precision: 92.2%
- Recall: 94.4%
These metrics affirm the dataset's effectiveness in training models for perception systems in autonomous navigation. Label-conversion and training sketches are provided after these steps.

5. XML Format Compatibility
The exported XML annotations follow the Pascal VOC format and are directly compatible with models and pipelines that consume Pascal VOC labels. Because the format is structured and extensible, the annotations can also be adapted for other detection frameworks that can parse Pascal VOC-style labels, making the dataset suitable for diverse training pipelines. A small inspection script is shown after these steps.

By following these steps, researchers and practitioners can replicate the AutoNaVIT XML-compatible dataset and use it for experimentation and benchmarking in autonomous vehicle perception, object detection, and scene understanding.
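For Step 2, the sketch below shows one way to pull frames from the recorded clip at roughly 2 FPS while keeping the original resolution. It uses OpenCV; the file names are placeholders, and the 30 FPS fallback is only an assumption for clips with missing frame-rate metadata.

    import cv2
    from pathlib import Path

    def extract_frames(video_path, out_dir, target_fps=2.0):
        """Save frames at roughly target_fps, keeping the original resolution."""
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        source_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # assume 30 FPS if metadata is missing
        step = max(1, round(source_fps / target_fps))    # e.g. every 15th frame of a 30 FPS clip
        saved = index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                cv2.imwrite(str(Path(out_dir) / f"frame_{saved:05d}.jpg"), frame)
                saved += 1
            index += 1
        cap.release()
        return saved

    if __name__ == "__main__":
        # "drive.mp4" and "frames/" are placeholders; a ~4 min 11 s clip yields ~500 frames at 2 FPS.
        print(extract_frames("drive.mp4", "frames"))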
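For Steps 3 and 5, the exported Pascal VOC XML files can be inspected with standard tooling before training. The sketch below uses Python's xml.etree.ElementTree to tally class instances across the annotation files (the VOC schema stores each box under an object element with name and bndbox fields); the annotations/ directory name is a placeholder, not part of the dataset.

    from collections import Counter
    from pathlib import Path
    import xml.etree.ElementTree as ET

    def count_classes(annotation_dir):
        """Count object instances per class across Pascal VOC XML files."""
        counts = Counter()
        for xml_file in Path(annotation_dir).glob("*.xml"):
            root = ET.parse(xml_file).getroot()
            for obj in root.iter("object"):
                counts[obj.findtext("name")] += 1
        return counts

    if __name__ == "__main__":
        # "annotations/" is a placeholder; point it at the exported XML labels.
        # Expected classes: Kerb, Obstacle, Path.
        print(count_classes("annotations"))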
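For Step 4, YOLO-family models expect one line per box containing a class ID and a normalized centre/size box rather than Pascal VOC XML, so a conversion pass is usually needed first. The sketch below is one way to do that conversion; the class-to-ID mapping and directory names are assumptions, not dataset metadata.

    from pathlib import Path
    import xml.etree.ElementTree as ET

    # Class-to-ID mapping is an assumption; keep it consistent with your data.yaml ordering.
    CLASS_IDS = {"Kerb": 0, "Obstacle": 1, "Path": 2}

    def voc_to_yolo(xml_dir, label_dir):
        """Convert Pascal VOC XML boxes to YOLO-format .txt label files."""
        Path(label_dir).mkdir(parents=True, exist_ok=True)
        for xml_file in Path(xml_dir).glob("*.xml"):
            root = ET.parse(xml_file).getroot()
            w = float(root.findtext("size/width"))
            h = float(root.findtext("size/height"))
            lines = []
            for obj in root.iter("object"):
                cls = CLASS_IDS[obj.findtext("name")]
                box = obj.find("bndbox")
                xmin, ymin = float(box.findtext("xmin")), float(box.findtext("ymin"))
                xmax, ymax = float(box.findtext("xmax")), float(box.findtext("ymax"))
                # YOLO format: class x_center y_center width height, all normalized to [0, 1]
                lines.append(f"{cls} {(xmin + xmax) / 2 / w:.6f} {(ymin + ymax) / 2 / h:.6f} "
                             f"{(xmax - xmin) / w:.6f} {(ymax - ymin) / h:.6f}")
            (Path(label_dir) / f"{xml_file.stem}.txt").write_text("\n".join(lines))

    # Example: voc_to_yolo("annotations", "labels")  # directory names are placeholders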
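The benchmark figures above were obtained with a YOLOv8 segmentation model trained on images plus labels, but the exact training configuration is not specified in this release. As a rough starting point, and because the XML labels here are bounding boxes rather than masks, the sketch below shows a generic Ultralytics YOLOv8 detection training call; the model size, epoch count, image size, and data.yaml path are placeholder assumptions.

    from ultralytics import YOLO

    # data.yaml is a placeholder; it should point at the image/label directories
    # and declare the three classes: Kerb, Obstacle, Path.
    model = YOLO("yolov8n.pt")                       # detection variant chosen as a stand-in
    model.train(data="data.yaml", epochs=100, imgsz=640)
    metrics = model.val()                            # reports mAP, precision, and recall on the validation split
    print(metrics)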