AutoNaVIT: Vision-Based Path and Obstacle Segmentation Dataset for Autonomous Driving - CSV Compatible

Published: 14 April 2025 | Version 1 | DOI: 10.17632/kb9sgg7x2p.1
Contributors:

Description

AutoNaVIT is a carefully designed dataset intended to advance research in autonomous navigation, semantic scene understanding, and deep learning-based object segmentation. This release includes only the annotation labels in CSV format, corresponding to high-resolution frames extracted from a driving sequence recorded at Vellore Institute of Technology – Chennai Campus (VIT-C). The corresponding images will be provided in Version 2 of the dataset.

The dataset comprises manually annotated bounding boxes for three classes that are critical for path planning and perception in autonomous vehicle systems:

- Kerb – 1,377 instances
- Obstacle – 258 instances
- Path – 532 instances

All annotations were generated in Roboflow with precise, human-verified labeling, providing the consistent, high-quality data needed to train robust models that generalize to real-world urban and semi-urban driving scenarios.

Data Capture Specifications

The video footage used for annotation was recorded with a Sony IMX890 camera sensor under stable daylight conditions:

- Sensor Size: 1/1.56", 50 MP
- Lens: 6P optical configuration
- Aperture: ƒ/1.8
- Focal Length: 24 mm equivalent
- Pixel Size: 1.0 µm
- Features: Optical Image Stabilization (OIS), PDAF autofocus
- Video Duration: 4 minutes 11 seconds
- Frame Extraction Rate: 2 FPS
- Total Annotated Frames: 504

Format Compatibility and Model Support

AutoNaVIT's annotations are made available in standard CSV format, enabling direct compatibility with the following three CSV annotation formats:

- Multiclass
- TensorFlow CSV
- RetinaNet

Because CSV is a highly adaptable format, the annotations can be easily reformatted to suit other deep learning models or pipelines that support CSV-based label structures. A minimal loading sketch is included at the end of this description.

Benchmark Results

To validate the dataset's effectiveness, a YOLOv8 segmentation model was trained on the full dataset (images + annotations). The resulting performance metrics were:

- Mean Average Precision (mAP): 96.5%
- Precision: 92.2%
- Recall: 94.4%

These metrics confirm the dataset's value in developing perception systems for autonomous vehicles, particularly for object detection and path segmentation tasks.

Disclaimer and Attribution Requirement

By accessing or using this dataset, users agree to the following terms under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0):

- The dataset is available for non-commercial academic and research purposes only.
- Proper attribution must be included as: "Dataset courtesy of Vellore Institute of Technology – Chennai Campus." This citation must appear in all forms of publication, presentation, or dissemination using this dataset.
- Redistribution, commercial usage, public hosting, or modification of the dataset is not permitted without explicit written consent from VIT-C.
- Use of the dataset indicates acceptance of these conditions. All rights not explicitly granted are reserved by VIT-C.
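The exact column layout depends on the CSV export option chosen in Roboflow. The sketch below is a minimal example of loading and sanity-checking the annotations with pandas; the file name annotations.csv and the TensorFlow-style columns (filename, width, height, class, xmin, ymin, xmax, ymax) are assumptions and should be adjusted to the actual export.

```python
# Minimal sketch: inspect the AutoNaVIT CSV annotations with pandas.
# Assumptions: the file is named "annotations.csv" and uses TensorFlow-style
# columns (filename, width, height, class, xmin, ymin, xmax, ymax).
import pandas as pd

df = pd.read_csv("annotations.csv")

# Instances per class (expected: Kerb 1,377; Obstacle 258; Path 532).
print(df["class"].value_counts())

# Number of distinct annotated frames (expected: 504).
print(df["filename"].nunique())

# Basic sanity check: every bounding box should lie inside its image.
valid = (
    (df["xmin"] >= 0) & (df["ymin"] >= 0)
    & (df["xmax"] <= df["width"]) & (df["ymax"] <= df["height"])
    & (df["xmin"] < df["xmax"]) & (df["ymin"] < df["ymax"])
)
print(f"{valid.mean():.1%} of boxes pass the bounds check")
```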

Files

Steps to reproduce

1. Data Collection: To replicate the dataset, begin by recording a video along a controlled or semi-urban driving path using a high-resolution camera. A sensor similar to the Sony IMX890 is recommended, featuring 50 MP resolution, a 24 mm focal length, an ƒ/1.8 aperture, and Optical Image Stabilization (OIS). Capture footage under daylight conditions at a standard frame rate (ideally 30 FPS) to ensure consistent, high-quality imagery suitable for annotation.

2. Frame Extraction: Extract video frames at a rate of 2 frames per second (FPS). This frequency offers a good balance between scene variety and frame uniqueness, helping to prevent data redundancy. Keep the extracted frames at their original resolution for precise bounding box annotation (a minimal extraction sketch is given after these steps).

3. Annotation Using Roboflow: Upload the extracted frames to Roboflow or a similar annotation tool that supports bounding box annotation. Manually annotate each object with rectangular bounding boxes for the three classes: Kerb, Obstacle, and Path. Ensure accurate box placement around the objects of interest, then export the dataset in CSV format, which aligns with the CSV-compatible version of AutoNaVIT.

4. Model Training and Performance Evaluation: To validate the dataset's utility, train a compatible object detection model (e.g., RetinaNet or the TensorFlow Object Detection API) on the labelled data. In the official benchmark, conducted with YOLOv8 segmentation on the paired images, the following metrics were observed (a training sketch is given after these steps):

- Mean Average Precision (mAP): 96.5%
- Precision: 92.2%
- Recall: 94.4%

These results demonstrate the dataset's effectiveness in object detection tasks for autonomous vehicle navigation.

5. CSV Format Compatibility: The dataset's CSV structure is optimized for direct integration with the three formats that support standard CSV annotations: Multiclass, TensorFlow CSV, and RetinaNet. Because the annotations follow a conventional CSV schema, they can be easily converted or adjusted for other CSV-compatible frameworks (see the conversion sketch after these steps), making the dataset flexible for various object detection pipelines and use cases.

By following these steps, researchers and developers can replicate and build upon the AutoNaVIT dataset in CSV format, enabling precise benchmarking and experimentation in autonomous vehicle perception and scene understanding.
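As a companion to Step 2, the following is a minimal sketch of 2 FPS frame extraction using OpenCV; the input video path, output directory, and file naming pattern are hypothetical placeholders rather than files shipped with the dataset.

```python
# Minimal sketch: extract frames at ~2 FPS from a source video with OpenCV.
# "drive.mp4" and the "frames" output directory are placeholder names.
import cv2
from pathlib import Path

video_path = "drive.mp4"
out_dir = Path("frames")
out_dir.mkdir(exist_ok=True)

cap = cv2.VideoCapture(video_path)
src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back to 30 FPS if unknown
step = max(1, round(src_fps / 2.0))           # keep every Nth frame for ~2 FPS

frame_idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % step == 0:
        # Keep the original resolution so bounding boxes stay precise.
        cv2.imwrite(str(out_dir / f"frame_{saved:04d}.jpg"), frame)
        saved += 1
    frame_idx += 1

cap.release()
print(f"Saved {saved} frames out of {frame_idx} read")
```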
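For Step 4, a training run comparable to the published YOLOv8 benchmark can be sketched with the Ultralytics API. The dataset YAML name, model size, and hyperparameters below are illustrative assumptions, not the exact benchmark configuration.

```python
# Minimal sketch: train a YOLOv8 segmentation model with the Ultralytics API.
# "autonavit.yaml" (class names plus train/val image paths) and the epoch and
# image-size settings are assumptions; the benchmark configuration was not
# released with the dataset.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")            # pretrained segmentation checkpoint
results = model.train(
    data="autonavit.yaml",                # dataset definition (hypothetical)
    epochs=100,
    imgsz=640,
)

# Evaluate on the validation split to obtain mAP, precision, and recall.
metrics = model.val()
print(metrics.results_dict)
```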
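For Step 5, reformatting the CSV for another framework mostly means remapping columns. The sketch below converts TensorFlow-style rows into YOLO-style normalized txt labels, one file per frame; the column names and class ordering are assumptions to adapt to the actual export.

```python
# Minimal sketch: convert TensorFlow-style CSV rows to YOLO txt labels.
# Column names (filename, width, height, class, xmin, ymin, xmax, ymax) and
# the class-id ordering are assumptions.
import pandas as pd
from pathlib import Path

CLASS_IDS = {"Kerb": 0, "Obstacle": 1, "Path": 2}   # assumed ordering
out_dir = Path("labels")
out_dir.mkdir(exist_ok=True)

df = pd.read_csv("annotations.csv")
for filename, rows in df.groupby("filename"):
    lines = []
    for _, r in rows.iterrows():
        # YOLO format: class_id x_center y_center width height (normalized).
        xc = (r["xmin"] + r["xmax"]) / 2.0 / r["width"]
        yc = (r["ymin"] + r["ymax"]) / 2.0 / r["height"]
        bw = (r["xmax"] - r["xmin"]) / r["width"]
        bh = (r["ymax"] - r["ymin"]) / r["height"]
        lines.append(f"{CLASS_IDS[r['class']]} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    (out_dir / (Path(filename).stem + ".txt")).write_text("\n".join(lines))
```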

Institutions

VIT University - Chennai Campus

Categories

Computer Vision, Autonomous Driving, Smart Transportation, Machine Vision, Deep Learning, Autonomous Navigation

Licence

CC BY-NC-ND 4.0