TerrainSense: Dataset of Off-Road Terrain Obstacles and Traversability Hazards

Published: 11 June 2026 | Version 2 | DOI: 10.17632/r6cmjrr6kv.2
Contributors:
Varad Rane, Arshia Joshi

Description

TerrainSense is an object detection dataset of off-road obstacles and traversability hazards, intended to help guide vehicles across rough terrain. It contains 2,882 labeled images with 11,841 bounding boxes, split into train, validation, and test sets. No image is missing labels. Data was collected in real outdoor environments using standard consumer devices. Images were labeled iteratively with a combination of YOLO-generated pseudo-labels and human review: after pseudo-labels were generated, images containing low-confidence bounding boxes were flagged and then checked by human reviewers in labelImg to validate the boxes. Please refer to README.md for additional details about the dataset and to docs/terrainSense_research_paper.md for the corresponding paper draft. The dataset is organized into 4 classes: obstacle, person, pothole, and vehicle. The classes are imbalanced, and the majority of annotations are labeled obstacle. The current statistics summary (dataset_stats.json) reports 2,303 images in the train set, 288 in the validation set, and 291 in the test set. Image sizes range from approximately 640×640 pixels to 1080×1920 pixels, so the set contains multiple aspect ratios rather than a single fixed size.
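As a quick consistency check on the figures quoted above, the following minimal sketch recomputes the split proportions and the mean number of boxes per image. The counts are copied from the description; the variable and function names are illustrative only and do not reflect the layout of dataset_stats.json.

```python
# Counts quoted in the dataset description (not read from dataset_stats.json).
SPLITS = {"train": 2303, "validation": 288, "test": 291}
TOTAL_IMAGES = 2882
TOTAL_BOXES = 11841


def split_fractions(splits):
    """Return each split's share of all images as a fraction in [0, 1]."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


# The three split sizes should add up to the reported image total.
assert sum(SPLITS.values()) == TOTAL_IMAGES

fractions = split_fractions(SPLITS)
boxes_per_image = TOTAL_BOXES / TOTAL_IMAGES

for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
print(f"mean boxes per image: {boxes_per_image:.2f}")
```

The split sizes sum to the reported total, which corresponds to roughly an 80/10/10 split with about 4.1 bounding boxes per image on average.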

Files

Steps to reproduce

1. Data Collection: We collected outdoor video footage using both a smartphone and a Raspberry Pi camera in many different outdoor environments.
2. Frame Extraction: We extracted frames from the video recordings at regular time intervals.
3. Deduplication: We used perceptual hashing to remove duplicate frames from the recorded videos.
4. Pseudo-Labeling: We used an object detection model to generate bounding box proposals, each with an associated confidence score.
5. Flagging: We flagged frames containing low-confidence proposals (confidence below 25%) for human review.
6. Human Review: We manually reviewed the flagged frames, editing, deleting, or adding bounding boxes as needed.
7. Validation: We checked and corrected the label format, class IDs, and normalized bounding box coordinates.
8. Merging & Splitting: We merged all reviewed labels and kept our original train/validation/test split.
9. Retraining: We retrained our object detection model over multiple iterations using the reviewed data, augmented with additional data.
10. Evaluation and Statistics: We counted the bounding boxes in each image, computed the mean average precision (mAP) of the detection model, and recorded the statistics of the reviewed dataset.
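The Flagging and Validation steps above could be sketched roughly as follows. This is not the authors' pipeline code. It assumes YOLO-style label lines of the form "class cx cy w h" with coordinates normalized to [0, 1], class IDs 0 to 3 in the order the classes are listed in the description, and a rule that flags a frame if any proposal falls below the 25% threshold. The function names are hypothetical.

```python
NUM_CLASSES = 4          # obstacle, person, pothole, vehicle (ID order is assumed)
REVIEW_THRESHOLD = 0.25  # "less than 25% confidence" from the Flagging step


def needs_review(confidences, threshold=REVIEW_THRESHOLD):
    """Flag a frame if any proposed box scores below the threshold.

    Assumption: frames with no proposals at all are also sent to review.
    """
    return len(confidences) == 0 or min(confidences) < threshold


def validate_label_line(line, num_classes=NUM_CLASSES):
    """Return a list of problems found in one 'class cx cy w h' label line."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]

    problems = []
    try:
        class_id = int(parts[0])
    except ValueError:
        return [f"class ID is not an integer: {parts[0]!r}"]
    if not 0 <= class_id < num_classes:
        problems.append(f"class ID {class_id} out of range")

    try:
        cx, cy, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return problems + ["non-numeric coordinate"]

    # Every coordinate must be normalized by the image width or height.
    for name, value in (("cx", cx), ("cy", cy), ("w", w), ("h", h)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is not normalized to [0, 1]")
    if w <= 0 or h <= 0:
        problems.append("box has zero or negative size")
    return problems


print(needs_review([0.91, 0.18]))
print(validate_label_line("2 0.5 0.5 0.2 0.1"))
print(validate_label_line("7 0.5 1.2 0.2 0.1"))
```

Flagging on the minimum confidence is the conservative choice, since a single uncertain box is enough to send the frame to a reviewer. A mean-based rule would flag fewer frames but could hide one poor proposal among several confident ones.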

Categories

Computer Science, Computer Vision, Image Processing, Data Science, Machine Learning, Data Acquisition

Licence