Horseradish and weed dataset from commercial fields in Southern Illinois

Published: 19 February 2026 | Version 1 | DOI: 10.17632/fcf7brsfm6.1
Contributors:
Sunoj Shajahan

Description

This dataset contains annotated RGB images collected to develop and benchmark YOLO models for weed detection in commercial horseradish (Armoracia rusticana) production systems in Southern Illinois. The data support research on vision-based, real-time robotic weeding in a high-value specialty crop with limited herbicide options. The dataset accompanies the manuscript submitted to Frontiers in Agronomy, titled 'Evaluation of YOLO-based Weed Detection Models on Commercial Horseradish Fields in Southern Illinois'.

Images were captured during the 2024 growing season from two commercial horseradish fields in Collinsville, Illinois, and one research plot (Illinois Autonomous Farm, UIUC). Two acquisition platforms were used:
* A handheld Apple iPhone 13 Mini mounted on a monopod for proximal imaging between crop rows.
* A Farm-ng Amiga mobile robot equipped with Luxonis OAK-D cameras for robotic imaging.

Videos were recorded at 30 and 60 fps and then converted to image frames. After curation and augmentation (e.g., flipping, saturation adjustments), the dataset contains approximately 2,696 images. Each image is annotated for object detection with two classes:
i. Horseradish (crop);
ii. Weed (a composite non-crop class covering key species such as waterhemp, Amaranthus tuberculatus, and Palmer amaranth, Amaranthus palmeri, along with other broadleaf and grass weeds).

Annotations are provided in a YOLO-compatible format to facilitate training. The dataset was originally used to compare multiple YOLO variants (nano, small, and medium models of YOLO v8, v11, and v12). Performance was evaluated on accuracy metrics (precision, recall, F1, mAP@50) and computational criteria (inference time, model size, GFLOPs). We also tested inference on several embedded and edge-computing platforms to assess deployment potential.

Intended uses include:
* Training and evaluating crop-weed detection models in specialty crops.
* Benchmarking lightweight YOLO object detectors for real-time inference.
* Testing inference with the best-tuned trained model (Horseradish_8n_200epochs_best.pt).

Along with the images and labels, we provide an example notebook (training_script.ipynb) containing the training and inference scripts, and brief documentation describing the dataset structure, class definitions, and recommended preprocessing steps. The uploaded folder is arranged in the following file structure:

Horseradish-weed-dataset
┣ data
┃ ┣ test
┃ ┃ ┣ images
┃ ┃ ┗ labels
┃ ┣ train
┃ ┃ ┣ images
┃ ┃ ┗ labels
┃ ┣ valid
┃ ┃ ┣ images
┃ ┃ ┗ labels
┃ ┗ data.yaml
┣ Horseradish_8n_200epochs_best.pt
┣ requirements.txt
┗ training_script.ipynb
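As a quick reference for working with the labels, the sketch below parses one line of a YOLO-format label file, assuming the standard normalized `class x_center y_center width height` layout and the class ordering Horseradish = 0, Weed = 1 (please confirm against data.yaml; the function name is illustrative):

```python
# Minimal sketch of reading one YOLO-format label line.
# Assumed class order (verify in data.yaml): 0 = Horseradish, 1 = Weed.
CLASS_NAMES = {0: "Horseradish", 1: "Weed"}

def parse_yolo_label(line: str, img_w: int, img_h: int):
    """Convert 'class xc yc w h' (normalized) to (name, x_min, y_min, x_max, y_max) in pixels."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return CLASS_NAMES[int(cls)], x_min, y_min, x_max, y_max

# Example: a weed box centred in a 640x640 frame, 25% of each dimension.
print(parse_yolo_label("1 0.5 0.5 0.25 0.25", 640, 640))
# -> ('Weed', 240.0, 240.0, 400.0, 400.0)
```

The same conversion applies to every line of a label file, one bounding box per line.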

Steps to reproduce

Instructions are provided in the Python notebook file (training_script.ipynb). If you encounter any issues, please contact Abhinav Pagadala (asp14@illinois.edu) or Sunoj Shajahan (sunoj@illinois.edu).
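Before launching the notebook, it can be useful to verify that every image in a split has a matching label file, since a mismatch will cause YOLO training to skip or fail on those samples. A minimal sketch (the helper name and file lists are illustrative; in practice the lists would come from the dataset's images/ and labels/ folders):

```python
from pathlib import Path

def unmatched_images(image_names, label_names):
    """Return stems of images that lack a corresponding YOLO .txt label file."""
    label_stems = {Path(n).stem for n in label_names}
    return sorted(Path(n).stem for n in image_names if Path(n).stem not in label_stems)

# Example with hypothetical file names: frame_002.jpg has no label.
print(unmatched_images(["frame_001.jpg", "frame_002.jpg"], ["frame_001.txt"]))
# -> ['frame_002']
```

Running this per split (train/valid/test) before training gives an early warning about incomplete annotation pairs.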

Categories

Artificial Intelligence, Computer Vision, Robotics, Machine Learning, Weed Management
