Multi-Crop Disease Dataset

Published: 26 June 2025 | Version 1 | DOI: 10.17632/6243z8r6t6.1
Contributor:

Description

This dataset is a collection of annotated images of diseased and healthy leaves from five agricultural crops: Banana, Chilli, Radish, Groundnut, and Cauliflower. It was created to support research in plant disease detection, precision agriculture, and deep learning-based crop monitoring systems.

Research Hypothesis

Early detection and classification of crop diseases using image-based AI models can significantly reduce yield loss and improve sustainable farming practices. This dataset enables training and evaluation of such models across multiple crops and diverse disease types.

What the Data Shows

The dataset contains over 23,000 images captured in real agricultural settings and labeled with bounding box annotations. Each crop includes a healthy category and multiple disease-specific categories, for more than 30 classes in total (e.g., Sigatoka, Leaf Curl, Anthracnose, Rust, Downy Mildew, Black Rot).

Notable Features

- High-quality images (640×640 resolution), collected using digital cameras and 200 MP mobile phone cameras
- Annotated with bounding boxes for object detection tasks
- Data collected from Chengalpattu, Kanchipuram, and Krishnagiri districts, Tamil Nadu, India
- Covers real-world variations in lighting, leaf orientation, and disease stages

How to Interpret and Use the Data

- Images are organized by crop name and disease class
- Annotations are provided in YOLO format (can be converted to COCO/VOC)
- Suitable for training CNN, YOLO, Faster R-CNN, or ViT models for plant disease classification and localization
- Ideal for researchers working on edge AI, TinyML, and mobile agriculture apps

Potential Applications

- Real-time disease diagnosis in smart farming systems
- Academic research in plant pathology and computer vision
- Benchmarking object detection models in agricultural settings
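The description notes that the YOLO annotations can be converted to COCO or VOC. The following is a minimal sketch of that coordinate conversion, assuming the usual YOLO text layout of one box per line (`class_id x_center y_center width height`, all normalized to the image size) and the 640×640 image size stated above. The names `PixelBox`, `yolo_line_to_voc`, and `voc_to_coco_bbox` are illustrative and not part of the dataset, and the sample label line is made up for demonstration.

```python
from typing import List, NamedTuple


class PixelBox(NamedTuple):
    """A bounding box in absolute pixel corner coordinates (VOC style)."""
    class_id: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def yolo_line_to_voc(line: str, img_w: int = 640, img_h: int = 640) -> PixelBox:
    """Convert one YOLO label line to pixel corner coordinates.

    Assumes the line holds: class_id x_center y_center width height,
    with the four box values normalized to the range [0, 1].
    """
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized values up to pixels.
    xc_px, yc_px = float(xc) * img_w, float(yc) * img_h
    w_px, h_px = float(w) * img_w, float(h) * img_h
    # Move from centre/size to corner coordinates.
    return PixelBox(
        int(class_id),
        xc_px - w_px / 2,
        yc_px - h_px / 2,
        xc_px + w_px / 2,
        yc_px + h_px / 2,
    )


def voc_to_coco_bbox(box: PixelBox) -> List[float]:
    """COCO stores a box as [x_min, y_min, width, height] in pixels."""
    return [box.xmin, box.ymin, box.xmax - box.xmin, box.ymax - box.ymin]


# Made-up example line: class 3, centred box covering a quarter of the
# image width and half of its height.
box = yolo_line_to_voc("3 0.5 0.5 0.25 0.5")
print(box)                    # corners: (240, 160) to (400, 480)
print(voc_to_coco_bbox(box))  # [240.0, 160.0, 160.0, 320.0]
```

A full COCO or VOC export also needs image and category metadata, which this sketch leaves out; it only covers the box geometry.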

Files

Steps to reproduce

Data Collection Methodology

Field Selection: Leaf samples were collected from real farming fields across Chengalpattu, Kanchipuram, and Krishnagiri districts in Tamil Nadu, India, between November and January 2024.

Crops Covered: Data includes Banana, Chilli, Radish, Groundnut, and Cauliflower. Healthy and diseased samples were captured across multiple disease types (e.g., Sigatoka, Anthracnose, Rust, Downy Mildew).

Image Acquisition:

- Captured using high-resolution digital cameras and 200 MP mobile phone cameras
- Natural lighting conditions and varied angles were maintained for realism
- All images were resized to 640×640 pixels for consistency

Annotation Workflow:

- Images were uploaded and annotated using the Roboflow platform
- Bounding box annotations were applied to mark disease-affected regions
- Exported in YOLO format, suitable for object detection models

Data Organization:

- Structured into folders by crop and disease class
- Dataset includes healthy vs. multiple disease conditions per crop

Tools & Platforms Used:

- Roboflow for annotation and export
- Python/OpenCV (optional, for preprocessing if any)
- File compression tools for packaging (e.g., ZIP)

How to Reproduce

Researchers can replicate the dataset by:

- Visiting similar crop fields during active disease periods
- Using high-resolution mobile cameras for image capture
- Annotating images using tools like Roboflow, LabelImg, or CVAT
- Maintaining consistent image resolution and disease class labeling
- Following the YOLO annotation format for object detection
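When reproducing the annotation step, a quick sanity check on the exported label files can help keep class labeling and the YOLO format consistent. The sketch below is one possible check, not part of the published workflow. It assumes the usual YOLO layout of five whitespace-separated fields per line with normalized coordinates. The function name `check_yolo_label` is hypothetical, and the class count of 30 is a placeholder, since the source only states "more than 30" classes.

```python
from typing import List


def check_yolo_label(text: str, num_classes: int) -> List[str]:
    """Return a list of human-readable problems found in one label file.

    `text` is the content of a YOLO .txt label file; `num_classes` is the
    number of classes in the project (an assumption supplied by the caller).
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no annotation
        parts = line.split()
        if len(parts) != 5:
            problems.append(f"line {lineno}: expected 5 fields, got {len(parts)}")
            continue
        try:
            class_id = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append(f"line {lineno}: non-numeric value")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"line {lineno}: class id {class_id} out of range")
        if any(not 0.0 <= c <= 1.0 for c in coords):
            problems.append(f"line {lineno}: coordinate outside [0, 1]")
    return problems


# Made-up label content: one valid line, one with a missing field,
# one with an unknown class id, and one with an out-of-range coordinate.
sample = (
    "0 0.5 0.5 0.2 0.2\n"
    "0 0.5 0.5 0.2\n"
    "99 0.5 0.5 0.2 0.2\n"
    "1 1.5 0.5 0.2 0.2\n"
)
for problem in check_yolo_label(sample, num_classes=30):  # 30 is a placeholder
    print(problem)
```

Running the check over every label file before training surfaces malformed lines early, while the images are still easy to re-annotate.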

Institutions

VIT University - Chennai Campus

Categories

Computer Vision, Image Processing, Agricultural Engineering, Agricultural Health, Agricultural Management, Deep Learning

Licence