Mango and Groundnut Leaf Disease Dataset for Multi-Class Classification (91,438 Images)
Description
This dataset contains 91,438 annotated images of mango and groundnut crops, developed for multi-class plant disease classification tasks. The dataset comprises 9 classes representing major diseases and healthy conditions:

Mango: Anthracnose, Powdery Mildew, Dieback, Healthy Mango Flower, Healthy Mango Leaf
Groundnut: Early Leaf Spot, Late Leaf Spot, Rust, Healthy Groundnut

Each class contains approximately 9,500 to 11,500 images, resulting in a well-balanced dataset suitable for training deep learning models. Images were collected under diverse real-world conditions, including variations in lighting, background, and capture angles, to improve robustness and generalization performance.

All images are resized to a uniform resolution of 256 × 256 pixels, making the dataset directly suitable for deep learning models without additional preprocessing. The dataset is pre-split into training, validation, and test sets (70/15/15) to ensure reproducibility and fair benchmarking across machine learning experiments.

This dataset can be used for:
- Image classification
- Plant disease detection
- Transfer learning
- Benchmarking deep learning models

Potential applications include precision agriculture, automated crop monitoring, and early disease detection systems.
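The stated class balance and 70/15/15 split can be checked after download with a short script. The following is a minimal sketch, assuming the train/class_name/, validation/class_name/, test/class_name/ layout listed under "Steps to reproduce" and JPG/PNG files; the function names `count_images` and `split_fractions` are hypothetical helpers, not part of the dataset.

```python
from collections import Counter
from pathlib import Path

# Assumption: images use these extensions (the description mentions JPG/PNG).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
# Split folder names as listed in "Steps to reproduce".
SPLITS = ("train", "validation", "test")


def count_images(root: Path) -> Counter:
    """Count image files per (split, class) under root/<split>/<class>/."""
    counts = Counter()
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # tolerate a missing split folder
        for class_dir in split_dir.iterdir():
            if class_dir.is_dir():
                counts[(split, class_dir.name)] = sum(
                    1 for f in class_dir.iterdir()
                    if f.suffix.lower() in IMAGE_EXTS
                )
    return counts


def split_fractions(counts: Counter) -> dict:
    """Return each split's share of the total image count."""
    total = sum(counts.values())
    if total == 0:
        return {}
    per_split = Counter()
    for (split, _class_name), n in counts.items():
        per_split[split] += n
    return {split: per_split[split] / total for split in SPLITS}
```

Calling `split_fractions(count_images(Path("dataset")))`, where `dataset` is a placeholder for the extracted folder, should give values close to 0.70, 0.15, and 0.15 if the archive matches the description.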
Steps to reproduce
1. Collect raw images of mango and groundnut crops under diverse environmental conditions, ensuring coverage of all disease classes and healthy samples.
2. Organize images into class-wise directories corresponding to each disease and healthy category.
3. Preprocess the dataset by removing corrupted or low-quality images and standardizing file formats (JPG/PNG).
4. Normalize class names (lowercase with underscores) for consistency and machine learning compatibility.
5. Shuffle images within each class using a fixed random seed (e.g., 42) to ensure reproducibility.
6. Split the dataset into training (70%), validation (15%), and test (15%) sets on a per-class basis.
7. Copy images into structured directories:
   train/class_name/
   validation/class_name/
   test/class_name/
8. Generate a CSV file (labels.csv) containing: image_path, class, split
9. Verify dataset integrity by checking class distribution and split ratios.
10. Package the dataset with folders, CSV file, and README documentation for distribution.
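The name normalization, seeded shuffle, per-class split, directory copy, and labels.csv steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not the authors' actual pipeline: `split_dataset` and `normalize_class_name` are hypothetical names, split sizes are rounded down with the remainder going to the test set, and image_path is written relative to the output folder.

```python
import csv
import random
import shutil
from pathlib import Path

# Assumption: only these extensions count as images (JPG/PNG per the steps).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def normalize_class_name(name: str) -> str:
    """Lowercase a class name and join its words with underscores."""
    return "_".join(name.lower().split())


def split_dataset(src: Path, dst: Path, seed: int = 42) -> dict:
    """Shuffle each class with a fixed seed, split it 70/15/15, copy files
    into dst/<split>/<class>/, and write dst/labels.csv.

    Returns a dict mapping (split, class) to the number of images copied.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    dst.mkdir(parents=True, exist_ok=True)
    rows, counts = [], {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        label = normalize_class_name(class_dir.name)
        # Sort before shuffling so the result does not depend on OS file order.
        images = sorted(
            f for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_EXTS
        )
        rng.shuffle(images)
        n = len(images)
        n_train = n * 70 // 100  # assumption: round down
        n_val = n * 15 // 100    # assumption: round down; test gets the rest
        parts = {
            "train": images[:n_train],
            "validation": images[n_train:n_train + n_val],
            "test": images[n_train + n_val:],
        }
        for split, files in parts.items():
            out_dir = dst / split / label
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, out_dir / f.name)
                rows.append((f"{split}/{label}/{f.name}", label, split))
            counts[(split, label)] = len(files)
    with open(dst / "labels.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["image_path", "class", "split"])
        writer.writerows(rows)
    return counts
```

For example, `split_dataset(Path("raw_images"), Path("dataset"))` (both paths are placeholders) would produce the three split folders plus labels.csv, and the returned counts can be used for the integrity check on class distribution and split ratios.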