Bean Leaf Disease Dataset for Multi-Class Image Classification

Published: 5 May 2026 | Version 1 | DOI: 10.17632/6m8ms6ptd4.1
Contributors:

Description

This dataset contains 1,200 field-collected bean leaf images captured from agricultural plots in Bangladesh for multi-class plant disease classification. The images were collected under real-world conditions, including natural lighting, varying backgrounds, and diverse leaf orientations, which provides realistic variability for computer vision and deep learning applications.

Data collection was conducted in Sholakura village, Doulotpur union, Belkuchi upazila, Sirajganj district, Bangladesh, on 17 and 18 October 2025, under the supervision of local agricultural authorities.

The dataset consists of four balanced classes:

- Healthy Leaf (300 images)
- Leaf Spot Disease (300 images)
- Pest Damage (300 images)
- Yellow Mosaic Disease (300 images)

All images are organized into class-specific directories and follow a standardized naming convention. A metadata.csv file is provided, containing labels and predefined train, validation, and test splits (70%, 15%, 15%). The splits were created by stratified sampling with a fixed random seed to ensure reproducibility.

This dataset is intended for research and academic use in areas such as deep learning, computer vision, plant disease detection, and explainable AI. The images were validated by agricultural officers and collected under academic supervision.
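To make the split description concrete, the following is a minimal sketch of how a 70/15/15 stratified split with a fixed seed can be produced. It is not the authors' script, which is not published on this page. The function name `stratified_split`, the placeholder file names, and the use of Python's `random` module are assumptions for illustration. With 300 images per class, each class contributes 210 training, 45 validation, and 45 test images.

```python
import random
from collections import Counter, defaultdict

# Directory-style class names; file names below are placeholders (assumption).
CLASSES = ["healthy", "leaf_spot", "pest_damage", "yellow_mosaic"]


def stratified_split(items, train_frac=0.70, val_frac=0.15, seed=42):
    """Assign (filename, label) pairs to train/val/test, class by class.

    Splitting inside each class keeps the class proportions identical
    across the three splits. Whatever is left after train and val
    goes to test.
    """
    by_label = defaultdict(list)
    for filename, label in items:
        by_label[label].append(filename)

    rng = random.Random(seed)  # fixed seed makes the assignment repeatable
    rows = []
    for label in sorted(by_label):
        files = sorted(by_label[label])  # sort first so input order is irrelevant
        rng.shuffle(files)
        n_train = round(len(files) * train_frac)
        n_val = round(len(files) * val_frac)
        for i, filename in enumerate(files):
            if i < n_train:
                split = "train"
            elif i < n_train + n_val:
                split = "val"
            else:
                split = "test"
            rows.append((filename, label, split))
    return rows


# Hypothetical file list: 300 placeholder names per class.
items = [(f"{label}_{i:04d}.jpg", label) for label in CLASSES for i in range(300)]
rows = stratified_split(items)
print(Counter(split for _, _, split in rows))
```

Because the published metadata.csv already contains the split assignments, a sketch like this is only useful for understanding the procedure or for building a different split ratio; results on the official split should use the provided file.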

Files

Steps to reproduce

1. Download the dataset from Mendeley Data and extract the files.

2. Ensure the folder structure is as follows:

    bean_leaf_dataset/
    ├── healthy/
    ├── leaf_spot/
    ├── yellow_mosaic/
    ├── pest_damage/
    └── metadata.csv

3. Install the required Python libraries:

    pip install pandas scikit-learn pillow

4. Load the metadata file:

    import os
    import pandas as pd

    # metadata.csv sits inside the dataset folder, next to the class directories.
    root = "bean_leaf_dataset"
    df = pd.read_csv(os.path.join(root, "metadata.csv"))

5. Access the dataset splits:

    train_df = df[df["split"] == "train"]
    val_df = df[df["split"] == "val"]
    test_df = df[df["split"] == "test"]

6. Load images using file paths:

    from PIL import Image

    def load_image(row):
        # Images are stored as <root>/<label>/<filename>.
        path = os.path.join(root, row["label"], row["filename"])
        return Image.open(path).convert("RGB")

7. Use the dataset for training or evaluation in any deep learning framework.

8. Reproducibility note:
    - Data splits are predefined in metadata.csv.
    - Stratified sampling with a fixed random seed (42) was used.
    - No additional random splitting is required.
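Before training, it can help to confirm that the extracted files match metadata.csv. The sketch below uses only the Python standard library and relies on the `filename`, `label`, and `split` columns used in the steps above. The function name `summarize_metadata` is hypothetical, and the assumption that each `label` value equals its class directory name is taken from the `load_image` example rather than verified against the files.

```python
import csv
import os
from collections import Counter


def summarize_metadata(root):
    """Count images per (split, label) and collect paths missing on disk."""
    counts = Counter()
    missing = []
    metadata_path = os.path.join(root, "metadata.csv")
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[(row["split"], row["label"])] += 1
            # Assumed layout: <root>/<label>/<filename>
            path = os.path.join(root, row["label"], row["filename"])
            if not os.path.isfile(path):
                missing.append(path)
    return counts, missing
```

Calling `summarize_metadata("bean_leaf_dataset")` returns the per-split, per-class counts and a list of any files named in metadata.csv that were not found. If the split is as described, each class should show 210 training, 45 validation, and 45 test images, and the list of missing files should be empty.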

Institutions

Categories

Agricultural Science, Computer Vision, Deep Learning

Licence