Multi-Class Soybean Leaf Disease Dataset: Healthy and Diseased Leaf Images for Machine Learning

Published: 23 April 2026 | Version 1 | DOI: 10.17632/6fhphxg297.1
Contributors:
Pranali Magdum, Shweta Jadhav, Anushka Sutar, Riya Oswal

Description

This dataset comprises high-resolution real-time field images of soybean (Glycine max) leaves, systematically collected and annotated across five classes: Healthy, Bacterial Blight, Cercospora Leaf Blight, Sudden Death Syndrome (SDS), and Soybean Rust. The images were captured under natural field conditions at multiple growth stages to ensure ecological validity and intra-class variability. This resource is intended to support machine learning, deep learning, and computer vision research in automated crop disease detection, precision agriculture, and agricultural AI systems. All images are organised into class-wise folders and are provided in JPEG format with consistent resolution.

Disease Classes and Descriptions:

The dataset encompasses the following five classes. Each class folder is self-contained and independently labeled.

1. Healthy
Images in this class represent soybean leaves with no visible symptoms of disease or pest damage. The leaves display uniform dark green coloration, intact surface texture, and normal morphology. This class serves as the negative/control class for binary and multi-class classification tasks.

2. Bacterial Blight (Pseudomonas savastanoi pv. glycinea)
Bacterial Blight is caused by the gram-negative bacterium Pseudomonas savastanoi pv. glycinea. It is spread by rain splash and wind-driven rain, typically manifesting under cool, humid conditions. Images capture characteristic angular water-soaked lesions that turn brown and are often surrounded by yellow halos.

3. Cercospora Leaf Blight (Cercospora kikuchii)
Cercospora Leaf Blight (CLB) is a fungal disease caused by Cercospora kikuchii, favoured by warm and humid conditions during late reproductive stages. It produces a characteristic purplish-red to bronze discoloration on the upper leaf surface.

4. Sudden Death Syndrome (Fusarium virguliforme)
Sudden Death Syndrome (SDS) is caused by the soilborne fungus Fusarium virguliforme. While the pathogen infects roots early in the season, foliar symptoms appear mid-to-late season as interveinal chlorosis and necrosis.

5. Soybean Rust (Phakopsora pachyrhizi)
Soybean Rust is caused by the obligate biotrophic fungus Phakopsora pachyrhizi and is one of the most destructive foliar diseases worldwide. It spreads rapidly via airborne urediniospores and can cause total crop loss in epidemic conditions.

Dataset Folder Structure:

Soyabean leaf desease dataset/
|-- Bacterial Blight/
|   |-- BB(1).jpg
|   `-- ...
|-- Cercospora Leaf Blight/
|   |-- CLB(1).jpg
|   `-- ...
|-- Healthy/
|   |-- HEALTHY(1).jpg
|   `-- ...
|-- Rust/
|   |-- RUST(1).jpg
|   `-- ...
|-- Sudden Death Syndrome/
|   |-- SDS(1).jpg
|   `-- ...
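
Because labels are carried only by the folder names, a loader can be as simple as walking the class folders. The following is a minimal sketch, not the contributors' own tooling: the function names index_dataset and class_counts are hypothetical, the class list is copied from the folder structure above, and the location of the extracted archive is an assumption.

```python
from pathlib import Path

# Class folder names as they appear in the folder structure above.
CLASS_FOLDERS = [
    "Bacterial Blight",
    "Cercospora Leaf Blight",
    "Healthy",
    "Rust",
    "Sudden Death Syndrome",
]


def index_dataset(root):
    """Return (image_path, class_name) pairs for every JPEG under root.

    Hypothetical helper: the label of an image is the name of the
    folder that contains it.
    """
    root = Path(root)
    samples = []
    for class_name in CLASS_FOLDERS:
        class_dir = root / class_name
        if not class_dir.is_dir():
            continue  # tolerate a partially extracted copy
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in {".jpg", ".jpeg"}:
                samples.append((path, class_name))
    return samples


def class_counts(samples):
    """Count how many images were indexed for each class."""
    counts = {}
    for _, class_name in samples:
        counts[class_name] = counts.get(class_name, 0) + 1
    return counts
```

A call such as index_dataset("Soyabean leaf desease dataset") would then yield the pairs, assuming the archive was extracted into the working directory. Checking class_counts before training is a cheap way to spot class imbalance, since the description does not state per-class image counts.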

Steps to reproduce

Field Collection

Images were collected directly from soybean farms under real agricultural conditions. Data collection spanned multiple growth stages (V-stage and R-stage) of the soybean crop to capture early, mid, and late disease progression. Care was taken to include samples at different times of the day and in varying weather conditions to maximize photometric diversity.

Equipment and capture settings:
• Camera: DSLR / smartphone camera (minimum 12 MP resolution)
• Distance: 5–30 cm from leaf surface for macro detail
• Orientation: Adaxial (upper) leaf surfaces captured
• Background: Natural field background (no artificial isolation)

Annotation and Labeling

Disease identification was performed by trained agronomists and plant pathologists with field experience. Each image was reviewed and labeled at the class level. Labels are encoded via folder structure: each class resides in a separate folder. No bounding-box or pixel-level (segmentation) annotations are included in this version.

Quality Control
• Blurred or overexposed images were excluded
• Duplicate and near-duplicate images were removed using perceptual hashing
• Minimum resolution enforced: 224 × 224 pixels (most images are 1080p)
• Multi-disease co-infected leaves were excluded to maintain class purity
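
The near-duplicate and minimum-resolution checks listed under Quality Control can be illustrated with a small sketch. The description does not say which perceptual hash or distance threshold was used, so the average hash (aHash), the threshold of 5 bits, and all function names below are assumptions chosen for illustration. Image decoding is left out: each image is assumed to be already available as a 2-D list of grayscale values in the range 0 to 255.

```python
MIN_SIDE = 224  # minimum width and height stated under Quality Control


def average_hash(gray, hash_size=8):
    """Average hash of a grayscale image (2-D list of 0-255 values).

    The image is reduced to hash_size x hash_size block means; each bit
    is 1 where a block mean exceeds the mean over all blocks.
    """
    height, width = len(gray), len(gray[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")
    blocks = []
    for by in range(hash_size):
        y0, y1 = by * height // hash_size, (by + 1) * height // hash_size
        for bx in range(hash_size):
            x0, x1 = bx * width // hash_size, (bx + 1) * width // hash_size
            total = sum(gray[y][x] for y in range(y0, y1) for x in range(x0, x1))
            blocks.append(total / ((y1 - y0) * (x1 - x0)))
    mean = sum(blocks) / len(blocks)
    bits = 0
    for value in blocks:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(images, max_distance=5):
    """Keep the first image of every group of near-duplicates.

    images maps a name to a grayscale grid; max_distance is an assumed
    threshold, not a value taken from the dataset description.
    """
    kept = {}
    for name, gray in images.items():
        h = average_hash(gray)
        if all(hamming_distance(h, other) > max_distance for other in kept.values()):
            kept[name] = h
    return list(kept)


def meets_min_resolution(width, height, min_side=MIN_SIDE):
    """True when both sides reach the enforced minimum resolution."""
    return width >= min_side and height >= min_side
```

Average hashing compares coarse brightness patterns, so a uniformly brightened copy of an image hashes identically while a structurally different image lands far away in Hamming distance. Other perceptual hashes (for example DCT-based ones) follow the same compare-by-Hamming-distance pattern and may be closer to what was actually used.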

Categories

Machine Learning, Deep Learning, Meta Dataset

Licence