MoringaLeafNet: A Multi-Class Leaf Disease Dataset for Precision Agriculture and Deep Learning Research

Published: 29 September 2025 | Version 5 | DOI: 10.17632/w8sr775pjb.5

Description

MoringaLeafNet (Version 4) is a curated, validated, and metadata-rich dataset of Moringa oleifera leaves created to support machine learning, deep learning, and computer vision research in agricultural disease diagnosis. This version incorporates rigorous annotation validation, transparent preprocessing documentation, structured metadata, and improved repository organization to ensure high reproducibility and usability.

The dataset contains 11,268 images across four classes of leaf health and disease, collected from two locations in Bangladesh under natural field conditions:
- Sumi Nursery, Madhupur, Tangail (March–April 2025)
- Rafin Nursery, Birulia, Savar (August–September 2025)

Images were captured using two smartphone devices (iQOO Z9 Turbo and OnePlus 8T), then processed into raw, preprocessed, and augmented versions for flexible research applications.

Primary Classes:

Collection 1 – Sumi Nursery, Tangail (n = 2,228 original)
1. Healthy Leaf – 464 images
2. Yellow Leaf – 624 images
3. Bacterial Leaf Spot – 727 images
4. Cercospora Leaf Spot – 413 images

Collection 2 – Rafin Nursery, Savar (n = 589 original)
1. Healthy Leaf – 133 images
2. Yellow Leaf – 171 images
3. Bacterial Leaf Spot – 130 images
4. Cercospora Leaf Spot – 155 images

Augmentation & Dataset Totals

Biologically realistic augmentation was applied with controlled parameters to balance classes and enhance variability:
- Rotation: –30° to +30°
- Flipping: horizontal and vertical (50% probability)
- Brightness adjustment: 0.8–1.2
- Contrast adjustment: 0.8–1.2

Dataset counts:
- Original images: 2,817
- Preprocessed images: 2,817
- Augmented images: 8,451
- Final Total: 11,268 (Preprocessed + Augmented)

Annotation & Validation
- Labeled by an expert agronomist (Prof. Dr. M. A. Rahim, DIU).
- Cross-checked subset (n ≈ 300) yielded 92% agreement (Cohen’s κ = 0.87), confirming reliability.
- Sharpness and visibility criteria were applied to exclude blurred or unclear images.

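
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's κ can be computed from two annotators' labels for the same images. The function name `cohens_kappa` and the toy label lists are hypothetical and are not taken from the dataset. The description does not say which tool the authors used, so this only illustrates the standard formula κ = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over classes of the product of marginal rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical class throughout
    return (p_o - p_e) / (1 - p_e)


# Toy, made-up labels (NOT from MoringaLeafNet) for two annotators.
annotator_1 = ["healthy", "yellow", "bacterial", "cercospora",
               "healthy", "yellow", "bacterial", "cercospora"]
annotator_2 = ["healthy", "yellow", "bacterial", "cercospora",
               "healthy", "yellow", "bacterial", "bacterial"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Under this formula, the reported figures (92% raw agreement, κ = 0.87) would imply a chance-agreement term of roughly 0.38. That is a back-calculation from rounded values and should be read as approximate.
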
Preprocessing & Duplicate Removal

Three dataset tiers:
1. Original Images – raw field images.
2. Preprocessed Images – resized (3000 × 3000 px), background removed and replaced with uniform white.
3. Augmented Images – balanced dataset with synthetic variability.

Duplicates were removed using perceptual hashing (ImageHash v4.3) with manual verification.

Metadata fields:
- Image Name
- Class Name
- Date
- Location
- Temperature
- Weather
- Device

Repository contents:
- Original Images.zip (raw images from both locations)
- Preprocessed Images.zip (background-standardized images, Savar set)
- Augmented Images.zip (class-balanced set, 8,451 images)
- metadata.csv (structured metadata)

NB: This version does not contain any duplicate images.
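
To illustrate the idea behind the perceptual-hashing step mentioned above, here is a simplified, standard-library-only sketch of an average hash combined with a Hamming-distance check. It does not call the ImageHash library the authors used. It assumes each image has already been converted to a small grayscale grid, and the file names, the 4 × 4 grid size, and the `max_distance` threshold are all hypothetical. The description does not state which hash type or threshold was applied to the dataset.

```python
def average_hash(gray_grid):
    """Return a bit tuple: 1 where a pixel is brighter than the grid mean."""
    pixels = [p for row in gray_grid for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(hash_1, hash_2):
    """Number of bit positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_1, hash_2))


def find_near_duplicates(hashes, max_distance=2):
    """Return name pairs whose hashes differ by at most max_distance bits."""
    names = sorted(hashes)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs


# Toy 4x4 grayscale grids (made-up values, not real dataset images).
leaf_a = [[10, 20, 200, 210],
          [15, 25, 205, 215],
          [12, 22, 202, 212],
          [11, 21, 201, 211]]
leaf_a_brighter = [[p + 10 for p in row] for row in leaf_a]  # same pattern
leaf_b = [row[::-1] for row in leaf_a]  # mirrored, so a different pattern

hashes = {
    "leaf_a.jpg": average_hash(leaf_a),
    "leaf_a_brighter.jpg": average_hash(leaf_a_brighter),
    "leaf_b.jpg": average_hash(leaf_b),
}
duplicates = find_near_duplicates(hashes)
print(duplicates)
```

Candidate pairs flagged this way would still need a visual check, in line with the manual verification the description mentions, because a uniform brightness shift leaves the hash unchanged while genuinely different leaves can occasionally collide.
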

Institutions

  • Daffodil International University

Categories

Computer Vision, Machine Learning, Image Classification, Agronomy Sustainability, Deep Learning, Agriculture
