A Multi-Class Medicinal Plant Leaf Dataset with Multiple Leaf Conditions for Plant Health Detection and Classification
Description
This dataset is a collection of medicinal plant leaf images for research in plant disease detection, computer vision, and machine learning. The images were collected between 11 September 2025 and 27 February 2026 at several locations in Bangladesh, including Rajbari (Dhaka), Ashulia (Dhaka), Mirpur (Dhaka), and Hajigonj (Chandpur), using three smartphones: iPhone 16 Pro Max, OnePlus Nord CE 4 Lite, and OnePlus 7T. Detached leaves were placed on relatively uniform backgrounds so that leaf morphology and disease symptoms are clearly visible. Every original image shows a different leaf: no leaf was photographed more than once, and the leaves were taken from different individual plants or trees to maximize diversity and minimize redundancy.
The dataset covers seven medicinal plant species: Aloe Vera, Azadirachta Indica (Neem), Centella Asiatica, Hibiscus Rosa Sinensis, Kalanchoe Pinnata, Mikania Micrantha, and Piper Betle. It spans multiple leaf condition classes such as healthy, diseased, chlorotic, dried, distorted, insect-affected, and mild disease, as well as different growth stages (young and mature). In total, the dataset contains 1,981 original images and 20,019 augmented images, for 22,000 images overall. The original images were captured at high resolution (including 3072×4096, 4096×3072, and 3024×4032 pixels; other resolutions are recorded in the metadata).
During preprocessing, background removal was applied to isolate the leaf region and reduce irrelevant visual noise. All background-removed images were resized to 1440 × 1080 pixels, converted to RGB, normalized, and stored in JPG format. For the augmentation pipeline, images were further resized to 512 × 512 pixels to suit deep learning model training. Augmentation techniques (rotation, horizontal and vertical flipping, brightness and contrast adjustment, Gaussian noise addition, and image sharpening) were applied to increase dataset diversity and improve class balance; all augmented images are 512 × 512 pixels.
The dataset is organized into three main directories: Original Dataset, Background Remove Dataset (1440 × 1080), and Augmented Dataset (512 × 512). It is accompanied by a CSV metadata file containing plant species names, leaf condition labels, image counts, and collection locations, which supports dataset management and reproducible research.
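The per-class image counts recorded in the metadata file can be cross-checked against the directories themselves. The following Python sketch is illustrative only and is not the authors' tooling. It assumes a `<dataset directory>/<species>/<condition>/<image>` folder layout, and the function names and CSV column names are hypothetical; the description does not specify the actual layout or metadata schema. The demo at the end runs on a temporary directory with empty placeholder files, not on the real dataset.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path

# Assumed layout (not confirmed by the dataset description):
#   <dataset_dir>/<species>/<condition>/<image file>
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(dataset_dir):
    """Return a Counter mapping (species, condition) to the number of images."""
    root = Path(dataset_dir)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            parts = path.relative_to(root).parts
            if len(parts) >= 3:  # species / condition / file name
                counts[(parts[0], parts[1])] += 1
    return counts


def write_summary(counts, csv_path):
    """Write one row per (species, condition); column names are hypothetical."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["species", "condition", "image_count"])
        for (species, condition), n in sorted(counts.items()):
            writer.writerow([species, condition, n])


# Demo on a throwaway directory with placeholder files.
demo_root = Path(tempfile.mkdtemp())
class_dir = demo_root / "Aloe Vera" / "healthy"
class_dir.mkdir(parents=True)
for name in ("leaf_001.jpg", "leaf_002.JPG", "notes.txt"):
    (class_dir / name).write_bytes(b"")

demo_counts = count_images(demo_root)
summary_path = demo_root / "summary.csv"
write_summary(demo_counts, summary_path)
```

Running `count_images` once per top-level directory and comparing the totals with the stated 1,981 original and 20,019 augmented images is a quick integrity check after download.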
Steps to reproduce
1. Data Collection: Collect fresh medicinal plant leaves from different locations. Ensure each sample is unique (no repeated leaves) and place them on a uniform background before capturing images using smartphone cameras.
2. Image Acquisition: Capture high-resolution images under natural lighting conditions using multiple devices to ensure variability.
3. Data Organization: Organize the collected images into folders based on plant species and corresponding leaf condition classes.
4. Preprocessing: Apply background removal techniques to isolate the leaf region. Resize all processed images to 1440 × 1080 pixels, convert to RGB format, and normalize pixel values.
5. Data Augmentation: Resize images to 512 × 512 pixels and apply augmentation techniques such as rotation, flipping, brightness and contrast adjustment, Gaussian noise addition, and sharpening to increase dataset diversity.
6. Dataset Structuring: Arrange the dataset into three directories: Original Dataset, Background Removed Dataset, and Augmented Dataset.
7. Metadata Preparation: Create a CSV file containing plant species, class labels, image counts, and collection locations.
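The normalization and augmentation steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' pipeline: the description does not give rotation angles, brightness or contrast factors, noise levels, or the sharpening filter, so the 90-degree rotation, the 3×3 kernel, and all numeric parameters below are placeholders. The input is a random stand-in for a 512 × 512 RGB image, and every function name is hypothetical.

```python
import numpy as np


def normalize(image_u8):
    """Scale 8-bit pixel values to floats in [0, 1]."""
    return image_u8.astype(np.float32) / 255.0


def adjust_brightness_contrast(image, brightness=0.0, contrast=1.0):
    """Scale around mid-gray (0.5) for contrast, then shift for brightness."""
    return np.clip((image - 0.5) * contrast + 0.5 + brightness, 0.0, 1.0)


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    return np.clip(image + rng.normal(0.0, sigma, size=image.shape), 0.0, 1.0)


def sharpen(image):
    """Apply a 3x3 sharpening kernel per channel with edge-replicated padding."""
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(out, 0.0, 1.0)


def augment(image, rng):
    """Return a dict of augmented variants of one H x W x 3 float image."""
    return {
        "rot90": np.rot90(image, k=1, axes=(0, 1)),  # assumed angle
        "hflip": image[:, ::-1],
        "vflip": image[::-1, :],
        "bright_contrast": adjust_brightness_contrast(image, 0.1, 1.2),
        "noise": add_gaussian_noise(image, 0.02, rng),
        "sharpen": sharpen(image),
    }


# Demo on a random stand-in for a 512 x 512 RGB image.
rng = np.random.default_rng(0)
image = normalize(rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8))
variants = augment(image, rng)
```

Each variant keeps the 512 × 512 × 3 shape, so outputs can be written straight into an augmented-image directory. Passing a seeded `rng` makes the noise reproducible between runs.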
Institutions
- Daffodil International University, Dhaka Division, Dhaka