A medicinal plant leaf image dataset for plant health condition detection and classification

Published: 9 March 2026 | Version 1 | DOI: 10.17632/89rfgtxbdc.1
Contributors:

Description

This dataset is a curated collection of medicinal plant leaf images intended for research in plant disease detection, computer vision, and machine learning. The images were collected at three locations in Dhaka, Bangladesh (Rajbari, Ashulia, and Mirpur) between January 7, 2026 and February 27, 2026 using smartphone cameras (OnePlus Nord CE 4 Lite and iPhone 16 Pro Max). Detached leaves were photographed on a uniform background to improve the visibility of leaf morphology and disease symptoms.

The dataset covers five medicinal plant species: Aloe Vera, Azadirachta Indica (Neem), Hibiscus Rosa Sinensis, Kalanchoe Pinnata, and Piper Betle. Across these species it defines 16 leaf condition classes, including healthy, chlorotic, diseased, dried, and different growth stages. It contains 1,323 original images captured during field collection and 14,677 augmented images, for a total of 16,000 images.

The original images were captured at high resolution (3072 × 4096, 4096 × 3072, and 3024 × 4032 pixels). All processed images were resized to 512 × 512 pixels, converted to RGB color format, and stored in JPG format for compatibility with machine learning and deep learning models. During preprocessing, background removal was applied to isolate the leaf region and reduce irrelevant visual noise, and pixel values were normalized to keep image quality consistent. Data augmentation (rotation, horizontal and vertical flipping, brightness and contrast adjustment, Gaussian noise addition, and image sharpening) was applied to increase dataset diversity and improve class balance.

The dataset is organized into two main directories: Original Images, which contains the raw captured leaf images, and Processed Images, which contains the resized, normalized, and augmented samples generated from the originals. A CSV metadata file is also included. It records plant species names, leaf condition labels, image counts, and data collection locations, which simplifies dataset management and supports reproducible machine learning experiments.
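As an illustration of how the metadata file might be used, the sketch below tallies image counts per species and per species/condition class with Python's standard csv module. The column names (species, condition, image_count) are assumptions, since the description does not give the CSV header, and the sample rows are made up; adjust both to match the actual file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; change them to match the header row of the real CSV.
SPECIES_COL = "species"
CONDITION_COL = "condition"
COUNT_COL = "image_count"


def summarize_metadata(csv_file):
    """Return (per-species totals, per-class totals, grand total).

    `csv_file` is an open text file or file-like object with a header row.
    """
    per_species = defaultdict(int)
    per_class = defaultdict(int)
    for row in csv.DictReader(csv_file):
        count = int(row[COUNT_COL])
        per_species[row[SPECIES_COL]] += count
        per_class[(row[SPECIES_COL], row[CONDITION_COL])] += count
    return dict(per_species), dict(per_class), sum(per_species.values())


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not the dataset's real counts.
    sample = io.StringIO(
        "species,condition,image_count,location\n"
        "Aloe Vera,healthy,10,Mirpur\n"
        "Aloe Vera,dried,5,Ashulia\n"
        "Piper Betle,healthy,7,Rajbari\n"
    )
    species_totals, class_totals, total = summarize_metadata(sample)
    print(species_totals)
    print(total)
```

Run against the real file, the grand total can be compared with the figures stated above (1,323 original images, or 16,000 including augmented images, depending on what the CSV counts).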

Files

Steps to reproduce

1. Medicinal plant leaf images were collected from three locations in Dhaka, Bangladesh (Rajbari, Ashulia, and Mirpur) between January 7, 2026 and February 27, 2026.
2. Images were captured using smartphone cameras (OnePlus Nord CE 4 Lite and iPhone 16 Pro Max) under natural lighting conditions.
3. Detached leaves were placed on a uniform background during image capture to clearly observe leaf morphology and disease symptoms.
4. The collected images were manually reviewed, and low-quality, blurred, or duplicate images were removed.
5. Images were categorized according to plant species and leaf health conditions such as healthy, chlorotic, diseased, dried, and different growth stages.
6. Background removal techniques were applied to isolate the leaf region and reduce unnecessary visual noise.
7. All images were resized to 512 × 512 pixels, converted to RGB color format, and saved in JPG format for compatibility with machine learning models.
8. Data augmentation techniques including rotation, horizontal and vertical flipping, brightness and contrast adjustment, Gaussian noise addition, and sharpening were applied to increase dataset diversity.
9. The dataset was organized into two directories: Original Images and Processed Images.
10. A metadata CSV file was created containing plant species names, leaf condition labels, image counts, and data collection locations for easier dataset management and reproducibility.
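Steps 7 and 8 can be approximated with the Pillow library, as in the minimal sketch below. It is not the authors' pipeline: the rotation angle, enhancement factors, and noise level are assumed values, the function names are hypothetical, and the background removal in step 6 is omitted because the steps do not say which technique was used.

```python
import io

from PIL import Image, ImageChops, ImageEnhance, ImageFilter, ImageOps

TARGET_SIZE = (512, 512)  # output size stated in step 7


def standardize(img):
    """Convert to RGB and resize to 512 x 512 (aspect ratio is not preserved)."""
    return img.convert("RGB").resize(TARGET_SIZE, Image.LANCZOS)


def add_gaussian_noise(img, sigma=10.0):
    """Add zero-mean Gaussian noise to an RGB image; sigma is an assumed value."""
    # effect_noise yields a grayscale image of Gaussian noise centered on 128.
    bands = [Image.effect_noise(img.size, sigma) for _ in range(3)]
    noise = Image.merge("RGB", bands)
    # img + noise - 128, clipped to 0..255, so the noise is centered on zero.
    return ImageChops.add(img, noise, scale=1.0, offset=-128)


def augment(img):
    """Return named augmented variants of an RGB image (parameters are assumptions)."""
    return {
        "rotate": img.rotate(15),  # 15 degrees counter-clockwise, same canvas size
        "hflip": ImageOps.mirror(img),  # horizontal flip
        "vflip": ImageOps.flip(img),  # vertical flip
        "brightness": ImageEnhance.Brightness(img).enhance(1.2),
        "contrast": ImageEnhance.Contrast(img).enhance(1.2),
        "noise": add_gaussian_noise(img),
        "sharpen": img.filter(ImageFilter.SHARPEN),
    }


if __name__ == "__main__":
    # A synthetic green image stands in for a captured leaf photo.
    raw = Image.new("RGB", (400, 300), (30, 120, 40))
    base = standardize(raw)
    for name, variant in augment(base).items():
        buffer = io.BytesIO()
        variant.save(buffer, format="JPEG", quality=95)  # JPG output, as in step 7
        print(name, variant.size, len(buffer.getvalue()), "bytes")
```

To process real files, replace the synthetic image with Image.open on a file from the Original Images directory and save each variant to disk instead of an in-memory buffer.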

Institutions

Categories

Computer Vision, Image Processing, Machine Learning, Data Acquisition, Data Analysis, Deep Learning, Agriculture

Licence