Processed Tomato Leaf Disease Image Dataset with Train-Validation-Test Split for Deep Learning Applications

Published: 6 April 2026 | Version 1 | DOI: 10.17632/3zwdw6y4pn.1
Contributors:
Shahriar Ahmed Shovo, Samiha Raisa Mostafa

Description

This dataset is a processed and structured version of a tomato leaf disease image dataset derived from the PlantVillage dataset. It has been preprocessed to improve data quality and usability for deep learning research. The preparation steps include removing blurred images, eliminating duplicate samples, selecting specific classes, and organizing the images into train, validation, and test subsets.

The dataset contains six classes of tomato leaf conditions (five diseases plus healthy leaves):
- Tomato___Bacterial_spot
- Tomato___Early_blight
- Tomato___Late_blight
- Tomato___Leaf_Mold
- Tomato___Septoria_leaf_spot
- Tomato___healthy

The dataset is split as follows:
- Training set: 70%
- Validation set: 15%
- Test set: 15%

This dataset is suitable for image classification, transfer learning, and agricultural disease detection research using deep learning. The preprocessing steps are intended to support better model training and provide a ready-to-use dataset for machine learning experiments.
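After downloading, the layout and the approximate split proportions can be checked with a short script. The following is a minimal Python sketch, assuming the archive unpacks to a root folder containing train/, val/ and test/, each holding one sub-folder per class. The set of image extensions counted (IMAGE_EXTS) and the function names are assumptions, not part of the dataset.

```python
from collections import Counter
from pathlib import Path

SPLITS = ("train", "val", "test")
# Assumed image extensions; adjust to match the files actually in the archive.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_images(root: Path) -> dict[str, Counter]:
    """Return {split: Counter({class_name: n_images})} for root/<split>/<class>/*."""
    counts: dict[str, Counter] = {}
    for split in SPLITS:
        per_class: Counter = Counter()
        split_dir = root / split
        # Each sub-folder of a split is treated as one class.
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            per_class[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS
            )
        counts[split] = per_class
    return counts


def split_fractions(counts: dict[str, Counter]) -> dict[str, float]:
    """Fraction of all counted images that falls into each split."""
    totals = {split: sum(c.values()) for split, c in counts.items()}
    grand_total = sum(totals.values())
    return {split: n / grand_total for split, n in totals.items()}
```

For example, split_fractions(count_images(Path("tomato_leaf_dataset"))) should report values close to 0.70, 0.15 and 0.15 if the stated split holds; the root folder name here is hypothetical.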

Steps to reproduce

1. Obtain the original PlantVillage dataset containing tomato leaf disease images.
2. Select the relevant tomato leaf classes:
   - Tomato___Bacterial_spot
   - Tomato___Early_blight
   - Tomato___Late_blight
   - Tomato___Leaf_Mold
   - Tomato___Septoria_leaf_spot
   - Tomato___healthy
3. Clean the data:
   - Remove blurred images using image sharpness filtering.
   - Remove duplicate images to avoid data redundancy.
4. Standardize the image format and ensure all images are suitable as input to deep learning models.
5. Split the dataset into three subsets:
   - Training set (70%)
   - Validation set (15%)
   - Test set (15%)
6. Organize the dataset into the directory structure train/, val/, test/.
7. Within each subset, store the images in one folder per class, so that labels for supervised learning can be read from the folder names.
8. The dataset is now ready for training deep learning models such as a standard CNN, MobileNetV2, and EfficientNetB0.
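Steps 3 and 5 can be sketched in Python as below. The source does not name the sharpness metric, the duplicate criterion, or the random seed, so all three are assumptions here: blur is scored with the variance of a 4-neighbour Laplacian (a common sharpness measure), duplicates are detected as byte-identical files via SHA-256, and each class is shuffled with a fixed seed before being cut 70/15/15. BLUR_THRESHOLD and all function names are hypothetical. The Laplacian is written in pure Python on a 2D list of grayscale values to keep the sketch self-contained; with OpenCV, cv2.Laplacian(gray, cv2.CV_64F).var() gives a comparable score.

```python
import hashlib
import random
from pathlib import Path

# Hypothetical sharpness cut-off: images scoring below it are treated as blurred.
# The value is an assumption and would need tuning on the actual images.
BLUR_THRESHOLD = 100.0


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (higher = sharper)."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3 pixels")
    # Discrete Laplacian: sum of the 4 neighbours minus 4 times the centre pixel.
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurred(gray: list[list[float]], threshold: float = BLUR_THRESHOLD) -> bool:
    """Flag an image whose Laplacian variance falls below the threshold."""
    return laplacian_variance(gray) < threshold


def drop_exact_duplicates(paths: list[Path]) -> list[Path]:
    """Keep the first file (in sorted order) for each distinct SHA-256 digest."""
    seen: set[str] = set()
    kept: list[Path] = []
    for path in sorted(paths):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept


def split_class(
    items: list[str],
    percents: tuple[int, int, int] = (70, 15, 15),
    seed: int = 42,
) -> dict[str, list[str]]:
    """Shuffle one class's items reproducibly and cut them into train/val/test."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    shuffled = sorted(items)  # sort first so the result ignores input order
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Integer arithmetic for the cut points; any remainder lands in the test split.
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }
```

Splitting each class separately keeps the class proportions roughly equal across the three subsets. Byte hashing only catches exact copies; near-duplicates such as re-encoded or resized images would need a perceptual hash, which this sketch does not attempt.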

Categories

Computer Science, Artificial Intelligence, Machine Learning, Image Classification, Agricultural Health

Licence