Preprocessed Brinjal Leaf Image Dataset for Healthy and Diseased Leaf Detection (Phomopsis Blight and Little Leaf) Using Deep Learning
Description
This dataset contains preprocessed images of brinjal (Solanum melongena) leaves collected under real agricultural field conditions for automated plant disease detection using computer vision and deep learning techniques.

The dataset is divided into two primary categories:
1) Healthy_Leaves
2) Unhealthy_Leaves

The Unhealthy_Leaves category is further subdivided into two disease classes:
• Phomopsis Blight (Phomopsis vexans)
• Little Leaf (Brinjal Leaf Curl)

The images were captured under natural field lighting using standard smartphone cameras during routine agricultural observation. Healthy leaves and leaves with visible disease symptoms were selected manually and photographed directly on the plants to preserve the natural texture and color variation found in real cultivation environments.

All images were manually cropped to retain only the leaf region of interest (ROI), which minimizes background noise and makes disease features easier to see. The cropped images were then resized to a uniform resolution of 224×224 pixels. All images are stored in JPG (.jpg) format; any PNG files present during dataset restructuring were converted to JPG so that the file format is consistent across all classes.

Each image is provided in 16 variants: the resized original plus 15 preprocessing and augmentation outputs intended to enhance structural, color, and texture-based disease patterns. The techniques include contrast enhancement, filtering, morphological transformations, geometric augmentation, and texture-based filtering. The variants were generated to highlight different visual characteristics of the leaf images, including color distribution, venation structures, lesion boundaries, and texture irregularities.

This dataset was created to support research in automated plant disease diagnosis using deep learning and computer vision techniques.
Each source image is provided in 16 variants (V1 is the resized original itself):
V1 Original (Resized)
V2 Grayscale
V3 CLAHE
V4 Gamma Correction
V5 HSV Adjustment
V6 Brightness Enhancement
V7 Contrast Enhancement
V8 Sharpening
V9 Gaussian Blur
V10 Median Blur
V11 Bilateral Filtering
V12 Morphological Top-Hat
V13 Morphological Black-Hat
V14 Rotation (+25°)
V15 Horizontal Flip
V16 Unsharp Masking

Each processed image follows a structured naming convention:
ClassName_SerialNumber_PreprocessingType.jpg
This format links every variant back to its source image, which supports reproducibility, traceability, and structured experimentation in machine learning and deep learning workflows.

The dataset is suitable for:
• Binary classification (Healthy vs Diseased)
• Multi-class disease classification
• CNN benchmarking and transfer learning
• Precision agriculture research
• AI-based crop health monitoring systems

Researchers may use the dataset to train and evaluate machine learning models, benchmark preprocessing pipelines, or develop automated crop monitoring systems for precision agriculture applications.
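The naming convention above can be parsed mechanically. The sketch below is one possible way to do it in Python; the helper names `parse_name` and `group_variants` are hypothetical, and the pattern assumes that the serial number is the only purely numeric token between underscores, which holds for the example file names shown for this dataset but is not guaranteed for every variant name.

```python
import re
from collections import defaultdict
from pathlib import Path

# ClassName_SerialNumber_PreprocessingType.jpg
# Class names may themselves contain underscores (e.g. Healthy_Leaves).
NAME_RE = re.compile(
    r"^(?P<cls>.+?)_(?P<serial>\d+)_(?P<variant>.+)\.jpg$", re.IGNORECASE
)


def parse_name(filename):
    """Split a dataset file name into (class, serial, variant) strings."""
    match = NAME_RE.match(Path(filename).name)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename}")
    # The serial is kept as a string so zero padding (e.g. '001') is preserved.
    return match["cls"], match["serial"], match["variant"]


def group_variants(filenames):
    """Map (class, serial) -> {variant: filename} for a list of files."""
    groups = defaultdict(dict)
    for name in filenames:
        cls, serial, variant = parse_name(name)
        groups[(cls, serial)][variant] = name
    return dict(groups)


files = [
    "Healthy_Leaves_001_Original.jpg",
    "Healthy_Leaves_001_CLAHE.jpg",
    "Little_Leaf_001_Original.jpg",
]
for key, variants in group_variants(files).items():
    print(key, sorted(variants))
```

Grouping by class and serial number makes it straightforward to locate every variant that belongs to one source leaf.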
Steps to reproduce
1) Download and extract the dataset archive.

2) The dataset folder structure is organized as follows:

Brinjal_Dataset/
├── Healthy_Leaves/
│   ├── Healthy_Leaves_001_Original.jpg
│   ├── Healthy_Leaves_001_CLAHE.jpg
│   ├── Healthy_Leaves_001_Gamma.jpg
│   └── …
└── Unhealthy_Leaves/
    ├── Phomopsis_Blight/
    │   ├── Phomopsis_Blight_001_Original.jpg
    │   ├── Phomopsis_Blight_001_CLAHE.jpg
    │   └── …
    └── Little_Leaf/
        ├── Little_Leaf_001_Original.jpg
        ├── Little_Leaf_001_CLAHE.jpg
        └── …

3) All cropped images were resized to a fixed resolution of 224 × 224 pixels, the input size expected by many convolutional neural network architectures, and were stored in JPG (.jpg) format.

4) The preprocessing pipeline was implemented in Python using the OpenCV, NumPy, and PIL libraries. Each transformation was applied programmatically to generate the 16 variants associated with every original image.

5) The file naming convention is:
ClassName_SerialNumber_PreprocessingType.jpg
This ensures reproducibility and traceability between original and processed variants.

6) For binary classification tasks:
• Use Healthy_Leaves as the healthy class
• Use Unhealthy_Leaves as the diseased class

7) For multi-class classification:
• Use Phomopsis_Blight and Little_Leaf as separate disease labels.

8) Split the dataset into training, validation, and testing sets (e.g., 70–15–15).

9) Normalize pixel values to the 0–1 range before feeding them into deep learning models.

10) Train models using frameworks such as TensorFlow, Keras, or PyTorch, and evaluate performance using accuracy, precision, recall, F1-score, and the confusion matrix.

11) The same folder structure supports both the binary and the multi-class workflows described in steps 6 and 7.
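The labeling and splitting steps above could be sketched as follows in Python. This is only one possible approach: the helper names `labels_for` and `split_by_leaf` are hypothetical, the label strings are chosen for illustration, and the ratios default to 70/15/15, an assumption picked so that the three parts sum to 100%. The split is done per source leaf rather than per file, a design choice that keeps all variants of one original image in the same subset so that near-duplicates do not appear in both training and test data.

```python
import random
import re
from collections import defaultdict
from pathlib import Path

# Everything up to and including the serial number identifies one source leaf,
# e.g. 'Healthy_Leaves_001' in 'Healthy_Leaves_001_CLAHE.jpg'.
LEAF_RE = re.compile(r"^(.+?_\d+)_")


def labels_for(path):
    """Return (binary_label, multiclass_label) based on the folder layout."""
    parts = Path(path).parts
    if "Healthy_Leaves" in parts:
        return "Healthy", "Healthy"
    if "Phomopsis_Blight" in parts:
        return "Diseased", "Phomopsis_Blight"
    if "Little_Leaf" in parts:
        return "Diseased", "Little_Leaf"
    raise ValueError(f"Cannot infer a label from: {path}")


def split_by_leaf(paths, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split file paths into train/val/test, keeping each leaf's variants together."""
    by_leaf = defaultdict(list)
    for p in paths:
        match = LEAF_RE.match(Path(p).name)
        if match is None:
            raise ValueError(f"Unexpected file name: {p}")
        by_leaf[match.group(1)].append(p)

    leaves = sorted(by_leaf)               # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(leaves)
    n_train = round(ratios[0] * len(leaves))
    n_val = round(ratios[1] * len(leaves))
    buckets = {
        "train": leaves[:n_train],
        "val": leaves[n_train:n_train + n_val],
        "test": leaves[n_train + n_val:],  # remainder goes to the test set
    }
    return {
        name: [p for leaf in ids for p in by_leaf[leaf]]
        for name, ids in buckets.items()
    }
```

In practice the path list could be collected with `Path("Brinjal_Dataset").rglob("*.jpg")`. This sketch does not stratify by class, so class proportions may differ slightly between subsets. Normalization to the 0–1 range is then a division of the loaded 8-bit pixel values by 255.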
Institutions
- Birla Institute of Technology, Mesra, Ranchi, Jharkhand