BrinjalFruitX: A Field-Collected Image Dataset for Machine Learning and Deep Learning-Based Disease Identification in Brinjal Fruits
Description
This dataset comprises a total of 1,823 high-quality images of brinjal (eggplant) fruits, annotated across five distinct classes representing both healthy and disease-infected conditions. The dataset was curated with a strong emphasis on real-world variability, agricultural relevance, and class diversity to support robust machine learning (ML), deep learning (DL), and computer vision (CV) applications in plant disease detection.

Class Distribution
The five categories and their respective image counts are as follows:
1. Shoot and Fruit Borer: 725 images
2. Healthy: 514 images
3. Wet Rot: 223 images
4. Brinjal Fruit Cracking: 200 images
5. Phomopsis Blight: 161 images

The class distribution reflects natural occurrences in the field, thereby introducing a realistic imbalance that is often encountered in agricultural datasets. This also provides an opportunity to test model robustness under skewed class conditions.

Data Collection Methodology
To ensure data quality and diversity, all images were collected manually from actual agricultural fields during extensive field visits across two major brinjal-producing regions in Bangladesh: Bogura and Dhaka. These regions were selected due to their prominence in vegetable farming and accessibility to disease-affected brinjal crops. Images were captured using a standard smartphone camera under natural lighting conditions, without artificial augmentation or pre-processing during acquisition. A total of 2,273 raw images were initially collected from the field, and after rigorous quality filtering, labeling, and expert validation, 1,823 images were finalized for the dataset.
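The reported counts can be checked and summarized with a few lines of Python. The sketch below is illustrative only and is not part of the dataset: it uses the class counts and the raw image total stated above, and the inverse-frequency weighting formula (total / (number of classes × class count)) is an assumed, commonly used heuristic rather than a method prescribed by the dataset authors.

```python
# Class counts exactly as reported in the dataset description.
CLASS_COUNTS = {
    "Shoot and Fruit Borer": 725,
    "Healthy": 514,
    "Wet Rot": 223,
    "Brinjal Fruit Cracking": 200,
    "Phomopsis Blight": 161,
}
RAW_IMAGES = 2273  # raw images collected before quality filtering


def class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count).

    Assumption: this is one common heuristic for weighting a loss
    function on imbalanced data, not the authors' stated method.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * c) for name, c in counts.items()}


total = sum(CLASS_COUNTS.values())              # finalized image count
weights = class_weights(CLASS_COUNTS)           # rarer class -> larger weight
imbalance_ratio = max(CLASS_COUNTS.values()) / min(CLASS_COUNTS.values())
retained_fraction = total / RAW_IMAGES          # share of raw images kept

for name, w in weights.items():
    print(f"{name}: count={CLASS_COUNTS[name]}, weight={w:.3f}")
print(f"total={total}, imbalance ratio={imbalance_ratio:.2f}, "
      f"retained={retained_fraction:.1%}")
```

The five counts sum to the stated 1,823 images, the largest class is roughly 4.5 times the size of the smallest, and about 80% of the raw images were retained after filtering.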
Characteristics
- Image Format: JPEG
- Image Size: Varied; later resized uniformly for model input (typically 128×128 or 224×224 pixels)
- Capture Device: Smartphone cameras
- Environment: Natural daylight, outdoor agricultural settings
- Data Source: Real field samples from farmers' brinjal crops
- Annotation: Manual expert labeling under the guidance of agricultural specialists

Significance and Applications
This dataset provides a realistic, high-variance visual resource for developing and testing machine learning and deep learning models for plant disease recognition. Its real-world origin ensures relevance in practical agricultural applications, while its diversity and class-specific challenges make it suitable for experimentation in:
1. Supervised classification tasks
2. Transfer learning and fine-tuning of vision models
3. Imbalanced learning techniques (e.g., oversampling, undersampling)
4. Explainable AI (XAI) and model interpretation (e.g., Grad-CAM)
5. Lightweight model deployment for mobile-based disease detection tools
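As an illustration of the oversampling technique mentioned in the list above, the following is a minimal sketch of random oversampling using only the Python standard library. The function name `random_oversample` and the file paths are hypothetical placeholders; the dataset's actual folder layout is not described here, so the paths should be adapted to however the images are stored locally.

```python
import random
from collections import Counter


def random_oversample(samples, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.

    samples: list of (path, label) tuples.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_label = {}
    for item in samples:
        by_label.setdefault(item[1], []).append(item)

    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw the missing samples with replacement (k may be 0).
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


# Toy sample list built from the reported class counts.
# The "label/index.jpg" paths are placeholders, not real file names.
counts = {
    "Shoot and Fruit Borer": 725,
    "Healthy": 514,
    "Wet Rot": 223,
    "Brinjal Fruit Cracking": 200,
    "Phomopsis Blight": 161,
}
samples = [(f"{label}/{i}.jpg", label)
           for label, n in counts.items() for i in range(n)]

balanced = random_oversample(samples)
balanced_counts = Counter(label for _, label in balanced)
print(balanced_counts)
```

Oversampling of this kind is usually applied to the training split only, after the train/validation/test split has been made, so that duplicated images do not leak into the evaluation data.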
Files
Institutions
- Daffodil International University