Strawberry Disease Detection Dataset with Hybrid Data Augmentation

Published: 23 September 2025 | Version 1 | DOI: 10.17632/k2ptxmfjhj.1

Description

This dataset provides annotated images for instance segmentation of strawberry diseases affecting both fruit and leaves. It extends two publicly available datasets:

[1] Afzaal, U., Bhattarai, B., Pandeya, Y.R., Lee, J., 2021. An Instance Segmentation Model for Strawberry Diseases Based on Mask R-CNN. Sensors 21, 6565. https://doi.org/10.3390/s21196565.

[2] Pérez-Borrero, I., Marín-Santos, D., Gegúndez-Arias, M.E., Cortés-Ancos, E., 2020. A fast and accurate deep learning method for strawberry instance segmentation. Comput. Electron. Agric. 178, 105736. https://doi.org/10.1016/j.compag.2020.105736.

Both datasets are distributed under terms that allow reuse for academic and research purposes. This extended dataset (v1.0) applies systematic augmentation to balance the classes and to increase the diversity of environmental conditions in the data.

The dataset contains 5,610 annotated images across 8 classes, with approximately 700 images per class: Angular Leafspot, Anthracnose Fruit Rot, Blossom Blight, Gray Mold, Healthy Strawberry, Leaf Spot, Powdery Mildew Fruit, Powdery Mildew Leaf. All images are provided in JPG format with a resolution of 640 × 640 pixels. Annotations are polygon-based instance masks provided in COCO segmentation format (.json). The dataset is split into 80% training, 10% validation, and 10% testing, with all classes represented in balanced proportions.

Two directory structures are included at the root level:

/Strawberry-segmentation-hybrid-augmentation
    /train
        image_1.jpg
        image_2.jpg
        ...
        _annotations.coco.json
    /valid
        image_1.jpg
        image_2.jpg
        ...
        _annotations.coco.json
    /test
        image_1.jpg
        image_2.jpg
        ...
        _annotations.coco.json

/Strawberry-segmentation-traditional-augmentation
    /train
        image_1.jpg
        image_2.jpg
        ...
        _annotations.coco.json
    /valid
        image_1.jpg
        image_2.jpg
        ...
        _annotations.coco.json
    /test
        image_1.jpg
        image_2.jpg
        ...
        _annotations.coco.json

Traditional image augmentation consists of geometric and photometric transformations: random cropping, center cropping, rotation, horizontal flipping, vertical flipping, random brightness and contrast adjustments, and random hue, saturation, and value adjustments. Hybrid image augmentation additionally incorporates synthetic images generated with a diffusion model (DALL·E 3). Synthetic images are included only in the training split.
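As a rough illustration of how the per-class balance described above could be checked, the following sketch reads one split's `_annotations.coco.json` and counts how many distinct images contain each class. It assumes the files follow the standard COCO layout with `categories`, `images`, and `annotations` keys; the helper names `load_split` and `count_images_per_class` are hypothetical and not part of the dataset.

```python
import json
from collections import Counter
from pathlib import Path


def count_images_per_class(coco: dict) -> dict:
    """Count distinct images that contain at least one instance of each category."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    # Deduplicate (image, category) pairs so several instances of one class
    # in the same image are counted once.
    seen = {(a["image_id"], a["category_id"]) for a in coco["annotations"]}
    return dict(Counter(id_to_name[cat_id] for _, cat_id in seen))


def load_split(root: str, split: str) -> dict:
    """Load the COCO annotation file of one split (train, valid, or test)."""
    path = Path(root) / split / "_annotations.coco.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Root folder name taken from the directory listing above.
    root = "Strawberry-segmentation-hybrid-augmentation"
    for split in ("train", "valid", "test"):
        if (Path(root) / split / "_annotations.coco.json").exists():
            print(split, count_images_per_class(load_split(root, split)))
```

An image that shows more than one class is counted once for each of those classes, so the per-class counts need not add up to the number of images in the split.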

Steps to reproduce

To support reproducibility, the two augmentation phases are described below.

1. Traditional augmentation
Standard transformations were used to rebalance classes, yielding 1,559 additional images.

2. Hybrid augmentation with synthetic images
An additional 1,018 synthetic images were generated using the DALL·E 3 diffusion model, designed to maximize diversity in composition, perspective, lighting, and occlusion. Relevant characteristics of the generation process included:

  • Varied camera perspectives: top-down, side views, oblique angles, and close-ups of leaf and fruit surfaces.
  • Background variation: agricultural contexts such as soil, mulch, greenhouse benches, foliage, and hands holding fruit.
  • Occlusion diversity: berries partially covered by stems, leaves, tools, or other fruits.
  • Lighting variability: overexposed (bright sunlight), underexposed (shadowed or low-light), and natural conditions simulating the constantly varying light of outdoor environments.
  • Fruit and leaf color and ripeness variation to ensure unique compositional features across images.

These synthetic samples were manually reviewed and integrated exclusively into the training set, improving dataset balance while avoiding contamination of the validation and testing subsets. Annotations for all diffusion-generated images were manually created and quality-checked on the Roboflow platform to ensure high-quality segmentation masks.

The final dataset is distributed as follows:

  • Angular Leafspot: 702
  • Anthracnose Fruit Rot: 700
  • Blossom Blight: 702
  • Gray Mold: 702
  • Healthy Strawberry: 700
  • Leaf Spot: 700
  • Powdery Mildew Fruit: 702
  • Powdery Mildew Leaf: 702

Totals:

  • Base images: 2,459
  • Traditional augmentation: 1,559
  • Synthetic augmentation: 1,018
  • Grand total: 5,610 images

The dataset is designed for instance segmentation tasks. Standard evaluation metrics include:

  • COCO mAP@[0.5:0.95] and mAP@0.5 for overall performance.
  • Per-class AP to assess disease-specific detection quality.
  • Precision–Recall curves for detailed analysis.
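In the COCO convention, mAP@[0.5:0.95] averages AP over ten mask IoU thresholds from 0.50 to 0.95 in steps of 0.05, while mAP@0.5 uses only the first. The sketch below shows the underlying IoU computation on two toy binary masks and lists the thresholds at which a prediction would count as a match. The masks and the function names `mask_iou` and `thresholds_matched` are illustrative assumptions; a full evaluation would normally rely on an established COCO evaluation tool rather than this simplified code.

```python
# The ten IoU thresholds behind COCO mAP@[0.5:0.95]: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def mask_iou(pred, truth) -> float:
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return inter / union if union else 0.0


def thresholds_matched(iou: float) -> list:
    """Return the COCO thresholds at which this IoU counts as a true positive."""
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    # Toy 2x2 masks, purely illustrative.
    predicted = [[1, 1], [1, 0]]
    ground_truth = [[1, 1], [0, 0]]
    iou = mask_iou(predicted, ground_truth)
    print(f"IoU = {iou:.3f}")
    print("Matched at thresholds:", thresholds_matched(iou))
```

A prediction that overlaps its ground-truth mask only loosely contributes to mAP@0.5 but not to the stricter thresholds, which is why the two metrics are usually reported together.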

Institutions

  • Universidad Autonoma de Baja California - Campus Ensenada

Categories

Agricultural Science, Horticulture, Artificial Intelligence, Computer Vision, Plant Pathology, Crop Protection, Strawberry, Deep Learning, Instance Segmentation
