Downy Mildew and Powdery Mildew Symptoms
Description
This dataset includes annotated images showing symptoms of powdery mildew and downy mildew, collected over a two-year period and organized into two subsets: "Dataset 2021" and "Dataset 2022".
Dataset 2021 contains 753 images organized into 8 folders: 489 of powdery mildew and 264 of downy mildew. Each image has a varying number of annotations, classified into two categories: "powdery mildew" and "downy mildew". The annotation totals are:
• Powdery mildew annotations: 1,696
• Downy mildew annotations: 657
Dataset 2022 consists of 1,404 images organized into 13 folders. The annotations are also classified into "powdery mildew" and "downy mildew", with totals as follows:
• Powdery mildew annotations: 3,915
• Downy mildew annotations: 20,959
All images were collected in Sardinia, Italy.
Both datasets are provided as directories containing zipped image files and corresponding zipped annotation files. The annotation files are in text format, with each row describing one annotated leaf. The format is:
class_id center_x center_y width height
• class_id: An integer representing the class.
• center_x, center_y, width, height: Floating-point values normalized to the image dimensions, indicating the bounding box center and size.
In the "file_names" directory, the file lists for Dataset 2021 and Dataset 2022 are organized differently. For Dataset 2021, all the file names are listed in "dataset_2021.txt". For Dataset 2022, two sets of text files correspond to two different splitting methods:
• Structured Split: Files are listed in "train_dataset_2022.txt", "val_dataset_2022.txt", and "test_dataset_2022.txt".
• Random Split: Files are listed in "train_rand_2022.txt", "val_rand_2022.txt", and "test_rand_2022.txt".
The "Structured Split" division was based on file creation dates to minimize overlap between sets, with the training set being more than twice the size of the validation and test sets.
In the "Random Split" case, similar images of the same leaves, captured from different angles or distances, are distributed across the training, validation, and test sets. This random split may artificially inflate performance on familiar data, because the model overfits to leaves it has effectively already seen, while resulting in reduced performance on unseen data, such as the images listed in "dataset_2021.txt".
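The annotation rows described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset: the names Box, parse_annotation_line, parse_annotation_text, and count_by_class are hypothetical, and it assumes the usual YOLO convention that center_x and width are normalized by the image width while center_y and height are normalized by the image height. The mapping from class_id values to "powdery mildew" and "downy mildew" is not stated in the description, so the sketch keeps the raw integer.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    """One annotation converted to pixel corner coordinates (hypothetical helper type)."""
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_annotation_line(line: str, img_width: int, img_height: int) -> Box:
    """Convert one 'class_id center_x center_y width height' row to pixel corners."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    cx, cy, w, h = (float(value) for value in fields[1:])
    # Assumption: x values scale with image width, y values with image height.
    return Box(
        class_id=class_id,
        x_min=(cx - w / 2) * img_width,
        y_min=(cy - h / 2) * img_height,
        x_max=(cx + w / 2) * img_width,
        y_max=(cy + h / 2) * img_height,
    )


def parse_annotation_text(text: str, img_width: int, img_height: int) -> List[Box]:
    """Parse the full contents of one annotation file, skipping blank lines."""
    return [
        parse_annotation_line(line, img_width, img_height)
        for line in text.splitlines()
        if line.strip()
    ]


def count_by_class(boxes: List[Box]) -> Dict[int, int]:
    """Count annotations per class_id, e.g. to check the totals listed above."""
    return dict(Counter(box.class_id for box in boxes))
```

Summing count_by_class over all annotation files of a subset gives per-class totals that can be compared with the figures reported for Dataset 2021 and Dataset 2022.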
Steps to reproduce
Using the YOLOv5 framework, we conducted five experiments, all of which can be fully reproduced with the provided datasets:
Experiment 1: train and test a model using the "Structured Split" of Dataset 2022.
Experiment 2: evaluate the "Structured Split" model trained in Experiment 1 on Dataset 2021 (unseen data).
Experiment 3: train and test a model using the "Random Split" of Dataset 2022.
Experiment 4: evaluate the "Random Split" model trained in Experiment 3 on Dataset 2021 (unseen data).
Experiment 5: train and test a model using the "Structured Split" of Dataset 2022, with Dataset 2021 added to the training set to increase the diversity of plants used during training. The results can be compared with those of Experiment 1.
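For Experiment 5, the training list has to combine the "Structured Split" training files with the Dataset 2021 files. One possible way to build it is sketched below. This is an illustration under stated assumptions, not the authors' script: the function names and the output file name "train_exp5.txt" are hypothetical, and the sketch assumes each list file holds one entry per line. Whether the entries need a path prefix before YOLOv5 can load them depends on how the archives are unpacked.

```python
from pathlib import Path
from typing import Iterable, List


def read_file_list(path: Path) -> List[str]:
    """Return the non-empty, stripped lines of a file list."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def merge_file_lists(*lists: Iterable[str]) -> List[str]:
    """Concatenate file lists, dropping duplicates while keeping first-seen order."""
    seen = set()
    merged = []
    for names in lists:
        for name in names:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


def shared_entries(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Return entries present in both lists, e.g. to confirm two sets are disjoint."""
    return sorted(set(first) & set(second))


def build_experiment5_list(file_names_dir: Path, out_path: Path) -> List[str]:
    """Write a combined training list: Structured Split train set plus Dataset 2021."""
    file_names_dir = Path(file_names_dir)
    train = merge_file_lists(
        read_file_list(file_names_dir / "train_dataset_2022.txt"),
        read_file_list(file_names_dir / "dataset_2021.txt"),
    )
    Path(out_path).write_text("\n".join(train) + "\n")
    return train
```

A call such as build_experiment5_list(Path("file_names"), Path("train_exp5.txt")) would produce the combined list, and shared_entries can be used to check that the validation and test lists share no entries with it.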