Indigenous Dataset for Apple Leaf Disease Detection and Classification

Published: 12 January 2024 | Version 3 | DOI: 10.17632/9m2dcb5mmr.3


This dataset, suitable for training Machine Learning and Deep Learning models, was built by collecting images from the apple cultivation fields of Jammu and Kashmir. It contains 7569 images belonging to three categories: Healthy, Alternaria and Apple-Mosaic.
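As a quick sanity check after download, the per-class image counts can be tallied. The following is a minimal sketch, assuming the images are stored in one folder per class; the folder names and accepted file extensions are hypothetical and should be adjusted to the actual archive layout.

```python
from collections import Counter
from pathlib import Path

# Hypothetical folder names, one per class; the real archive layout may differ.
CLASS_NAMES = ("Healthy", "Alternaria", "Apple-Mosaic")
# Assumed image file extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Return a Counter mapping each class folder name to its number of image files."""
    counts = Counter()
    for name in CLASS_NAMES:
        class_dir = Path(root) / name
        if not class_dir.is_dir():
            # A missing class folder is reported as zero images.
            counts[name] = 0
            continue
        counts[name] = sum(
            1
            for path in class_dir.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_class("path/to/dataset")` and summing the values gives a total that can be compared against the 7569 images stated above.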


Steps to reproduce

Images were collected directly in the fields with handheld devices/cameras. The collected images were then cleaned and annotated. Experts manually scrutinized each image to examine the region of interest (ROI). On close examination, some images were found to show mechanical damage resembling the symptoms of Alternaria; all such images were removed from the dataset. The remaining images were classified into three classes: Healthy, Apple-Mosaic and Alternaria. Image preprocessing techniques were also applied to enhance the data quality.

A common problem with deep learning models is over-fitting. Over-fitting occurs when a model fits the training data very closely but fails to generalize and performs badly on unseen data. It often happens when sufficient training data is not available. Instead of learning the features hidden in the data, an over-fitted model tends to memorize the training set, including its inherent noise, and therefore struggles with test data that differs from the training data.

To avoid such problems, different processing methods and techniques were used to enlarge the dataset and add diversity to the samples. Using the ImageDataGenerator class of the Keras library, transformations such as translation, rotation, shearing, image flipping, zooming and resizing were applied to create new samples.
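The augmentation step described above could be sketched as follows. This is a minimal illustration, not the exact configuration used to build the dataset: the parameter values, the target size and the helper name `write_augmented_samples` are assumptions, since the description names only the transformation types. The Keras import is placed inside the function so that the settings can be inspected without TensorFlow installed.

```python
import os

# Illustrative values only; the ranges actually used are not stated in the description.
AUGMENTATION_PARAMS = {
    "rotation_range": 20,        # rotation, in degrees
    "width_shift_range": 0.1,    # horizontal translation, as a fraction of image width
    "height_shift_range": 0.1,   # vertical translation, as a fraction of image height
    "shear_range": 10,           # shearing, shear angle in degrees
    "zoom_range": 0.2,           # zooming, factor drawn from [0.8, 1.2]
    "horizontal_flip": True,     # image flipping (left-right)
    "vertical_flip": True,       # image flipping (up-down)
    "fill_mode": "nearest",      # how pixels exposed by a transform are filled
}
TARGET_SIZE = (224, 224)  # resizing; an assumed size


def write_augmented_samples(source_dir, output_dir, batches=10, batch_size=32):
    """Read class sub-folders under source_dir and save augmented images to output_dir.

    Note: save_to_dir writes all augmented images into a single folder, so to keep
    class labels this function would be called once per class with separate folders.
    """
    # Requires TensorFlow; ImageDataGenerator is a legacy (deprecated) Keras utility.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    os.makedirs(output_dir, exist_ok=True)
    generator = ImageDataGenerator(**AUGMENTATION_PARAMS)
    flow = generator.flow_from_directory(
        source_dir,
        target_size=TARGET_SIZE,   # every image is resized on load
        batch_size=batch_size,
        class_mode="categorical",
        save_to_dir=output_dir,
        save_prefix="aug",
        save_format="jpeg",
    )
    # Each call to next() draws one batch and writes its augmented images to disk.
    for _ in range(batches):
        next(flow)
```

Each transformation is applied randomly within the given range, so repeated passes over the same source images yield different samples.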


Computer Vision, Machine Learning, Applied Computer Science, Deep Learning