A dataset of Fortunella margarita images

Published: 17-05-2021 | Version 1 | DOI: 10.17632/wnv4bszczz.1
Yi-Shun Wu,
Mei Ling Huang


This article presents a Fortunella margarita image dataset. The images are divided into seven categories: (a) mature, (b) immature, (c) growing, (d) mature and immature in one image, (e) mature and growing, (f) immature and growing, and (g) all three stages in one image. The dataset is organized into two types of files: (1) growth stage classification images (Growth stage classification) and (2) manual labeling and annotation files (Labels).

(1) Growth stage classification images: The dataset contains 1031 original images. Figure 1 shows example original images for the seven Fortunella margarita categories. After data augmentation, there are 6611 images in total. Each image is 3024 × 4032 pixels in JPEG format. The dataset is divided into training, validation and test sets for machine learning or deep learning. Table 1 shows the number of training, validation and test images in each of the seven categories before data augmentation.

(2) Manual labeling and annotation files (Labels): This part contains 6611 annotation files in XML format. Each file records the manually labeled growth stage and location of the targets, as shown in Figure 2. Researchers can use these files to train deep learning models such as YOLO and R-CNN.
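The XML annotation files can be read with Python's standard library. The sketch below assumes the files follow the Pascal VOC layout (one <object> element per target, with <name> and <bndbox> children), which is a common output format of labeling tools; the dataset description does not name the schema, so the tag names should be checked against a real file from the Labels folder. The sample annotation, its file name and its class names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass


@dataclass
class Box:
    label: str  # growth-stage class name taken from <name>
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def parse_annotation(xml_text: str) -> list[Box]:
    """Extract every labeled bounding box from one annotation file.

    Assumes Pascal VOC-style tags (<object>, <name>, <bndbox>, <xmin>, ...);
    this is an assumption, not something stated by the dataset authors.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:  # skip objects that carry no box
            continue
        # Some tools write fractional coordinates, so parse as float first.
        coords = [int(float(bndbox.findtext(tag, default="0")))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(Box(obj.findtext("name", default="").strip(), *coords))
    return boxes


# Hypothetical annotation for illustration only.
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <size><width>3024</width><height>4032</height><depth>3</depth></size>
  <object><name>mature</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>400</xmax><ymax>520</ymax></bndbox></object>
  <object><name>growing</name>
    <bndbox><xmin>900</xmin><ymin>1000</ymin><xmax>1150</xmax><ymax>1260</ymax></bndbox></object>
</annotation>"""

boxes = parse_annotation(SAMPLE)
print(Counter(b.label for b in boxes))  # number of targets per growth stage in this file
```

Counting labels per file gives a quick consistency check: the set of growth stages found in an image's annotation should agree with the image's category (a) to (g).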


Steps to reproduce

Data processing consists of four steps: image acquisition, image preprocessing, image augmentation and manual image labeling.

1. Image acquisition: The images were taken with an iPhone 11 Pro in Jiaoxi, Yilan County, Taiwan, in clear and cloudy weather. Photos were taken from multiple angles, with varying backgrounds, at a distance of 100-200 mm from the targets. Each image is 3024 × 4032 pixels in JPEG format. In total, 1031 original images were collected.

2. Image preprocessing: Experts evaluated the Fortunella margarita images and divided them into seven categories according to growth stage. The number of images in each category is: (a) 399 mature, (b) 168 immature, (c) 64 growing, (d) 70 mature and immature in one photo, (e) 205 mature and growing, (f) 31 immature and growing, and (g) 94 with all three stages. The images were divided into training, validation and test sets in a 70:20:10 ratio.

3. Image augmentation: To increase the size and diversity of the training data, data augmentation methods [5] were applied to the original training and validation images. The methods are Gaussian filtering, brightness increase, brightness reduction, mirror flipping, noise addition and 180° rotation. After augmentation there are 6611 images in total. Figure 3 shows an original image and its augmented versions. Besides enlarging the dataset, data augmentation helps reduce overfitting of the trained model. Table 2 gives the number of training, validation and test images before and after data augmentation.

4. Manual image labeling: Each image must be labeled before training YOLO or R-CNN models. The targets were labeled manually with a data labeling tool [6], producing 6611 annotation files in XML format.
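Step 2 gives the 70:20:10 ratio but not how the split was drawn. A minimal sketch, assuming the split is made separately within each of the seven categories (so every category appears in all three sets) and that training and validation sizes are rounded down, with the remainder going to the test set. Both choices are assumptions, so the resulting counts need not match Table 1. The category keys and file names are hypothetical.

```python
from __future__ import annotations

import random

# Per-category image counts from step 2; the dictionary keys are hypothetical names.
CATEGORY_COUNTS = {
    "a_mature": 399,
    "b_immature": 168,
    "c_growing": 64,
    "d_mature_immature": 70,
    "e_mature_growing": 205,
    "f_immature_growing": 31,
    "g_all_three": 94,
}


def split_70_20_10(paths: list[str], seed: int = 0) -> tuple[list[str], list[str], list[str]]:
    """Shuffle one category's image paths and cut them 70:20:10.

    Integer floor division sets the train and validation sizes, so any
    rounding remainder ends up in the test set (an assumed convention).
    """
    items = sorted(paths)                # sort first so the result depends only on the seed
    random.Random(seed).shuffle(items)   # local RNG, leaves the global random state alone
    n = len(items)
    n_train = n * 70 // 100
    n_val = n * 20 // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


for category, count in CATEGORY_COUNTS.items():
    fake_paths = [f"{category}/{i:04d}.jpg" for i in range(count)]  # placeholder file names
    train, val, test = split_70_20_10(fake_paths)
    print(category, len(train), len(val), len(test))
```

Sorting before shuffling with a fixed seed keeps the split reproducible regardless of the order in which files are listed on disk.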
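The six augmentation methods in step 3 can be sketched with NumPy alone. This is an illustrative sketch, not the authors' code: the blur sigma, the brightness offset of ±40 and the noise level are assumed values, and "mirror flipping" is interpreted as a horizontal flip. The helper total_after_augmentation is hypothetical and assumes each training or validation original is kept alongside one variant per method, while test images are not augmented.

```python
from __future__ import annotations

import numpy as np


def _to_uint8(x: np.ndarray) -> np.ndarray:
    """Round and clip a float array back to the 0-255 uint8 range."""
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)


def gaussian_blur(img: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Separable Gaussian filter on an H x W x C uint8 image, edge-padded to keep its size."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()  # normalise so flat regions keep their value
    out = img.astype(np.float64)
    for axis in (0, 1):  # filter rows, then columns
        size = out.shape[axis]
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        out = sum(w * np.take(padded, np.arange(i, i + size), axis=axis)
                  for i, w in enumerate(kernel))
    return _to_uint8(out)


def adjust_brightness(img: np.ndarray, delta: int) -> np.ndarray:
    """Add a constant to every pixel (positive = brighter, negative = darker)."""
    return _to_uint8(img.astype(np.float64) + delta)


def add_gaussian_noise(img: np.ndarray, rng: np.random.Generator, std: float = 10.0) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation std (in pixel units)."""
    return _to_uint8(img.astype(np.float64) + rng.normal(0.0, std, size=img.shape))


def augment(img: np.ndarray, rng: np.random.Generator) -> dict[str, np.ndarray]:
    """Return the six augmented variants of one image (parameter values are assumptions)."""
    return {
        "gaussian_blur": gaussian_blur(img),
        "brighter": adjust_brightness(img, +40),
        "darker": adjust_brightness(img, -40),
        "mirror": img[:, ::-1].copy(),      # horizontal flip
        "noise": add_gaussian_noise(img, rng),
        "rot180": np.rot90(img, 2).copy(),  # 180-degree rotation in the image plane
    }


def total_after_augmentation(n_trainval: int, n_test: int, n_methods: int = 6) -> int:
    """Originals are kept, each train/val image gains n_methods variants, test is untouched."""
    return n_trainval * (1 + n_methods) + n_test


rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)  # small stand-in for a real photo
variants = augment(demo, rng)
print({name: v.shape for name, v in variants.items()})
```

Under those assumptions, the reported totals would imply 101 unaugmented test images, since (1031 - 101) × 7 + 101 = 6611; Table 2 remains the authoritative source for the actual counts.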