Chinese medicinal blossom-dataset

Published: 27 August 2021 | Version 2 | DOI: 10.17632/r3z6vp396m.2


The blossom images of traditional Chinese medicinal herbs were collected through Google search. The images are divided into twelve categories: (1) Syringa, (2) Bombax malabarica, (3) Michelia alba, (4) Armeniaca mume, (5) Albizia julibrissin, (6) Pinus massoniana, (7) Eriobotrya japonica, (8) Styphnolobium japonicum, (9) Prunus persica, (10) Firmiana simplex, (11) Ficus religiosa and (12) Areca catechu. The dataset contains 1716 original images and a total of 12538 images after data augmentation. It provides a collection of blossom images of traditional Chinese medicinal herbs to help Chinese pharmacists classify the herbs by category. In addition, the dataset can serve as a resource for researchers who apply machine learning or deep learning algorithms to image segmentation and image classification.


Steps to reproduce

Step 1 Image acquisition: Internet search for twelve traditional Chinese medicinal blossoms.
Step 2 Image preprocessing: We preprocessed the blossom images by cropping out letters and frames, removing handwriting and blurred images, centering the blossoms, and adjusting the length and width. Image file sizes vary, and all images are in JPG format.
Step 3 Image partition: The original 1716 images were divided into training, validation, and test subsets at an 80:10:10 ratio.
Step 4 Image augmentation: Data augmentation methods, including Gaussian filtering, image brightness increase, image brightness reduction, mirror rotation, noise addition, 90° rotation, and 180° rotation, were applied to the images in the training and validation subsets.
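The partition and augmentation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual script: the file names are hypothetical, the split is a plain shuffled split (the source does not say whether it was done per category or how rounding was handled), and the count check assumes that each of the seven augmentation methods adds exactly one new image per training or validation image while the originals are kept and the test subset is left untouched.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/val/test at roughly 80:10:10.

    Assumption: a simple global shuffle; the dataset description does not
    state whether the split was stratified by category.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


def total_after_augmentation(n_train_val, n_test, n_methods=7):
    """Total image count if originals are kept, every method adds one image
    per train/val image, and test images are not augmented (assumption)."""
    return n_train_val * (1 + n_methods) + n_test


def implied_train_val(n_original, n_total, n_methods=7):
    """Number of train+val originals implied by the published totals,
    under the same assumption as total_after_augmentation."""
    extra = n_total - n_original
    if extra % n_methods != 0:
        raise ValueError("totals are not consistent with the assumption")
    return extra // n_methods


# Hypothetical file names standing in for the 1716 original JPG images.
paths = [f"img_{i:04d}.jpg" for i in range(1716)]
train, val, test = split_80_10_10(paths)

# Consistency check against the published totals (1716 originals, 12538 total).
n_train_val = implied_train_val(1716, 12538)
n_test = 1716 - n_train_val
```

Under these assumptions the published totals are self-consistent: 12538 - 1716 = 10822 = 7 × 1546, which implies 1546 training and validation originals and 170 test originals, close to the stated 80:10:10 ratio. The exact per-subset counts produced by `split_80_10_10` may differ slightly from the authors' split because the rounding rule is not documented.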


Agricultural Science, Computer Science