BD RiceSeed: A Multi-Class Image Dataset of Bangladeshi Rice Seed Varieties for Classification Tasks

Published: 16 June 2025 | Version 1 | DOI: 10.17632/khdg28v5d7.1

Description

The dataset was curated from rice seed samples collected across agricultural zones in Bangladesh. It comprises high-resolution images of eight rice seed varieties: 25, 28, 29, 89, 100, Chinigura, Kata Irri, and Kata Irri Vog. Each variety has 400 original images captured at 1920 × 1080 pixels (3,200 original images in total).

To improve quality and consistency, the backgrounds of the original images were removed, yielding a refined set of seed images better suited to feature extraction. These preprocessed images are stored separately in the "BGRemoved_Image" directory.

Data augmentation was then applied to enlarge the dataset and expose models to more diverse visual perspectives. It produced 1,000 synthetic images per class, keeping all classes the same size, for a total of 8,000 images in the "Augmented_Image" directory.

The dataset is organized into three primary directories:
1) Rice_Image: the original, unaltered images of the eight rice seed varieties.
2) BGRemoved_Image: preprocessed images with the background removed to improve feature clarity.
3) Augmented_Image: the balanced, augmented dataset used for training and evaluation.

This rice seed classification dataset is a resource for researchers in agricultural informatics, machine vision, and deep-learning-based crop seed classification, and supports the development of AI-driven tools for seed identification, classification, and quality assurance.
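A quick way to check a local copy against the counts given in the description is to tally the image files per variety in each directory. The sketch below rests on assumptions not stated in the description: that each of the three directories holds one sub-folder per variety, and that images use common raster extensions. The function names `count_images_per_class` and `is_balanced` are hypothetical.

```python
from pathlib import Path

# Directory names as given in the dataset description.
SPLITS = ("Rice_Image", "BGRemoved_Image", "Augmented_Image")
# Assumption: images are stored in common raster formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root):
    """Return {directory: {variety: image_count}} for each directory found.

    Assumes (hypothetically) one sub-folder per variety inside each
    directory, e.g. root/Rice_Image/Chinigura/<image files>.
    """
    counts = {}
    for split in SPLITS:
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            continue  # directory not present in this local copy
        counts[split] = {
            class_dir.name: sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


def is_balanced(class_counts):
    """True when every variety has the same number of images."""
    return len(set(class_counts.values())) <= 1
```

If the layout matches the assumption, calling `count_images_per_class("path/to/extracted/archive")` should report 400 images for each of the eight varieties under `Rice_Image` and 1,000 each under `Augmented_Image` (8 × 1,000 = 8,000).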


Steps to reproduce

Data Augmentation Procedure

To increase both the quantity and the diversity of the dataset, several data augmentation techniques were applied. Introducing variability into the training data in this way aims to improve the generalization of machine learning models: the transformations simulate real-world variation, making models more robust when they encounter unseen data.

Applied Augmentation Techniques

Geometric transformations:
1) Rotation: images were randomly rotated by angles such as 15° and 30° to simulate different viewing perspectives.
2) Flipping: both horizontal and vertical flips were applied to increase orientation variability.
3) Scaling: images were scaled by factors such as 1.1x and 1.3x while keeping the original dimensions.
4) Shifting: images were translated by ±10 pixels along the x and y axes to mimic slight positional variations.
5) Center cropping: a central portion of the image (e.g., 60% or 80%) was cropped and resized back to the input size to keep inputs consistent.

Filtering:
6) Blurring: median blur and motion blur filters were applied to simulate variations in camera focus and motion effects.

Together, these augmentations provide a richer and more diverse training set, helping the model learn invariant features and adapt to real-world conditions. The augmented images form the enriched dataset used in the training phase.
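The listed transformations can be sketched with NumPy alone. This is a minimal illustration, not the authors' pipeline: the description does not name the library or exact settings used, so the function names, nearest-neighbour resampling, zero filling of uncovered pixels, the 3 × 3 median kernel and the horizontal 9-pixel motion-blur kernel are all assumptions. Only the parameter values in `augment_variants` (15°/30°, 1.1x/1.3x, ±10 px, 60%/80%) come from the text above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _warp(img, m, offset=(0.0, 0.0)):
    """Nearest-neighbour inverse warp about the image centre c: output pixel p
    takes source pixel m @ (p - c) + c - offset; out-of-range pixels become 0."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    sx = np.rint(m[0][0] * dx + m[0][1] * dy + cx - offset[0]).astype(int)
    sy = np.rint(m[1][0] * dx + m[1][1] * dy + cy - offset[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out


def rotate(img, degrees):
    """Rotate about the centre; corners not covered by the source become 0."""
    t = np.deg2rad(degrees)
    c, s = np.cos(t), np.sin(t)
    return _warp(img, [[c, s], [-s, c]])


def flip(img, horizontal=True):
    """Left-right flip if horizontal, otherwise up-down flip."""
    return img[:, ::-1] if horizontal else img[::-1]


def scale(img, factor):
    """Zoom by `factor` about the centre, keeping the original height/width."""
    return _warp(img, [[1 / factor, 0], [0, 1 / factor]])


def shift(img, dx, dy):
    """Translate by (dx, dy) pixels; vacated pixels become 0."""
    return _warp(img, [[1, 0], [0, 1]], offset=(dx, dy))


def center_crop_resize(img, frac):
    """Keep the central `frac` of each side, then resize back (nearest)."""
    h, w = img.shape[:2]
    ch, cw = max(1, round(h * frac)), max(1, round(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows[:, None], cols[None, :]]


def _pad_hw(img, ry, rx):
    """Edge-pad the two spatial axes only (channel axis, if any, untouched)."""
    pad = [(ry, ry), (rx, rx)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad, mode="edge")


def median_blur(img, k=3):
    """Median over each k x k neighbourhood (k odd)."""
    r = k // 2
    win = sliding_window_view(_pad_hw(img, r, r), (k, k), axis=(0, 1))
    return np.median(win, axis=(-2, -1)).astype(img.dtype)


def motion_blur(img, k=9):
    """Horizontal motion blur: mean of k neighbouring pixels along each row."""
    r = k // 2
    win = sliding_window_view(_pad_hw(img, 0, r), k, axis=1)
    return np.rint(win.mean(axis=-1)).astype(img.dtype)


def augment_variants(img):
    """One variant per example parameter quoted in the procedure."""
    return {
        "rot15": rotate(img, 15), "rot30": rotate(img, 30),
        "hflip": flip(img, True), "vflip": flip(img, False),
        "scale1.1": scale(img, 1.1), "scale1.3": scale(img, 1.3),
        "shift+x": shift(img, 10, 0), "shift-x": shift(img, -10, 0),
        "shift+y": shift(img, 0, 10), "shift-y": shift(img, 0, -10),
        "crop0.6": center_crop_resize(img, 0.6),
        "crop0.8": center_crop_resize(img, 0.8),
        "median": median_blur(img), "motion": motion_blur(img),
    }
```

`augment_variants` enumerates one output per example setting; drawing transformations and parameters at random instead is one way to reach a fixed number of images per class, though how the 1,000 images per class were actually sampled is not stated. In practice a dedicated library such as OpenCV would usually replace these hand-written helpers.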

Institutions

Daffodil International University

Categories

Rice, Image Classification, Convolutional Neural Network, Deep Learning, Agriculture

Licence