Published: 26 September 2023 | Version 1 | DOI: 10.17632/4gkcyxjyss.1


Purpose of dataset creation: ‘SorghumWeedDataset_Classification’ was created to address real-time weed challenges and to encourage weed research using computer vision applications.
About the dataset: ‘SorghumWeedDataset_Classification’ is a crop-weed research dataset with 4312 data samples that can be used for image classification. The data acquisition process focused on three research objects: Sorghum samplings (Class 0), Grasses (Class 1), and Broad-leaf weeds (Class 2). The dataset contains 1404 samples of sorghum samplings, 1467 samples of grasses, and 1441 samples of broad-leaf weeds. The TVT (Train: Validate: Test) ratio is set to 7:2:1 to split the data samples into training, validation, and testing sets. Data samples are provided with their class labels.
Equipment used for data acquisition: To record a rich set of information on the research objects, a Canon EOS 80D, a Digital Single Lens Reflex (DSLR) camera with a 22.3 mm × 14.9 mm CMOS sensor, was used.
Data type, format, and size: Each data sample is an RGB image in JPEG format with 6000 × 4000 pixels and an average size of 13 MB. All data samples are re-sized to 224 × 224 pixels without information loss to reduce the computational complexity.
Data acquisition: This dataset emphasizes the early stages of crop growth to meet the challenges faced during the ‘Critical period of weed competition’. Data samples were captured from agricultural fields with both uniform crop spacing and random crop spacing.
Temporal coverage: Data was acquired during April and May 2023. To generalize the dataset, acquisition was done in various light and weather conditions and from varying distances.
Geographical coverage: Data was acquired from Sri Ramaswamy Memorial (SRM) Care Farm, Chengalpattu district, Tamil Nadu, India.
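The 7:2:1 TVT ratio described above can be illustrated with a minimal sketch that splits each class separately, so that the three classes keep roughly the same proportions in every subset. The class counts are taken from this description; the function name `split_indices`, the fixed seed, and the rounding rule (leftover samples go to the test set) are assumptions made for illustration, so the resulting subset sizes may differ from those in the published folders.

```python
import random

# Class sizes as stated in the dataset description.
CLASS_COUNTS = {0: 1404, 1: 1467, 2: 1441}  # sorghum, grasses, broad-leaf weeds
RATIO = (7, 2, 1)  # train : validate : test

def split_indices(n, ratio=RATIO, seed=0):
    """Shuffle range(n) and cut it into train/validate/test parts by ratio.

    Hypothetical helper: train and validate sizes are rounded down,
    and whatever remains is assigned to the test part.
    """
    total = sum(ratio)
    n_train = n * ratio[0] // total
    n_val = n * ratio[1] // total
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Split every class on its own (a stratified split).
splits = {label: split_indices(n) for label, n in CLASS_COUNTS.items()}

for label, (train, val, test) in splits.items():
    print(f"class {label}: train={len(train)} validate={len(val)} test={len(test)}")
```

The indices returned for each class would then be mapped onto that class's image file list; how the files are named and organised is not specified here, so that step is left out.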
To the best of our knowledge, ‘SorghumWeedDataset_Classification’ is the first open-access crop-weed research dataset from Indian fields for classification that deals with weed issues in both uniform and random crop-spacing fields.
Expected outcome: The expected outcome of this dataset is an Artificial Intelligence (AI) model that predicts the correct class of a given data sample.
Detailed description: A detailed description of the dataset and the data acquisition process is given in the data article entitled “‘SorghumWeedDataset_Classification’ And ‘SorghumWeedDataset_Segmentation’ Datasets For Classification, Detection, and Segmentation In Deep Learning” (submitted to the journal ‘Data in Brief’ on 25/09/2023 and awaiting publication).
Citation: If you find this dataset helpful and use it in your work, kindly cite it as: Michael, Justina; M, Thenmozhi (2023), “SorghumWeedDataset_Classification”, Mendeley Data, V1, doi: 10.17632/4gkcyxjyss.1
Further queries: If you have any queries or suggestions concerning this dataset, please e-mail us at [corresponding author]



SRM University


Artificial Intelligence, Computer Vision, Machine Learning, Data Acquisition, Sorghum, Broadleaf Weeds, Grass Weeds, Deep Learning