Kashmiri Medicinal Plant and Leaf Dataset

Published: 21 October 2025 | Version 1 | DOI: 10.17632/ck4rfmrdym.1
Contributor:

Description

A dual-view dataset of 41 native Kashmiri medicinal plant species, with both full-plant and leaf images for each species, intended for AI-based classification and ethnobotanical research. All images were captured manually in real-world environments: botanical gardens, herbal farms, and field stations maintained by the Sher-e-Kashmir University of Agricultural Sciences and Technology (SKUAST-Kashmir), Wadoora Sopore, as well as natural habitats in regions such as Kupwara. Each photograph was taken with a Samsung Galaxy S23 smartphone camera under natural daylight and against varied backgrounds, so the dataset reflects real-world conditions in which lighting, texture, and surroundings differ. To ensure labeling accuracy, each plant specimen was identified and verified by local botanical experts before inclusion. This collection and validation process supports the dataset’s reliability as a resource for AI model training, research, and education in medicinal plant recognition.

Files

Steps to reproduce

The dataset was developed in three stages: field collection, expert validation, and digital preprocessing.

Field collection: Images of 41 native Kashmiri medicinal plant species were captured manually in real-world environments, including botanical gardens, herbal farms, and field stations maintained by the Sher-e-Kashmir University of Agricultural Sciences and Technology (SKUAST-Kashmir), Wadoora Sopore, as well as natural habitats in regions such as Kupwara. All photographs were taken with a Samsung Galaxy S23 smartphone camera under natural daylight and against varied backgrounds, giving realistic and diverse visual data for AI-based plant recognition tasks.

Expert validation: Each specimen was identified and verified by botanical experts at SKUAST-Kashmir before inclusion in the dataset.

Preprocessing: Each species was organized into two complementary categories: full-plant images showing overall structure, and leaf-level images capturing venation, texture, and shape details. All images were resized to 299×299 pixels and normalized to a [0,1] pixel range for deep learning compatibility. Data augmentation (random rotations, zooming, flipping, and brightness adjustments) was applied using TensorFlow’s ImageDataGenerator to improve robustness and reduce overfitting. Preprocessing and augmentation were performed in Python (TensorFlow and Keras) on Google Colab with GPU acceleration.

The final curated dataset consists of approximately 10,600 images evenly distributed across 41 species, making it suitable for reproducible research in image classification, transfer learning, and ethnobotanical AI applications.
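The preprocessing and augmentation described above could be set up roughly as follows. This is a minimal sketch, not the contributors' actual script. The 299×299 size, the [0,1] scaling, the four augmentation types, and the use of ImageDataGenerator come from the description. Every augmentation magnitude, the validation split, the batch size, the seed, and the one-subfolder-per-class directory layout are assumptions made for illustration, and `build_generators` is a hypothetical helper name.

```python
"""Sketch: 299x299 resizing, [0, 1] scaling, and augmentation with ImageDataGenerator."""

IMAGE_SIZE = (299, 299)  # stated in the dataset description
NUM_SPECIES = 41         # stated in the dataset description

# Augmentation types follow the description; all magnitudes are assumed values.
AUGMENTATION_KWARGS = {
    "rescale": 1.0 / 255.0,          # maps 8-bit pixel values into [0, 1]
    "rotation_range": 20,            # assumed: random rotations within +/-20 degrees
    "zoom_range": 0.2,               # assumed: random zoom factor in [0.8, 1.2]
    "horizontal_flip": True,         # assumed flip direction
    "brightness_range": (0.8, 1.2),  # assumed brightness factor range
    "validation_split": 0.2,         # assumed hold-out fraction (not stated)
}


def normalize_pixel(value):
    """Scale a single 8-bit intensity to the [0, 1] range, as `rescale` does."""
    return value / 255.0


def build_generators(data_dir, batch_size=32, seed=42):
    """Return (train, validation) iterators over `data_dir`.

    Assumes `data_dir` contains one subfolder per class; the real archive
    layout may differ.
    """
    # Imported here so the constants above can be inspected without TensorFlow.
    import tensorflow as tf

    generator_cls = tf.keras.preprocessing.image.ImageDataGenerator

    # Training images get the random augmentations plus rescaling.
    train_datagen = generator_cls(**AUGMENTATION_KWARGS)
    # Validation images are only rescaled, never augmented.
    val_datagen = generator_cls(
        rescale=AUGMENTATION_KWARGS["rescale"],
        validation_split=AUGMENTATION_KWARGS["validation_split"],
    )

    common = dict(
        directory=data_dir,
        target_size=IMAGE_SIZE,  # images are resized to 299x299 on load
        batch_size=batch_size,
        class_mode="categorical",
        seed=seed,
    )
    train_iter = train_datagen.flow_from_directory(subset="training", **common)
    val_iter = val_datagen.flow_from_directory(
        subset="validation", shuffle=False, **common
    )
    return train_iter, val_iter
```

Calling `build_generators("path/to/images")` with a placeholder path would yield batches of shape (batch_size, 299, 299, 3) with values in [0, 1]. If the full-plant and leaf views are stored in separate folders, the function could be called once per view. Recent TensorFlow releases mark ImageDataGenerator as deprecated, so reproducing the pipeline on a newer stack may call for Keras preprocessing layers instead.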

Institutions

  • Central University of Kashmir
  • Sher-E-Kashmir University of Agricultural Sciences and Technology Kashmir

Categories

Artificial Intelligence, Data Science, Machine Learning, Image Classification Techniques, Deep Learning

Licence