Gurmukhi dataset

Published: 24 September 2024 | Version 1 | DOI: 10.17632/h65gdk4ptv.1
Contributor:
Atul Sharma

Description

This dataset is an augmented collection of handwritten Gurmukhi characters, intended for training and evaluating machine learning models for optical character recognition (OCR) and related tasks. It covers 41 character classes, and each class has been augmented to approximately 290 samples (roughly 11,900 images in total).

Key features:
- Gurmukhi script focus: the dataset contains only handwritten characters from the Gurmukhi script, targeting applications in Punjabi language processing.
- Diverse augmentations: images were transformed with rotations, shifts, shears, zooms, and horizontal flips to make models more robust to the variation found in handwritten text.
- Consistent dimensions: all images are resized to 256x256 pixels, giving a uniform input size for deep learning models.
- Class-specific organization: images are stored in 41 folders, one per Gurmukhi character, so each folder can serve directly as a class label for training and evaluation.
- Handwritten data collection: the original images used for augmentation were collected from 10 volunteers, which introduces natural variation in writing style and adds to the dataset's diversity.

Potential use cases:
- Gurmukhi OCR: training and evaluating OCR models for Gurmukhi script recognition.
- Handwriting recognition: developing models that recognize and transcribe handwritten Gurmukhi text.
- Script style analysis: studying variation in handwriting styles within the Gurmukhi script.
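Because each class sits in its own folder, the folder names can act as labels. The Python sketch below indexes such a layout into (path, label) pairs and makes a per-class (stratified) train/validation split. It assumes the 41 class folders sit directly under one root directory and that images use common extensions such as .png or .jpg; the page states neither the file format nor a root folder name, and `index_dataset`, `stratified_split`, and `IMAGE_EXTENSIONS` are hypothetical names.

```python
import random
from pathlib import Path

# Assumed image extensions: the dataset page does not state the file format.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def index_dataset(root):
    """Map each class folder under `root` to an integer label and list (path, label) pairs.

    Folder names are sorted so the label assignment is stable across runs.
    Files whose extension is not in IMAGE_EXTENSIONS are skipped.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_label = {name: i for i, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, class_to_label[name]))
    return class_to_label, samples


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs so each class contributes the same share to validation."""
    rng = random.Random(seed)
    by_label = {}
    for path, label in samples:
        by_label.setdefault(label, []).append((path, label))
    train, val = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)  # shuffle within the class only
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

Called on the (hypothetical) download directory, `index_dataset` should report 41 classes if the files match the description. One caution: since the images are augmented variants of a smaller set of originals, a random split can put variants of the same original in both sets, which may make validation accuracy look better than it is. The page does not say whether file names identify the source image, so grouping the split by original may or may not be possible.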
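The augmentations listed above (rotation, shift, shear, zoom, horizontal flip) can all be expressed as a single affine transform of pixel coordinates. The sketch below composes them into one 3x3 matrix about the centre of a 256x256 image. It is an illustration of the technique, not the procedure used to build this dataset: the parameter ranges in `random_params` are assumptions because the page does not state the ranges used, and every function name here is hypothetical.

```python
import math
import random

SIZE = 256                 # the page states images are 256x256
CENTER = (SIZE - 1) / 2    # 127.5, midway between pixel centres 0 and 255


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def affine_matrix(angle_deg=0.0, shift=(0.0, 0.0), shear_deg=0.0, zoom=1.0, flip=False):
    """Forward map from input (x, y) pixel coordinates to output coordinates.

    Flip, zoom, shear and rotation are applied about the image centre,
    then the result is translated by `shift` pixels.
    """
    a = math.radians(angle_deg)
    rotate = [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]
    shear = [[1.0, math.tan(math.radians(shear_deg)), 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    scale = [[zoom, 0.0, 0.0], [0.0, zoom, 0.0], [0.0, 0.0, 1.0]]
    mirror = [[-1.0 if flip else 1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    m = translation(-CENTER, -CENTER)  # move the centre to the origin
    for step in (mirror, scale, shear, rotate, translation(CENTER, CENTER), translation(*shift)):
        m = matmul(step, m)  # left-multiply so the steps apply in the listed order
    return m


def transform_point(m, x, y):
    """Map one (x, y) point through a 3x3 affine matrix."""
    return (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])


def random_params(rng):
    """Sample one augmentation. These ranges are illustrative assumptions."""
    return dict(
        angle_deg=rng.uniform(-10, 10),
        shift=(rng.uniform(-0.1, 0.1) * SIZE, rng.uniform(-0.1, 0.1) * SIZE),
        shear_deg=rng.uniform(-10, 10),
        zoom=rng.uniform(0.9, 1.1),
        flip=rng.random() < 0.5,
    )
```

Composing everything into one matrix means each image is interpolated once rather than once per transformation. To resample an actual image with Pillow's `Image.transform` and the `AFFINE` method, the six coefficients describe the inverse mapping (each output pixel to its input position), so the forward matrix above would need to be inverted first.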

Categories

Image Acquisition