Gurmukhi dataset

Name: Gurmukhi dataset
Creator: Atul Sharma
Published: 2024-09-24T08:58:18.631Z
Keywords: Image Acquisition

doi:10.17632/h65gdk4ptv.1

Published: 24 September 2024 | Version 1 | DOI: 10.17632/h65gdk4ptv.1

Contributor:

Atul Sharma

Description

This dataset is an augmented collection of handwritten Gurmukhi characters, intended for training and evaluating machine learning models for optical character recognition (OCR) and related tasks. It covers 41 distinct character classes, each augmented to approximately 290 samples.

Key Features:

Gurmukhi Script Focus: The dataset contains only handwritten characters from the Gurmukhi script, so it is suited to applications involving Punjabi language processing.

Diverse Augmentations: Images were transformed with rotations, shifts, shears, zooms, and horizontal flips to make models more robust to the variations found in handwritten text.

Consistent Dimensions: All images are resized to a uniform 256x256 resolution, which is compatible with most deep learning architectures.

Class-Specific Organization: Images are organized into 41 folders, one per Gurmukhi character, which simplifies per-class training and evaluation.

Handwritten Data Collection: The original images used for augmentation were collected from 10 volunteers, which introduces natural variability in writing styles.

Potential Use Cases:

Gurmukhi OCR: Train and evaluate OCR models for Gurmukhi script recognition.

Handwriting Recognition: Develop models that recognize and transcribe handwritten Gurmukhi text.

Script Style Analysis: Explore the variations in handwriting styles within the Gurmukhi script.
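Since the images are stored in one folder per character, the folder name can serve as the class label. The following is a minimal sketch of how such a layout might be indexed and split, using only the Python standard library. The function names `index_dataset` and `split_per_class`, the list of image extensions, and the 20% validation fraction are assumptions made for illustration; the description does not state the file format or a recommended split.

```python
import random
from pathlib import Path

# Assumed image extensions; the dataset description does not state the file format.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def index_dataset(root):
    """Map each class folder name to a sorted list of its image paths."""
    index = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        images = sorted(
            p for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        if images:
            index[class_dir.name] = images
    return index


def split_per_class(index, val_fraction=0.2, seed=0):
    """Hold out a fraction of every class (at least one image) for validation."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label, paths in index.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        n_val = max(1, round(len(shuffled) * val_fraction))
        val += [(p, label) for p in shuffled[:n_val]]
        train += [(p, label) for p in shuffled[n_val:]]
    return train, val
```

Calling `index_dataset` on the extracted dataset folder would be expected to yield 41 keys with roughly 290 paths each, if the layout matches the description. Because the samples are augmented copies of a smaller set of originals, a random split like this one may place variants of the same original in both subsets; the description does not say whether originals can be identified from the file names.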
