Dataset containing individual characters of the Devanagari Script including Vowels and Consonants.

Published: 16 April 2024 | Version 1 | DOI: 10.17632/dkx9xmp6jm.1
Contributors:
Kaustubh Bajpai

Description

This dataset contains 8,271 images of handwritten Hindi (Devanagari) characters, each resized to 32x32 pixels and converted to grayscale. Of these, 2,314 are vowel samples and 5,957 are consonant samples. The dataset is divided into a training set and a testing set.
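
The description does not say how the photographs were converted to grayscale or resized to 32x32. As a rough illustration only, the sketch below assumes the ITU-R BT.601 luma weights (the same weights Pillow's `Image.convert("L")` uses) and nearest-neighbour downsampling, implemented in pure Python on nested lists. In practice an imaging library such as Pillow (`Image.convert("L")` followed by `Image.resize((32, 32))`) would normally be used; the function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def to_grayscale(image: Sequence[Sequence[RGB]]) -> List[List[int]]:
    """Convert an RGB image (rows of (r, g, b) tuples) to 8-bit grayscale.
    Assumption: BT.601 luma weights, not confirmed by the dataset authors."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def resize_nearest(gray: Sequence[Sequence[int]], size: int = 32) -> List[List[int]]:
    """Resize a grayscale image to size x size by nearest-neighbour sampling.
    Assumption: the actual resampling method used for the dataset is unknown."""
    src_h, src_w = len(gray), len(gray[0])
    return [
        [gray[y * src_h // size][x * src_w // size] for x in range(size)]
        for y in range(size)
    ]


def preprocess(image: Sequence[Sequence[RGB]], size: int = 32) -> List[List[int]]:
    """Grayscale first, then downsample to the 32x32 format of the dataset."""
    return resize_nearest(to_grayscale(image), size)


# Usage: a 48x64 white "photo" with one black pixel.
photo = [[(255, 255, 255)] * 64 for _ in range(48)]
photo[10][20] = (0, 0, 0)
out = preprocess(photo)
```

Converting to grayscale before resizing means the resampling step only has to handle a single channel.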

Steps to reproduce

The dataset was collected in a series of steps, starting with the selection of subjects from the Meerut campus of MIET (Meerut Institute of Engineering and Technology). Eight participants, forming a varied sample pool, were selected at random. Each participant was asked to write all 49 Hindi characters on a single white A4 sheet of paper. Using several writers introduces stylistic variation into the handwriting samples.

After the handwriting samples were gathered, close-up photographs of each individual character were taken. To increase the robustness and variability of the dataset, many photographs were taken per character: 200 for each of the 49 Devanagari characters, for a total of 9,800 photographs (49 x 200). These photographs were then sorted into 49 classes, one per character, which simplified later data processing and model training.

The aim of this approach was a dataset that captures differences in handwriting style and individual quirks as they occur in real-world writing, and thereby improves the effectiveness and generalizability of deep learning models trained on it.
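
Because the photographs are organised into one class per character, a common way to consume such data is a folder-per-class layout. The sketch below assumes (this is not stated in the dataset description) that each class is a sub-directory of some root folder containing PNG or JPEG files; the function names, file extensions and class directory names are hypothetical. It assigns integer labels in sorted directory order and counts images per class, which can be used to check that all 49 classes are present.

```python
import os
from collections import Counter
from typing import Dict, List, Tuple

# Assumed image file types; adjust to the actual files in the dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def index_classes(root: str) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Map each class sub-directory of `root` to an integer label and list
    (image_path, label) pairs. Assumes one sub-directory per character."""
    class_names = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    label_of = {name: idx for idx, name in enumerate(class_names)}
    samples: List[Tuple[str, int]] = []
    for name in class_names:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            # Skip non-image files such as notes or metadata.
            if os.path.splitext(fname)[1].lower() in IMAGE_EXTENSIONS:
                samples.append((os.path.join(class_dir, fname), label_of[name]))
    return label_of, samples


def class_counts(samples: List[Tuple[str, int]]) -> Counter:
    """Count images per label, e.g. to check that every class is populated."""
    return Counter(label for _, label in samples)
```

For the full dataset under this assumed layout, `len(label_of)` would be expected to equal 49.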

Institutions

Meerut Institute of Engineering and Technology

Categories

Optical Character Recognition, Machine Learning, Hindi Language, Deep Learning

Licence