Handwritten Hindko Language Dataset (HHLD)

Published: 11 September 2024 | Version 2 | DOI: 10.17632/gz8r3spkns.2
Contributors:

Description

The dataset consists of handwritten Hindko number words for the numbers 1 to 100. The words were written on pages, and each candidate was asked to write all 100 words twice, giving 200 samples per candidate. Every candidate signed an undertaking stating that he or she has no objection to the use of this handwriting for academic and research purposes. The pages were then scanned at 1200 dpi. Each word was cropped from the scanned images with a cropping tool and saved into a folder. A separate folder was created for each class, labelled from 1 to 100, and every sample was saved into the folder for its class, so 100 folders hold the 100 different words. Because the cropped images differed in size, every image was resized to 50x50 pixels for better results. The dataset consists of 224782 samples. The image dataset occupies 394 MB, and the CSV version occupies 1098 MB.
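The folder layout described above (one folder per class, labelled 1 to 100) can be indexed with a short script. The following is a minimal sketch, not part of the dataset itself: the function name index_samples, the list of image extensions, and the root folder path are all assumptions, since the description does not state the image file format or the name of the top-level folder.

```python
from pathlib import Path

# Assumed image extensions; the description does not state the file format.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def index_samples(root):
    """Return a sorted list of (image_path, label) pairs.

    Assumes `root` contains one sub-folder per class, named "1" to "100",
    as in the dataset description. `index_samples` is a hypothetical helper.
    """
    samples = []
    for class_dir in Path(root).iterdir():
        # Skip anything that is not a folder with a purely numeric name.
        if not (class_dir.is_dir() and class_dir.name.isdecimal()):
            continue
        label = int(class_dir.name)
        # Keep only the 100 classes named in the description.
        if not 1 <= label <= 100:
            continue
        for image_path in class_dir.iterdir():
            if image_path.is_file() and image_path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((image_path, label))
    # Sort by label, then by file name, so the order is reproducible.
    samples.sort(key=lambda item: (item[1], item[0].name))
    return samples
```

Calling index_samples on the extracted image folder (for example a local directory named HHLD, a hypothetical path) would return one entry per image, with the integer label taken from the folder name.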

Files

Institutions

Pak-Austria Fachhochschule Institute of Applied Sciences and Technology

Categories

Computer Vision, Optical Character Recognition, Natural Language Processing, Machine Learning

Licence