Urdu Handwritten Text Dataset

Published: 9 August 2021 | Version 1 | DOI: 10.17632/bg2sctsysf.1
Mujtaba Husnain


The dataset contains images of handwritten text in Urdu, one of the most widely spoken languages in South Asia. Native Urdu-speaking authors from different social domains were invited to copy a pre-written text in their own handwriting. The pre-written text was composed so that it includes almost all the characters, ligatures, diacritics, and dots used in Urdu script. Persons with disabilities were also invited to write the text, to make the data collection more comprehensive. Demographic data about the authors was also recorded to support research activities such as author identification and text matching.


Steps to reproduce

This dataset can be augmented by inviting as many native Urdu-speaking authors as possible to write the same pre-written text in their own handwriting.


Applied Sciences