Saraiki Language Character Dataset

Published: 21 November 2023 | Version 1 | DOI: 10.17632/tc9zv2wf2k.1
Muhammad Ahmad Khan


Over 26 million people speak Saraiki worldwide, concentrated in South Punjab and a few districts of Sindh. Saraiki, as written by calligraphers, is an extremely complex script. Most major languages have highly developed optical character recognition (OCR) systems; Saraiki does not, and building a sophisticated one still requires research effort. This dataset contains over 50,000 scanned images of Saraiki characters, collected from professors and students at Pak-Austria Fachhochschule: Institute of Applied Science and Technology (PAF-IAST). The dataset is open for use in academic research.



Natural Language Processing