Kurdish Standard Characters EMNIST-like Dataset

Published: 13 July 2023 | Version 1 | DOI: 10.17632/d2j939k88t.1
Hamsa D Majeed, Goran Saman Nariman, Renas Sardar, Bawar Bilal


The Kurdish Character dataset is a comprehensive collection of handwritten Kurdish alphabet characters. It is intended as a resource for learning and studying the Kurdish language, as well as for developing language processing applications and educational tools. The dataset includes all letters of the Kurdish alphabet, encompassing both the Arabic-based Sorani script and the Latin-based Kurmanji script. The collection consists of 58 characters and was created with the assistance of about 3,500 native speakers. It is a large collection of individual handwritten Central Kurdish character representations: with 7,000 images for each character, it contains 406,000 images in total. The dataset was preprocessed and prepared in accordance with EMNIST standards. It aims to capture the natural handwriting styles and variations that exist in the Kurdish language, giving a more authentic representation of the Kurdish alphabet that reflects the different ways individuals write each letter. The dataset can be used for purposes such as developing handwriting recognition systems, analyzing handwriting patterns, improving the accuracy of optical character recognition (OCR) algorithms, and supporting the study of the Kurdish script.
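The description does not spell out the preprocessing pipeline, so the following is only a rough sketch of what an EMNIST-style conversion might look like, not the authors' actual procedure. It assumes a grayscale input with ink as high values on a zero background and a 28x28 target size (the size EMNIST uses). The function names are hypothetical, and nearest-neighbour resampling stands in for the Gaussian smoothing and bicubic interpolation described for EMNIST.

```python
from typing import List

# Row-major grayscale image; assumed convention: 0 = background, 255 = ink.
Image = List[List[int]]


def crop_to_ink(img: Image, threshold: int = 0) -> Image:
    """Crop to the tightest bounding box around pixels above `threshold`."""
    rows = [r for r, row in enumerate(img) if any(p > threshold for p in row)]
    cols = [c for c in range(len(img[0])) if any(row[c] > threshold for row in img)]
    if not rows:  # blank image: nothing to crop, return a copy
        return [row[:] for row in img]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def pad_to_square(img: Image, border: int = 2) -> Image:
    """Centre the glyph in a square frame (aspect ratio kept) with a border."""
    h, w = len(img), len(img[0])
    side = max(h, w) + 2 * border
    top, left = (side - h) // 2, (side - w) // 2
    out = [[0] * side for _ in range(side)]
    for r in range(h):
        out[top + r][left:left + w] = img[r]
    return out


def resize_nearest(img: Image, size: int = 28) -> Image:
    """Nearest-neighbour resample to size x size (a simplification)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def to_emnist_like(img: Image, size: int = 28) -> Image:
    """Hypothetical end-to-end helper: crop, centre in a square, downsample."""
    return resize_nearest(pad_to_square(crop_to_ink(img)), size)
```

Applied to a scanned character, `to_emnist_like(scan)` would return a 28x28 nested list that can be flattened into a 784-value vector, the layout EMNIST-style classifiers typically expect. For real use, an image library with proper smoothing and bicubic interpolation would be a better choice than this stdlib-only sketch.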



University of Human Development


Optical Character Recognition