Kurdish Kurmanji Dialect Dataset for Kurmanji Lemmatization and Spell-Checker with Spell-Correction

Published: 6 December 2022 | Version 1 | DOI: 10.17632/s9wyvvbj9j.1
Hanar Hoshyar Mustafa, Rebwar Nabi


A new dataset for lemmatization and spelling correction in the Kurdish language has been created. It was compiled by reading books and articles written in the Kurmanji dialect of Kurdish; the words found were recorded and manually entered into the dataset. The dataset covers Kurmanji word classes such as verbs, nouns, conjunctions, stop words, pronouns, imperatives, superlatives, and question words. It contains approximately 1,200 words: 587 nouns, 141 verbs, and 463 words of other classes. It can be regarded as the first dataset for the Kurmanji dialect of the Kurdish language.
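As a rough illustration of how a word list of this kind might support a spell-checker, the sketch below tallies the category sizes stated above and looks up close matches for a misspelled word. The file layout of the dataset is not described here, so the sample vocabulary, the function names, and the use of Python's `difflib` are all assumptions made for the example, not the authors' method; the sample words are common Kurmanji words chosen for illustration and are not taken from the dataset.

```python
import difflib

# Category sizes as stated in the dataset description.
CATEGORY_COUNTS = {"nouns": 587, "verbs": 141, "other": 463}


def total_words(counts):
    """Sum the per-category word counts."""
    return sum(counts.values())


# Hypothetical mini vocabulary (illustrative only, not read from the dataset).
SAMPLE_WORDS = ["av", "nan", "mal", "pirtûk", "heval"]


def suggest(word, vocabulary, n=3, cutoff=0.6):
    """Return the word itself if it is known, else up to n similar known words."""
    if word in vocabulary:
        return [word]
    # difflib ranks candidates by a similarity ratio; cutoff drops weak matches.
    return difflib.get_close_matches(word, vocabulary, n=n, cutoff=cutoff)


if __name__ == "__main__":
    print(total_words(CATEGORY_COUNTS))
    print(suggest("pirtuk", SAMPLE_WORDS))
```

The three category counts sum to 1,191, which is consistent with the stated figure of approximately 1,200 words. A real spell-checker would load the full word list and would likely need a similarity measure tuned to Kurmanji orthography.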



Grammar, Kurd, Language