Dataset of Khorezm dialect words of Uzbek language (Words extracted from books)

Published: 23 May 2025 | Version 1 | DOI: 10.17632/c9d2k9dw5x.1
Contributor:
Davlatyor Mengliev

Description

As part of the study, a dataset was compiled and then used by a rule-based algorithm to standardize dialect forms into their formal equivalents. The dataset contains 1340 dialect words. 1) The words were compiled jointly by expert linguists who are well versed not only in formal Uzbek but also in its dialect forms. 2) The source of the words was a book written by F. Abdullaev in 1965 and published by the A.S. Pushkin Institute of Language and Literature of the Academy of Sciences of the Uzbek SSR. The dataset was built manually; no automated processing was applied, except for the transliteration of Cyrillic into Latin script.
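The transliteration step mentioned above could look roughly like the following minimal Python sketch. It assumes the standard Uzbek Cyrillic-to-Latin correspondence in a simplified form; the actual mapping used for this dataset is not documented here and may differ, especially for any phonetic symbols used in the source book. The names `CYR_TO_LAT` and `transliterate` are hypothetical and are not part of the dataset.

```python
# Simplified Uzbek Cyrillic -> Latin mapping (an assumption, not the
# dataset authors' documented procedure).
OKINA = "\u02bb"   # modifier letter used in the Latin letters o' and g'
TUTUQ = "\u02bc"   # apostrophe-like sign corresponding to the hard sign

CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "x", "ц": "ts", "ч": "ch", "ш": "sh", "ъ": TUTUQ,
    "ь": "", "э": "e", "ю": "yu", "я": "ya",
    "ў": "o" + OKINA, "қ": "q", "ғ": "g" + OKINA, "ҳ": "h",
}


def transliterate(word: str) -> str:
    """Convert a Cyrillic-script word to Latin script, character by character."""
    out = []
    for i, ch in enumerate(word):
        lower = ch.lower()
        if lower == "е" and i == 0:
            # Word-initial "е" is written "ye"; other positions are
            # simplified to "e" in this sketch.
            lat = "ye"
        else:
            # Characters outside the mapping are passed through unchanged.
            lat = CYR_TO_LAT.get(lower, ch)
        if ch.isupper():
            lat = lat.capitalize()
        out.append(lat)
    return "".join(out)


print(transliterate("китоб"))  # kitob
```

A per-character lookup table keeps the conversion transparent and easy to audit against the source book, which suits a manually compiled word list of this size.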

Categories

Natural Language Processing, Database Languages, Uzbekistan
