Dataset of Uzbek verbs with formation and suffixes
Description
This dataset records, for each Uzbek verb, the part of speech (POS) of the word it is derived from and the affixes used to form it. The affixes are classified into different categories. The dataset supports identifying verbs in Uzbek-language texts, developing rule-based models for their analysis, and building artificial intelligence models for the morphological and syntactic analysis of Uzbek. Verbs play a crucial role in learning any language; therefore, this dataset can also be used by students in schools and higher education institutions during the learning process. It also serves as a resource for researchers and practitioners interested in Uzbek language processing tasks.
The dataset contains a total of 8,502 tagged words in 9 columns. Below, we provide a description of these columns:
1. The "VERB NAME" column contains the infinitive form of the verb.
2. The "PREVIOUS PART OF SPEECH" column indicates the word class from which the verb is derived. The following annotations are used: sifat (adjective), ot (noun), fel (verb), taqlid (onomatopoeia), son (number), ravish (adverb), modal (modal).
3. The "DERIVATIONAL AFFIX" column contains the word-forming suffix of the verb.
4. The "COLLABORATIVE VOICE AFFIXES" column contains the affixes indicating the collaborative (reciprocal) voice of the verb.
5. The "INTENSIVE VOICE" column contains the affixes indicating the intensive (causative) voice of the verb.
6. The "PASSIVE VOICE" column contains the affixes indicating the passive voice of the verb.
7. The "REFLEXIVE VOICE" column contains the affixes indicating the reflexive voice of the verb.
8-9. The eighth and ninth columns are again INTENSIVE VOICE and COLLABORATIVE VOICE AFFIXES, respectively, because these affixes also appear at the end of the verb.
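As an illustration of how the "PREVIOUS PART OF SPEECH" annotations might be used, the following is a minimal Python sketch that counts verbs by the word class they are derived from. It assumes the data is available as CSV with the column names listed above; the actual file name and format are not stated here. The three sample rows are illustrative only and are not copied from the dataset, and the helper name count_by_source_pos is hypothetical.

```python
import csv
import io
from collections import Counter

# Uzbek POS annotations and their English glosses, as listed in the description.
POS_TAGS = {
    "sifat": "adjective",
    "ot": "noun",
    "fel": "verb",
    "taqlid": "onomatopoeia",
    "son": "number",
    "ravish": "adverb",
    "modal": "modal",
}


def count_by_source_pos(rows):
    """Count verbs by the English name of the word class they derive from."""
    counts = Counter()
    for row in rows:
        # Normalise case and whitespace; unrecognised tags are grouped as "unknown".
        tag = row["PREVIOUS PART OF SPEECH"].strip().lower()
        counts[POS_TAGS.get(tag, "unknown")] += 1
    return counts


# Illustrative rows only (assumed CSV layout, not taken from the dataset).
SAMPLE_CSV = """VERB NAME,PREVIOUS PART OF SPEECH,DERIVATIONAL AFFIX
ishlamoq,ot,la
tozalamoq,sifat,la
oqarmoq,sifat,ar
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = count_by_source_pos(rows)
for pos, n in sorted(counts.items()):
    print(pos, n)
```

To run this against the real data, replace the io.StringIO(SAMPLE_CSV) source with an open file handle for the downloaded file, adjusting the column names if they differ.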
Steps to reproduce
A web corpus was built by crawling Uzbek-language web pages. Words were extracted from the corpus and then manually selected, tagged, and checked.