Madurese Sentence Dataset

Published: 8 May 2023 | Version 3 | DOI: 10.17632/z7kxfzzc8g.3
Danang Arbian


This corpus was built manually from the books Madurese Language, Madurese Language Syntax, and Madurese Language Morphology and Syntax, issued by the Indonesian Ministry of Education and Culture. The result is a parallel corpus containing more than 640 sentences in Madurese and Indonesian. The data is plain text in .txt format, with a tab character separating the Indonesian text from the Madurese text on each line.
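As a rough illustration of how the tab-separated format described above could be read, the following Python sketch splits each line into a pair of strings. The file name `madurese_sentences.txt`, the function name `parse_pairs`, and the column order (Indonesian first, Madurese second) are assumptions made for the example and are not stated by the dataset; check the downloaded files before relying on them.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split tab-separated lines into (first_column, second_column) pairs.

    Assumption: each valid line holds exactly two fields separated by one tab.
    Blank lines and lines with any other number of fields are skipped.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip lines that do not have exactly two fields
        left, right = (part.strip() for part in parts)
        if left and right:
            pairs.append((left, right))
    return pairs


# Hypothetical usage; the file name is a placeholder, not the real one.
# with open("madurese_sentences.txt", encoding="utf-8") as handle:
#     pairs = parse_pairs(handle)
#     print(len(pairs), "sentence pairs loaded")
```

Skipping malformed lines keeps the sketch short; for a manually built corpus it may be more useful to collect and inspect those lines instead of dropping them silently.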


Steps to reproduce

The sentence corpus contains more than 650 sentence pairs in Madurese and Indonesian, with the two languages separated by a tab. The word-level corpus contains more than 600 words in Madurese and Indonesian. It mixes Madurese base words with words that carry affixes, such as suffixes, so that the meaning of each Madurese word is clear.


Universitas Negeri Malang


Computer Science, Natural Language Processing, Corpus Linguistics