Madurese Sentence Dataset

Published: 8 May 2023 | Version 3 | DOI: 10.17632/z7kxfzzc8g.3
Contributors:
Danang Arbian

Description

This corpus was built manually from the books Madurese Language, Madurese Language Syntax, and Madurese Language Morphology and Syntax, issued by the Indonesian Ministry of Education and Culture. It contains more than 640 sentences in Madurese and Indonesian. The data is plain text in .txt format, with a tab character separating the Indonesian text from the Madurese text.
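The tab-separated layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself. It assumes one sentence pair per line, UTF-8 encoding, and Indonesian in the first column with Madurese in the second; the description does not confirm the column order, so check it against the actual file. The file name in the usage comment is hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated sentence pairs.

    Assumption: each line holds exactly two tab-separated fields,
    (Indonesian, Madurese). Blank lines and lines that do not have
    exactly two non-empty fields are skipped.
    """
    pairs: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip lines without exactly one tab separator
        first, second = parts[0].strip(), parts[1].strip()
        if first and second:
            pairs.append((first, second))
    return pairs


# Usage (hypothetical file name; replace with the downloaded .txt file):
# with open("madurese_sentences.txt", encoding="utf-8") as handle:
#     sentence_pairs = parse_pairs(handle)
# print(len(sentence_pairs))
```

Skipping malformed lines instead of raising an error is a design choice made here for convenience; for a stricter check, count the skipped lines and compare the number of parsed pairs with the sentence count stated in the description.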

Files

Steps to reproduce

The sentence corpus contains more than 650 sentences in Madurese and Indonesian, separated by tabs. The word-level corpus contains more than 600 words in Madurese and Indonesian, a mix of Madurese base words and words that have taken affixes or suffixes, which helps clarify the meaning of each Madurese word.

Institutions

Universitas Negeri Malang

Categories

Computer Science, Natural Language Processing, Corpus Linguistics

Licence