Bahasa Madura Corpus Dataset
Published: 30 October 2023 | Version 3 | DOI: 10.17632/cgtg4bhrtf.3
Contributors:
Description
This bilingual corpus is built from the Indonesian-to-Madurese translation of the Bible and contains more than 30,000 Indonesian-Madurese sentences. The data is provided as text in .xls and .txt formats, with tab-separated fields. The dataset is also supplemented with the English translation of the Bible.
Files
Steps to reproduce
This corpus consists of more than 20,000 Indonesian-Madurese sentences, divided by chapter in the new translation of the Bible, from Genesis to Revelation. The dataset is provided in .xls and .txt formats, with tab-separated fields.
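As a rough illustration of how the tab-separated .txt files might be read, the sketch below parses each line into a pair of sentences. It assumes, without confirmation from the dataset itself, that each line holds the Indonesian sentence in the first column and the Madurese sentence in the second, and that the files are UTF-8 encoded. The function names `parse_pairs` and `load_pairs` are hypothetical helpers, not part of the dataset.

```python
import csv
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated lines into (first column, second column) pairs.

    Assumption: column 1 is Indonesian and column 2 is Madurese.
    Blank lines and lines with fewer than two columns are skipped;
    any columns beyond the second are ignored.
    """
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue
        source, target = row[0].strip(), row[1].strip()
        if source and target:
            pairs.append((source, target))
    return pairs


def load_pairs(path: str, encoding: str = "utf-8") -> List[Tuple[str, str]]:
    """Read one tab-separated .txt file (encoding is an assumption)."""
    with open(path, encoding=encoding, newline="") as handle:
        return parse_pairs(handle)
```

A call such as `load_pairs("genesis.txt")` would then return a list of sentence pairs for one file; the file name here is only a placeholder, since the actual file names are not listed on this page. If the column order or encoding differs, only the two assumptions noted above need to change.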
Institutions
Universitas Negeri Malang
Categories
Natural Language Processing, Corpus Linguistics, Indonesian Language