Bahasa Madura Corpus Dataset

Published: 30 October 2023 | Version 3 | DOI: 10.17632/cgtg4bhrtf.3
Contributors:

Description

This bilingual corpus is built from the Indonesian-to-Madurese translation of the Bible and is organized accordingly. It contains more than 30,000 Indonesian-Madurese sentences. The data is provided as text in .xls and .txt formats, with tab-separated fields. The dataset is also supplemented with the English translation of the Bible.

Files

Steps to reproduce

This corpus consists of more than 20,000 Indonesian-Madurese sentences, divided by chapters in the new translation of the Bible, from Genesis to Revelation. The dataset is provided in .xls and .txt formats, with fields separated by tabs.
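
Because the .txt files are tab-separated, they can be read with any standard TSV parser. The following is a minimal Python sketch, not part of the dataset itself. It assumes one sentence group per line and UTF-8 encoding; the column order, the helper name `read_parallel_tsv`, and the file name mentioned in the comment are illustrative assumptions rather than documented properties of the files.

```python
import csv
import io


def read_parallel_tsv(handle):
    """Yield one list of stripped fields per non-empty tab-separated line."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        fields = [field.strip() for field in row]
        if any(fields):  # skip blank lines
            yield fields


# Placeholder rows standing in for a real file; the actual column order
# (for example Indonesian, then Madurese) is an assumption to verify.
sample = io.StringIO(
    "indonesian sentence 1\tmadurese sentence 1\n"
    "\n"
    "indonesian sentence 2\tmadurese sentence 2\n"
)
pairs = list(read_parallel_tsv(sample))
print(len(pairs))

# For a downloaded file (hypothetical name), open it the same way:
# with open("corpus.txt", encoding="utf-8", newline="") as f:
#     pairs = list(read_parallel_tsv(f))
```

Checking the number of fields per row after loading is a quick way to confirm which columns a given file actually contains.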

Institutions

Universitas Negeri Malang

Categories

Natural Language Processing, Corpus Linguistics, Indonesian Language

Licence