Bahasa Madura Corpus Dataset

Name: Bahasa Madura Corpus Dataset
Creator: danang arbian
Published: 2023-10-30T07:25:15.755Z
Keywords: Natural Language Processing, Corpus Linguistics, Indonesian Language

arbian, danang; Wibawa, Aji Prasetya; Almu'iini Ahda, Fadhli

doi:10.17632/cgtg4bhrtf.3

Published: 30 October 2023 | Version 3 | DOI: 10.17632/cgtg4bhrtf.3

Contributors:

danang arbian, Aji Prasetya Wibawa, Fadhli Almu'iini Ahda

Description

This bilingual corpus is built from the Indonesian-to-Madurese translation of the Bible. It contains more than 30,000 Indonesian-Madurese sentences. The data is provided as text in .xls and .txt formats, with fields separated by tabs. We have also supplemented this dataset with the English translation of the Bible.
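As a rough illustration of how the tab-separated .txt files might be read, the following Python sketch parses a text stream into sentence pairs. The column order (Indonesian first, Madurese second), the function name read_parallel_pairs, and the placeholder sample lines are assumptions made for the example; they are not taken from the dataset itself.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_parallel_pairs(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from a tab-separated text stream.

    Assumption: column 0 holds the Indonesian sentence and column 1 the
    Madurese sentence. Check the actual files before relying on this order.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        source, target = row[0].strip(), row[1].strip()
        if source and target:
            yield source, target


# Placeholder lines standing in for real corpus content.
sample = io.StringIO(
    "ID sentence 1\tMAD sentence 1\n"
    "\n"
    "ID sentence 2\tMAD sentence 2\n"
)
pairs = list(read_parallel_pairs(sample))
```

For a downloaded file, the same function can be given a handle opened with open(path, encoding="utf-8", newline=""), where path is wherever the .txt file was saved and UTF-8 is an assumed encoding.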

Files

Steps to reproduce

This corpus consists of more than 20,000 Indonesian-Madurese sentences, which we divide by chapter following the new translation of the Bible, from Genesis to Revelation. The dataset is provided in .xls and .txt formats, with Tab as the separator.

Institutions

Universitas Negeri Malang
