Bengali to English Word Alignment Dataset

Name: Bengali to English Word Alignment Dataset
Creator: Md. Musfiqur Rahaman
Published: 2022-05-16T15:56:55.896Z
Keywords: Computer Science, Artificial Intelligence, English Language, Natural Language Processing, Machine Translation, Machine Learning, Bengali Language, Aligner, Neural Network

Rahaman, Md. Musfiqur; Haque, Md. Mominul; Islam, Fahim

doi:10.17632/wzgcyc643k.1

Published: 16 May 2022 | Version 1 | DOI: 10.17632/wzgcyc643k.1

Contributors:

Md. Musfiqur Rahaman, Md. Mominul Haque, Fahim Islam

Description

The dataset is in XML format and contains 2,000 manually annotated, word-aligned Bengali-English parallel sentences. The sentences were collected from news articles and encyclopedias. The translations of some sentences were improved with Google Translate, and all punctuation marks were removed from the parallel sentences to avoid tokenization issues. A sample from the dataset is given below:

Bengali: বাংলাদেশের জলবায়ু তাপমাত্রায় মৃদু
English: Climate of Bangladesh is mild in temperature
Alignments: 0-1 0-2 1-0 2-5 2-6 3-3 3-4

In this sample, each pair i-j links the Bengali token at position i to the English token at position j, with positions counted from zero. For example, 1-0 aligns জলবায়ু with Climate.
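The index pairs in the sample can be resolved to word pairs with a few lines of Python. This is a minimal sketch, not part of the dataset or the authors' tool: it assumes whitespace tokenization and zero-based "Bengali index-English index" pairs, as the sample suggests, and the function names parse_alignments and aligned_words are hypothetical. It works on the sample strings directly and does not read the XML files, whose schema is not described here.

```python
def parse_alignments(text):
    """Turn a string such as '0-1 2-5' into a list of (source, target) index tuples."""
    return [tuple(int(n) for n in pair.split("-")) for pair in text.split()]


def aligned_words(source, target, alignments):
    """Map each index pair to the (source word, target word) it refers to.

    Assumes both sentences are tokenized on whitespace (punctuation is
    already removed in the dataset) and that indices are zero-based.
    """
    src_tokens = source.split()
    tgt_tokens = target.split()
    return [(src_tokens[i], tgt_tokens[j]) for i, j in parse_alignments(alignments)]


# Sample taken from the dataset description.
bengali = "বাংলাদেশের জলবায়ু তাপমাত্রায় মৃদু"
english = "Climate of Bangladesh is mild in temperature"
pairs = aligned_words(bengali, english, "0-1 0-2 1-0 2-5 2-6 3-3 3-4")

for bn_word, en_word in pairs:
    print(bn_word, "->", en_word)
```

A Bengali word that appears in several pairs (such as index 0, paired with both 1 and 2) corresponds to a multi-word English phrase, here "of Bangladesh".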

Files

Steps to reproduce

We developed a word aligner tool and used it to create this dataset. A link to the tool is provided in the "Related Links" section.

Institutions

North South University

Categories

Related Links

Licence