Bengali to English Word Alignment Dataset
Published: 16 May 2022 | Version 1 | DOI: 10.17632/wzgcyc643k.1
Contributors:
Md. Musfiqur Rahaman
Description
The dataset is in XML format and contains 2,000 manually annotated, word-aligned Bengali and English parallel sentences. The parallel sentences were collected from various news articles and encyclopedias. The translations of some sentences were refined using Google Translate, and all punctuation marks were removed from the parallel sentences to avoid tokenization issues. A sample entry from the dataset is shown below:

Bengali: বাংলাদেশের জলবায়ু তাপমাত্রায় মৃদু
English: Climate of Bangladesh is mild in temperature
Alignments: 0-1 0-2 1-0 2-5 2-6 3-3 3-4

Each alignment pair i-j links the i-th Bengali token to the j-th English token, counting tokens from 0. For example, 1-0 links জলবায়ু to "Climate", and 0-1 and 0-2 together link বাংলাদেশের to "of Bangladesh".
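To make the alignment notation concrete, here is a minimal Python sketch that turns the sample entry into aligned word pairs. It assumes that sentences are tokenized by splitting on whitespace (consistent with the removal of punctuation) and that indices are 0-based with the Bengali index first, as the sample implies. The function names are hypothetical, and the sketch does not read the XML files, whose element structure is not described here.

```python
def parse_alignments(alignment_str):
    """Parse an 'i-j i-j ...' string into (bengali_index, english_index) tuples."""
    pairs = []
    for token in alignment_str.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def aligned_word_pairs(bengali, english, alignment_str):
    """Map index pairs to (Bengali word, English word) pairs.

    Assumes whitespace tokenization, which is a hypothesis for illustration.
    """
    bn_tokens = bengali.split()
    en_tokens = english.split()
    return [(bn_tokens[i], en_tokens[j]) for i, j in parse_alignments(alignment_str)]


# Sample entry copied from the dataset description.
bengali = "বাংলাদেশের জলবায়ু তাপমাত্রায় মৃদু"
english = "Climate of Bangladesh is mild in temperature"
alignments = "0-1 0-2 1-0 2-5 2-6 3-3 3-4"

for bn_word, en_word in aligned_word_pairs(bengali, english, alignments):
    print(bn_word, "->", en_word)
```

For the sample, this prints one line per alignment pair, starting with বাংলাদেশের -> of and বাংলাদেশের -> Bangladesh. Because one Bengali token can map to several English tokens (and vice versa), the pairs are kept as a flat list rather than a one-to-one dictionary.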
Steps to reproduce
We developed a word-alignment tool and used it to create this dataset. A link to the tool is provided in the "Related Links" section.
Institutions
North South University
Categories
Computer Science, Artificial Intelligence, English Language, Natural Language Processing, Machine Translation, Machine Learning, Bengali Language, Aligner, Neural Network