Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language

Name: Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language
Creator: Fatema Tuj Johora Faria
Published: 2024-01-15T16:00:15.308Z
Keywords: Natural Language Processing, Machine Translation, Dialect, Bangladesh, Deep Learning

Faria, Fatema Tuj Johora; Bin Moin, Mukaffi; Al Wase, Ahmed; Sani, Md. Rabius; Ahmmed, Mehidi; Muhammad, Tashreef

doi:10.17632/bj5jgk878b.2

Published: 15 January 2024| Version 2 | DOI: 10.17632/bj5jgk878b.2

Contributors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Md. Rabius Sani, Mehidi Ahmmed, Tashreef Muhammad

Description

The Vashantor dataset consists of 32,500 sentences from different regions, including Chittagong, Noakhali, Sylhet, Barishal, and Mymensingh. The regional data is provided in two language formats, "Bangla" and "Banglish"; the core data is additionally provided in English. Each region and language combination has the same specified quantities of training, testing, and validation samples. The dataset details are as follows:

Specifics of the Core Data:

Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
English: Train 1875, Test 375, Validation 250 (Total 2500)

Specifics of the Regional Data:

Chittagong:
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)

Noakhali:
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)

Sylhet:
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)

Barishal:
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)

Mymensingh:
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
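The split sizes stated in the description can be cross-checked against the 32,500 total with a short calculation. The following is a minimal sketch in Python; the variable and function names are illustrative assumptions and do not reflect the file layout or naming used in the dataset itself.

```python
# Split sizes as stated in the description; identical for every subset.
SPLIT = {"train": 1875, "test": 375, "validation": 250}

# Subsets named in the description (labels here are illustrative only).
CORE_FORMATS = ["Bangla", "Banglish", "English"]
REGIONS = ["Chittagong", "Noakhali", "Sylhet", "Barishal", "Mymensingh"]
REGIONAL_FORMATS = ["Bangla", "Banglish"]


def subset_names():
    """Return (group, format) pairs for every subset in the description."""
    names = [("Core", fmt) for fmt in CORE_FORMATS]
    names += [(region, fmt) for region in REGIONS for fmt in REGIONAL_FORMATS]
    return names


def total_sentences():
    """Total sentence count: per-subset size times the number of subsets."""
    per_subset = sum(SPLIT.values())
    return per_subset * len(subset_names())


if __name__ == "__main__":
    print("Sentences per subset:", sum(SPLIT.values()))
    print("Number of subsets:", len(subset_names()))
    print("Total sentences:", total_sentences())
```

Three core subsets plus ten regional subsets (five regions in two formats each) give 13 subsets of 2,500 sentences, which matches the stated total of 32,500.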

Institutions

Ahsanullah University of Science and Technology
