FeniVerse: A Parallel Corpus of Feni Dialect, Standard Bengali, and English
Description
FeniVerse is an openly accessible trilingual parallel corpus of 4,094 sentence-aligned entries, each given in English, Standard Bengali (Bangla), and the Feni dialect, for 12,282 sentences in total. The alignment across the three languages supports machine translation, dialect classification, cross-linguistic analysis, and other NLP research.
The dataset is provided as a ZIP file named “FeniVerse Parallel Corpus”, which contains two main files:
- FeniVerse_Dataset.csv, with three columns: English, Standard Bangla, Feni Dialect
- FeniVerse_Dataset.xlsx, with the same three columns
This is the first publicly available dataset for the Feni dialect, offering an authentic, manually curated, sentence-aligned resource for linguistic and computational experiments.
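As a minimal sketch of how the CSV file might be read, the Python snippet below loads each row as an (English, Standard Bangla, Feni Dialect) triple using only the standard library. The column names are taken from the description above; the function name `load_triples`, the UTF-8 encoding, and the local file path are assumptions for illustration, not part of the dataset documentation.

```python
import csv

# Column headers as listed in the dataset description.
COLUMNS = ("English", "Standard Bangla", "Feni Dialect")


def load_triples(csv_path):
    """Return a list of (english, bangla, feni) tuples from the corpus CSV.

    Assumes a UTF-8 file (with or without a byte-order mark) whose header
    row contains the three column names in COLUMNS.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in COLUMNS if c not in header]
        if missing:
            raise ValueError(f"missing expected columns: {missing}")
        # Treat absent cells as empty strings and trim stray whitespace.
        return [tuple((row[c] or "").strip() for c in COLUMNS) for row in reader]


if __name__ == "__main__":
    # Hypothetical path: the CSV extracted from the ZIP into the working directory.
    triples = load_triples("FeniVerse_Dataset.csv")
    print(len(triples), "aligned entries loaded")
    english, bangla, feni = triples[0]
    print(english, "|", bangla, "|", feni)
```

If the extracted file matches the description, the number of loaded entries should equal the 4,094 aligned sentences stated above; a different count would suggest blank rows or a different encoding than assumed here.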
Institutions
- Daffodil International University