Bengali & Banglish: A Monolingual Dataset for Emotion Detection in Linguistically Diverse Contexts

Name: Bengali & Banglish: A Monolingual Dataset for Emotion Detection in Linguistically Diverse Contexts
Creator: Moshiur Rahman Faisal
Published: 2024-05-20T23:40:16.782Z
Keywords: Natural Language Processing, Machine Translation, Bengali Language, Emotion

Faisal, Moshiur Rahman; Shifa, Ashrin Mobashira; Rahman, Md Hasibur; Uddin, Mohammed Arif; Rahman, Rashedur M

doi:10.17632/4dnrwbxt8n.2

Published: 20 May 2024 | Version 2 | DOI: 10.17632/4dnrwbxt8n.2

Contributors:

Moshiur Rahman Faisal, Ashrin Mobashira Shifa, Md Hasibur Rahman, Mohammed Arif Uddin, Rashedur M Rahman

Description

This dataset covers Bengali and Banglish (Bengali written in English characters) and is intended as a resource for emotion detection. It contains 80,098 entries in total across both forms, organized into six emotion categories that follow Ekman's six basic emotions framework: anger (15,179), disgust (13,098), fear (7,565), joy (17,836), sadness (16,309), and surprise (10,107). The entries come from the existing EmoNoBa, UBMEC, and MONOVAB datasets and from comments on YouTube and Twitter posts. Because it covers both Bengali and Banglish, the data is also relevant to neural machine translation tasks.
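The class balance can be checked directly from the counts quoted in the description. The sketch below uses only those six figures; the variable names are illustrative and not part of the dataset. Note that the six counts as listed add up to 80,094, four fewer than the stated total of 80,098, and the record does not explain the difference.

```python
# Per-category counts exactly as quoted in the description.
counts = {
    "anger": 15179,
    "disgust": 13098,
    "fear": 7565,
    "joy": 17836,
    "sadness": 16309,
    "surprise": 10107,
}

# Sum of the listed counts and each category's share of that sum.
total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Print categories from largest to smallest with their share.
for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{label:<9}{n:>7}  {shares[label]:6.1%}")
print(f"{'total':<9}{total:>7}")

# Ratio of the largest class (joy) to the smallest (fear).
imbalance = max(counts.values()) / min(counts.values())
print(f"largest/smallest class ratio: {imbalance:.2f}")
```

The largest class is a little over twice the size of the smallest, so class-weighted metrics such as macro-averaged F1 may be worth reporting alongside accuracy.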

Steps to reproduce

The dataset was compiled from EmoNoBa, UBMEC, and MONOVAB and enriched with YouTube and Twitter comments collected via the official APIs across eight Bangladesh-specific domains. The YouTube and Twitter data were annotated through majority voting, while the original datasets were already annotated. After annotation and duplicate removal, the dataset was translated into Banglish, an English-character variation of Bengali.
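The annotation and duplicate-removal steps can be sketched roughly as follows. This is an illustration, not the authors' pipeline: the record does not state how many annotators voted, how ties were handled, or how duplicates were detected. The sketch assumes a strict majority is required, drops items without one, and treats texts as duplicates when they match after whitespace and case normalization. The function names and sample comments are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the label backed by a strict majority of votes, else None."""
    if not votes:
        return None
    label, n = Counter(votes).most_common(1)[0]
    return label if n > len(votes) / 2 else None


def build_records(raw):
    """Label each (text, votes) pair by majority vote, then drop duplicates.

    Assumption: duplicates are texts that are equal after collapsing
    whitespace and case-folding; the first occurrence is kept.
    """
    seen = set()
    records = []
    for text, votes in raw:
        label = majority_label(votes)
        if label is None:
            continue  # no strict majority: skip the item (assumed policy)
        key = " ".join(text.split()).casefold()
        if key in seen:
            continue  # already have this text
        seen.add(key)
        records.append({"text": text, "label": label})
    return records


# Hypothetical sample: three comments, each with three annotator votes.
sample = [
    ("Great match today", ["joy", "joy", "surprise"]),
    ("great  match today", ["joy", "joy", "joy"]),
    ("Hard to say", ["anger", "fear", "sadness"]),
]
records = build_records(sample)
print(records)
```

In this sample the second comment is dropped as a duplicate of the first, and the third is dropped because no label reaches a majority.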

Institutions

North South University
