Emotions Lexicons in Roman Urdu, along with Emotion Polarity and Mood categorization

Published: 13 October 2023 | Version 2 | DOI: 10.17632/d5j9fgbdcn.2
Asia Siddiqui


This dataset consists of two files: a dictionary file that provides emotion lexicons in both Urdu and English, and a file of annotated data. Together they contain over 14,000 multilingual emotion lexicon entries and 10,000 tweets annotated for sentiment, mood, and emotion. The dataset can be used to analyze text written entirely in English, entirely in Roman Urdu, or in a blend of the two languages.


Steps to reproduce

This dataset provides a comprehensive set of emotion lexicons in Roman Urdu and English. An annotated dataset, produced from these lexicons with a Python script, is also provided. The lexicons were built from the Urdu lexicons in the multilingual sentiment lexicon collection. First, the extracted data was corrected to remove inaccurate translations. Once the Urdu lexicons had been updated, a Python script was written to convert them into Roman Urdu. Finally, annotations were added to the dataset to differentiate between sentiments and emotions.
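
The lexicon-based annotation step described above could look roughly like the following minimal sketch. It is not the authors' script: the lexicon entries, the label names, the tokenizer, and the majority-vote rule are all assumptions made for illustration, and the mood label is left out for brevity. In practice the lexicon would be loaded from the dictionary file rather than hard-coded.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (NOT taken from the dataset):
# token -> (emotion label, sentiment polarity).
# It mixes Roman Urdu and English entries, as the description suggests.
TOY_LEXICON = {
    "khush": ("joy", "positive"),
    "happy": ("joy", "positive"),
    "udaas": ("sadness", "negative"),
    "sad": ("sadness", "negative"),
    "gussa": ("anger", "negative"),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def annotate(text, lexicon=TOY_LEXICON):
    """Label a text with its most frequent lexicon emotion and polarity.

    Majority voting over matched tokens is an assumed rule, chosen
    only to keep the sketch simple.
    """
    emotions = Counter()
    polarities = Counter()
    for token in tokenize(text):
        if token in lexicon:
            emotion, polarity = lexicon[token]
            emotions[emotion] += 1
            polarities[polarity] += 1
    if not emotions:
        # No lexicon entry matched any token in the text.
        return {"emotion": "none", "sentiment": "neutral"}
    return {
        "emotion": emotions.most_common(1)[0][0],
        "sentiment": polarities.most_common(1)[0][0],
    }


if __name__ == "__main__":
    # Mixed Roman Urdu / English input.
    print(annotate("Aaj main bohat khush hoon, so happy"))
```

Because lookup is done token by token, the same function handles English, Roman Urdu, and mixed text, provided the lexicon has entries for both languages.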


Sindh Madressatul Islam University, Bahria University - Karachi Campus


Textual Analysis