CUTEly MAD Dataset

Published: 29 January 2024 | Version 1 | DOI: 10.17632/hhksb972pp.1
Contributors:
Syam Mohan E,

Description

CUTEly MAD (Curated Twitter Malayalam Dataset) is a curated dataset for document-level sentiment analysis in the Malayalam language. It was created by extracting Malayalam tweets from Twitter. A set of positive and negative sentiment-oriented Malayalam words was identified and used as hashtags to extract tweets through the Twitter API. The tweets were then manually labeled by a proficient annotator according to their sentiment polarity, in two classes: positive, labeled 1, and negative, labeled 0. A total of 2,000 tweets are labeled, of which 50% are positive and 50% are negative.
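
The labeling scheme described above (1 for positive, 0 for negative, with an even class split) can be checked with a short script. The following is a minimal sketch only: this page does not state the file format or column names, so the CSV layout and the column names `tweet` and `label` are assumptions, and the two sample rows are placeholders rather than real entries from the dataset.

```python
import csv
import io
from collections import Counter

# Label scheme as stated in the description: 1 = positive, 0 = negative.
LABEL_NAMES = {0: "negative", 1: "positive"}


def read_labeled_tweets(handle, text_col="tweet", label_col="label"):
    """Read (text, label) pairs from a CSV file handle.

    The column names are hypothetical; adjust them to the actual file.
    """
    rows = []
    for record in csv.DictReader(handle):
        label = int(record[label_col])
        if label not in LABEL_NAMES:
            raise ValueError(f"unexpected label: {label}")
        rows.append((record[text_col], label))
    return rows


def class_balance(rows):
    """Return the fraction of rows in each sentiment class."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {name: counts[value] / total for value, name in LABEL_NAMES.items()}


# Placeholder sample in the assumed layout, not real data from the dataset.
sample = io.StringIO(
    "tweet,label\n"
    "example positive text,1\n"
    "example negative text,0\n"
)
rows = read_labeled_tweets(sample)
balance = class_balance(rows)
print(balance)
```

To run this on the real file, replace the `io.StringIO` sample with `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded file. If the description holds, both fractions should come out at 0.5 over 2,000 rows.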

Files

Categories

Natural Language Processing, Twitter, Sentiment Analysis, Indian Language

Licence