CUTEly MAD Dataset

Name: CUTEly MAD Dataset
Creator: Syam Mohan E
Published: 2024-01-29T19:06:33.139Z
Keywords: Natural Language Processing, Twitter, Sentiment Analysis, Indian Language

Mohan E, Syam; Sunitha, R

doi:10.17632/hhksb972pp.1

Published: 29 January 2024 | Version 1 | DOI: 10.17632/hhksb972pp.1

Contributors:

Syam Mohan E, R Sunitha

Description

CUTEly MAD (Curated Twitter Malayalam Dataset) is a curated dataset for document-level sentiment analysis in the Malayalam language. It was created by extracting Malayalam tweets from Twitter. A set of positive and negative sentiment-oriented Malayalam words was first identified, and these words were used as hashtags to retrieve tweets through the Twitter API. A proficient annotator then manually labeled each tweet by sentiment polarity into one of two classes: positive tweets are labeled 1 and negative tweets are labeled 0. The dataset contains 2,000 labeled tweets in total, of which 50% are positive and 50% are negative.
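
The label scheme described above (1 for positive, 0 for negative, with an even class split) can be checked with a few lines of Python. This is only a minimal sketch: the description does not state the file format or column names, so the CSV layout, the `tweet` and `label` column names, and the English placeholder rows below are all assumptions made for illustration, not the actual contents of the dataset.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: the real file name, format, and column names are not given in
# the dataset description. This in-memory CSV with "tweet" and "label"
# columns is a hypothetical stand-in, and the rows are placeholders.
SAMPLE_CSV = """tweet,label
placeholder positive tweet,1
placeholder negative tweet,0
another positive tweet,1
another negative tweet,0
"""

# Label encoding as stated in the description: 1 = positive, 0 = negative.
LABEL_NAMES = {0: "negative", 1: "positive"}


def load_labeled_tweets(handle):
    """Read (text, label) pairs, rejecting any label other than 0 or 1."""
    rows = []
    for record in csv.DictReader(handle):
        label = int(record["label"])
        if label not in LABEL_NAMES:
            raise ValueError(f"unexpected label: {label}")
        rows.append((record["tweet"], label))
    return rows


def class_balance(rows):
    """Return the fraction of rows in each sentiment class."""
    if not rows:
        raise ValueError("no rows to summarize")
    counts = Counter(label for _, label in rows)
    return {LABEL_NAMES[k]: counts.get(k, 0) / len(rows) for k in LABEL_NAMES}


rows = load_labeled_tweets(io.StringIO(SAMPLE_CSV))
balance = class_balance(rows)
print(balance)
```

On the full dataset, a check like this would be expected to report 0.5 for each class, given the stated 50/50 split over 2,000 tweets. To use it on a real file, replace the `io.StringIO` handle with an opened file and adjust the column names to match.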
