Amharic text dataset extracted from memes for hate speech detection or classification

Published: 8 June 2023 | Version 2 | DOI: 10.17632/gw3fdtw5v7.2
Contributor:
Mequanent Degu

Description

The dataset was collected from social media platforms such as Facebook and Telegram and then processed into three versions. orginal_cleaned is neither stemmed nor stripped of stopwords. stopword_removed has stopwords removed but is not stemmed. The stemmed dataset is stemmed and also has stopwords removed. Stemming was done using HornMorpho, developed by Michael Gasser (available at https://github.com/hltdi/HornMorpho). All datasets are normalized and free from noise such as punctuation marks and emojis.
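
The cleaning steps described above can be illustrated with a minimal Python sketch. This is not the contributor's actual pipeline: the record does not say which normalization rules or stopword list were used, so the homophone mapping, the rule of keeping only Ethiopic letters, the function names, and the two-word stopword set below are all assumptions made for illustration. Stemming with HornMorpho is left out because its API is not described in this record.

```python
import re

# ASSUMPTION: "normalized" is taken to mean merging homophone character
# series, a common Amharic convention. The record does not list its rules.
# Each entry maps the first code point of a series to its canonical series.
_HOMOPHONE_SERIES = {
    0x1210: 0x1200,  # ሐ series -> ሀ series
    0x1280: 0x1200,  # ኀ series -> ሀ series
    0x1220: 0x1230,  # ሠ series -> ሰ series
    0x12D0: 0x12A0,  # ዐ series -> አ series
    0x1340: 0x1338,  # ፀ series -> ጸ series
}
# Expand each series to its seven vowel orders.
_NORMALIZE = {
    src + i: dst + i
    for src, dst in _HOMOPHONE_SERIES.items()
    for i in range(7)
}

# ASSUMPTION: noise is anything that is not an Ethiopic letter or whitespace,
# so punctuation, emojis, digits, and Latin text are all dropped.
_NOISE = re.compile(r"[^\u1200-\u135A\s]")


def clean(text: str) -> str:
    """Normalize homophones, remove noise, and collapse whitespace."""
    text = text.translate(_NORMALIZE)
    text = _NOISE.sub(" ", text)
    return " ".join(text.split())


def remove_stopwords(text: str, stopwords: set[str]) -> str:
    """Drop whitespace-separated tokens that appear in the stopword set."""
    return " ".join(w for w in text.split() if w not in stopwords)


# Hypothetical two-word stopword set, for illustration only.
EXAMPLE_STOPWORDS = {"እና", "ነው"}

if __name__ == "__main__":
    raw = "ሠላም። 😀 እና ፍቅር!"
    cleaned = clean(raw)
    print(cleaned)
    print(remove_stopwords(cleaned, EXAMPLE_STOPWORDS))
```

Under these assumptions, the output of clean would correspond to the unstemmed, stopword-retaining version, and the output of remove_stopwords to the stopword-removed version.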

Institutions

Debre Markos University College of Technology

Categories

Natural Language Processing, Data Mining, Machine Learning Algorithm, Deep Learning

Licence