Amharic dataset for hate speech detection

Published: 28 July 2022 | Version 3 | DOI: 10.17632/fhvsvsbvtg.3
Contributor:
Mekuanent Degu

Description

The dataset was collected from social media platforms such as Facebook and Telegram and then preprocessed. It is provided in three variants: D1_org, which is neither stemmed nor stripped of stopwords; D1_sf, in which stopwords are removed but words are not stemmed; and D3_stemed, in which stopwords are removed and words are stemmed. Stemming was done with HornMorpho, developed by Michael Gasser (available at https://github.com/hltdi/HornMorpho). All variants are normalized and free of noise such as punctuation marks and emojis.
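The noise removal and stopword removal described above can be sketched roughly as follows. This is a minimal illustration, not the contributor's actual pipeline: the function names and the three-word stopword list are hypothetical, and the normalization and HornMorpho stemming steps are left out because the description does not say how they were configured.

```python
import unicodedata

# Hypothetical stopword list for illustration only; the list actually used
# to build D1_sf and D3_stemed is not given in the dataset description.
STOPWORDS = {"እና", "ነው", "ላይ"}


def remove_noise(text: str) -> str:
    """Replace punctuation, symbols (including emoji), and control/format
    characters with spaces, then collapse runs of whitespace."""
    kept = []
    for ch in text:
        major = unicodedata.category(ch)[0]
        # P = punctuation (covers Ethiopic marks such as ። and ፣),
        # S = symbols (covers most emoji), C = control/format characters.
        is_variation_selector = 0xFE00 <= ord(ch) <= 0xFE0F
        if major in ("P", "S", "C") or is_variation_selector:
            kept.append(" ")
        else:
            kept.append(ch)
    return " ".join("".join(kept).split())


def remove_stopwords(text: str, stopwords=STOPWORDS) -> str:
    """Drop whitespace-separated tokens that appear in the stopword set."""
    return " ".join(tok for tok in text.split() if tok not in stopwords)


if __name__ == "__main__":
    raw = "ሰላም! 😀 እና ዓለም።"
    cleaned = remove_noise(raw)
    print(cleaned)
    print(remove_stopwords(cleaned))
```

Under these assumptions, applying only `remove_noise` corresponds loosely to the D1_org variant, and applying `remove_stopwords` afterwards corresponds loosely to D1_sf.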

Files

Steps to reproduce

The data was collected from social media platforms such as Facebook and Telegram, then normalized and cleaned of noise such as punctuation marks and emojis. Three variants were produced: D1_org, which is neither stemmed nor stripped of stopwords; D1_sf, in which stopwords are removed but words are not stemmed; and D3_stemed, in which stopwords are removed and words are stemmed. Stemming was done with HornMorpho, developed by Michael Gasser (available at https://github.com/hltdi/HornMorpho). The kappa value between annotators was 0.61. Anyone may reproduce this data according to their intended use, but crediting the contributor is strongly recommended.
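For readers who want to check annotator agreement on their own labels, here is a minimal sketch of Cohen's kappa for two annotators. It is an assumption that the reported 0.61 is Cohen's kappa computed over two annotators; the description does not name the kappa variant or the number of annotators. The label names in the example are likewise hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions,
    # summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels, not taken from the dataset.
    annotator_1 = ["hate", "hate", "free", "free"]
    annotator_2 = ["hate", "free", "free", "free"]
    print(cohen_kappa(annotator_1, annotator_2))
```

With more than two annotators, a different statistic such as Fleiss' kappa would be needed instead.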

Categories

Natural Language Processing, Machine Learning Algorithm, Deep Learning, Language, Long Short-Term Memory Network

Licence