Amharic dataset for hate speech detection

Name: Amharic dataset for hate speech detection
Creator: mekuanent degu
Published: 2022-07-28T21:17:07.875Z
Keywords: Natural Language Processing, Machine Learning Algorithm, Deep Learning, Language, Long Short-Term Memory Network


doi:10.17632/fhvsvsbvtg.3


Version: 3


Description

The dataset was collected from social media platforms such as Facebook and Telegram and then preprocessed. The collection contains three versions: D1_org, in which the text is neither stemmed nor stripped of stopwords; D1_sf, in which stopwords are removed but the text is not stemmed; and D3_stemed, in which stopwords are removed and the text is stemmed. Stemming was done with HornMorpho, developed by Michael Gasser (available at https://github.com/hltdi/HornMorpho). All datasets are normalized and free of noise such as punctuation marks and emojis.
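The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the regular expressions for punctuation and emojis, the hypothetical helper names `clean` and `build_variants`, and the toy stopword are all assumptions. The stemmer is passed in as a plain function because HornMorpho's own interface is not described here. Amharic-specific character normalization is omitted.

```python
import re
import string
from typing import Callable, Dict, Iterable

# Assumption: Ethiopic punctuation (U+1361..U+1368, e.g. "።" and "፣") plus ASCII punctuation.
PUNCT_RE = re.compile("[\u1361-\u1368" + re.escape(string.punctuation) + "]")
# Assumption: approximate emoji/pictograph ranges, not an exhaustive emoji definition.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean(text: str) -> str:
    """Replace punctuation and emojis with spaces, then collapse whitespace."""
    text = PUNCT_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return " ".join(text.split())


def build_variants(
    text: str,
    stopwords: Iterable[str],
    stem: Callable[[str], str],
) -> Dict[str, str]:
    """Hypothetical helper: produce the three dataset versions for one post."""
    stop = set(stopwords)
    tokens = clean(text).split()
    no_stop = [t for t in tokens if t not in stop]
    return {
        "D1_org": " ".join(tokens),                        # cleaned only
        "D1_sf": " ".join(no_stop),                        # stopwords removed, not stemmed
        "D3_stemed": " ".join(stem(t) for t in no_stop),   # stopwords removed, then stemmed
    }


if __name__ == "__main__":
    # Toy inputs: "እና" ("and") stands in for a real stopword list, and the
    # identity function stands in for a HornMorpho-based stemmer.
    print(build_variants("ሰላም እና ጤና! 😀", {"እና"}, lambda word: word))
```

Passing the stemmer as a parameter keeps the sketch independent of any particular HornMorpho version; a wrapper around HornMorpho would take the place of the identity function when building the D3_stemed version.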


Steps to reproduce

The data were collected from Facebook and Telegram and preprocessed into the D1_org, D1_sf, and D3_stemed versions as described in the Description above. The kappa value between annotators was 0.61. Anyone may reproduce this data in accordance with its intended usage, but credit to the contributor is strongly recommended.
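The reported kappa of 0.61 does not specify which kappa statistic was used or how many annotators took part. Assuming two annotators and Cohen's kappa, the statistic is κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label distribution. The sketch below is illustrative; the function name and the label values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items (assumed setting)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for four posts.
    annotator_1 = ["hate", "hate", "free", "free"]
    annotator_2 = ["hate", "free", "free", "free"]
    print(cohens_kappa(annotator_1, annotator_2))
```

For these toy labels, p_o = 0.75 and p_e = 0.5, so κ = 0.5. Kappa discounts the agreement two annotators would reach by chance, which is why it is reported instead of raw percent agreement.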
