Amharic dataset for hate speech detection
Description
The dataset was collected from social media platforms such as Facebook and Telegram and then processed further. The collection consists of three versions. D1_org is neither stemmed nor stopword-filtered. D1_sf has stopwords removed but is not stemmed. D3_stemed is stemmed and has stopwords removed. Stemming was done with HornMorpho, developed by Michael Gasser (available at https://github.com/hltdi/HornMorpho). All versions are normalized and free from noise such as punctuation marks and emojis.
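The noise-removal and stopword-removal steps described above could be sketched roughly as follows. This is a minimal illustration, not the contributors' actual pipeline: the function names and the tiny stopword set are hypothetical, the normalization rules are not specified in the description and are therefore omitted, and stemming with HornMorpho is left out rather than guessing at its API.

```python
import unicodedata

# Hypothetical stopword set, for illustration only; the list actually
# used to build the dataset is not given in the description.
STOPWORDS = {"እና", "ነው", "ላይ"}


def remove_noise(text):
    """Replace every character that is not a letter or a digit with a space.

    Unicode general categories starting with "L" (letters, including
    Ethiopic syllables) and "N" (numbers) are kept; punctuation such as
    the Ethiopic full stop and symbols such as emojis are dropped.
    """
    kept = []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N"):
            kept.append(ch)
        else:
            kept.append(" ")
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(kept).split())


def remove_stopwords(text, stopwords=STOPWORDS):
    """Drop whitespace-separated tokens that appear in the stopword set."""
    return " ".join(tok for tok in text.split() if tok not in stopwords)


if __name__ == "__main__":
    raw = "ሰላም። 😀 እንዴት ነህ?"
    print(remove_noise(raw))
    print(remove_stopwords("ሰላም እና ፍቅር"))
```

Under these assumptions, applying only `remove_noise` would correspond loosely to the D1_org version, and applying `remove_stopwords` afterwards to D1_sf.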
Steps to reproduce
Posts were collected from social media platforms such as Facebook and Telegram. All text was normalized and cleaned of noise such as punctuation marks and emojis. Three versions were then prepared: D1_org is neither stemmed nor stopword-filtered, D1_sf has stopwords removed but is not stemmed, and D3_stemed is stemmed and has stopwords removed. Stemming was done with HornMorpho, developed by Michael Gasser (available at https://github.com/hltdi/HornMorpho). The kappa value between annotators was 0.61. Anyone may reproduce or reuse this data as needed, but giving credit to the contributor is strongly recommended.
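For readers who want to check an agreement figure like the one reported here on their own annotations, the sketch below computes Cohen's kappa for two annotators. It rests on assumptions: the description does not say which kappa variant was used or how many annotators took part, and the label names in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels, not taken from the dataset.
    annotator_1 = ["hate", "hate", "free", "free"]
    annotator_2 = ["hate", "free", "free", "free"]
    print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```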