Amharic corpus

Published: 8 November 2021 | Version 1 | DOI: 10.17632/dtywyf3sth.1
Contributor:
Seid Muhie Yimam

Description

This is an Amharic text corpus used to build different semantic models. The dataset collection, the semantic models built with the dataset, and the classification models are described in the paper "Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets" (https://www.mdpi.com/1999-5903/13/11/275).

Steps to reproduce

The text was collected using different scraping technologies.

Institutions

Universität Hamburg

Categories

Language Modeling, Corpus Analysis, Word Embedding

Licence