HASSANIYA-DTCD: A new Dataset for Benchmarking Text Classification Tasks on HASSANIYA Dialect
Published: 6 May 2025 | Version 1 | DOI: 10.17632/r5k9ktwr4g.1
Contributor: Med El Moustapha El ARBY
Description
HASSANIYA-DTCD (A new Dataset for Benchmarking Text Classification Tasks on HASSANIYA Dialect) is the first dataset for the Mauritanian dialect known as "HASSANIYA". It contains 1,851 records, each labeled with one of three categories: positive, negative, or neutral. The records are comments posted on Facebook, collected with web-scraping tools and annotated with Label Studio. For more details, see the README file.
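A minimal sketch of loading the records and checking the three-way label split, assuming the data ships as a UTF-8 CSV with hypothetical columns `text` and `label`. The file name, format, and column names are assumptions, not taken from this page; consult the README for the actual layout. The sample rows are placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# The three sentiment categories named in the dataset description.
LABELS = {"positive", "negative", "neutral"}


def load_records(stream, text_col="text", label_col="label"):
    """Read (text, label) pairs from a CSV stream.

    ASSUMPTION: `text_col` and `label_col` are hypothetical column names;
    the real schema is documented in the dataset's README.
    """
    records = []
    for row in csv.DictReader(stream):
        # Normalize case and whitespace so "Positive " maps to "positive".
        label = row[label_col].strip().lower()
        if label not in LABELS:
            raise ValueError(f"unexpected label: {row[label_col]!r}")
        records.append((row[text_col], label))
    return records


def label_distribution(records):
    """Count how many records fall into each category."""
    return Counter(label for _, label in records)


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    sample = io.StringIO(
        "text,label\ncomment one,positive\ncomment two,neutral\ncomment three,Positive\n"
    )
    records = load_records(sample)
    print(len(records), dict(label_distribution(records)))
```

Against the real file, open it with `open(path, encoding="utf-8", newline="")` (where `path` is whatever the README specifies) so Arabic-script text and quoted fields parse correctly; the record count should then match the 1,851 stated above.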
Steps to reproduce
See the README file.
Institutions
- Université de Nouakchott
- Université Sidi Mohamed Ben Abdallah
Categories
Data Mining, Natural Language Processing, Dialect, Sentiment Analysis