HASSANIYA-DTCD: A new Dataset for Benchmarking Text Classification Tasks on HASSANIYA Dialect

Published: 6 May 2025 | Version 1 | DOI: 10.17632/r5k9ktwr4g.1
Contributor:
Med El Moustapha El ARBY

Description

HASSANIYA-DTCD (A new Dataset for Benchmarking Text Classification Tasks on HASSANIYA Dialect) is the first dataset for the Mauritanian dialect known as Hassaniya. It contains 1,851 records classified into three categories: positive, negative, and neutral. The records are comments posted on the Facebook platform, collected with web scraping tools; each record was annotated using Label Studio. For more details, see the README file.
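As a quick sanity check before benchmarking, the label distribution can be tallied and validated against the three categories named above. The following is a minimal sketch, not part of the dataset itself: it assumes a CSV export with hypothetical column names `text` and `label` (the actual file format and column names are given in the README), and it runs on a small placeholder sample rather than real records.

```python
import csv
import io
from collections import Counter

# The three sentiment categories named in the dataset description.
EXPECTED_LABELS = {"positive", "negative", "neutral"}


def label_distribution(csv_file, label_column="label"):
    """Count records per label and reject labels outside the expected set.

    `label_column` is a hypothetical column name; check the README
    for the name actually used in the dataset files.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        label = row[label_column].strip().lower()
        if label not in EXPECTED_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


# Placeholder rows standing in for the real file (not actual dataset records).
sample = io.StringIO(
    "text,label\n"
    "example comment 1,positive\n"
    "example comment 2,negative\n"
    "example comment 3,neutral\n"
    "example comment 4,positive\n"
)

counts = label_distribution(sample)
total = sum(counts.values())
print(dict(counts), "total:", total)
```

With the real file opened in place of `sample` (for example via `open(path, encoding="utf-8", newline="")`), the total should equal the 1,851 records stated in the description, assuming the file uses this layout.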

Files

Steps to reproduce

See the README file.

Institutions

  • Universite de Nouakchott
  • Universite Sidi Mohamed Ben Abdallah

Categories

Data Mining, Natural Language Processing, Dialect, Sentiment Analysis

Licence