MSA-Moroccan Dialect: A Multimodal Sentiment Analysis Dataset for Moroccan Arabic (Darija)
Published: 5 January 2026 | Version 1 | DOI: 10.17632/kfjztyzztb.1
Contributors:
BEN CHEIKHI Ayoub
Description
This dataset provides the first publicly available multimodal resource for sentiment analysis in Moroccan Arabic (Darija). It consists of 3,040 samples extracted from authentic Moroccan podcasts, each with three aligned modalities: text (transcript), audio (speech waveform), and visual (features). Native speakers manually annotated the samples with sentiment labels (positive, negative, neutral). The dataset is designed to support research in multimodal machine learning, cross-modal fusion, and low-resource dialectal NLP.
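As a rough illustration of how one aligned sample could be represented in code, the sketch below models a record with the three modalities and a sentiment label. The field names (`transcript`, `audio_path`, `visual_features`, `label`), the file paths, and the toy records are all assumptions made for illustration; they are not the dataset's documented schema or contents.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# The three sentiment labels named in the description.
SENTIMENT_LABELS = ("positive", "negative", "neutral")


@dataclass(frozen=True)
class Sample:
    """One aligned sample. All field names are hypothetical."""
    transcript: str               # Darija text of the segment
    audio_path: str               # path to the speech waveform (assumed layout)
    visual_features: List[float]  # precomputed visual feature vector (assumed shape)
    label: str                    # one of SENTIMENT_LABELS

    def __post_init__(self) -> None:
        # Reject labels outside the three classes listed in the description.
        if self.label not in SENTIMENT_LABELS:
            raise ValueError(f"unknown sentiment label: {self.label!r}")


def label_distribution(samples: List[Sample]) -> Counter:
    """Count samples per sentiment label, e.g. to check class balance."""
    return Counter(s.label for s in samples)


# Toy, made-up records purely for illustration (not taken from the dataset).
toy_samples = [
    Sample("placeholder text 1", "audio/0001.wav", [0.1, 0.2], "positive"),
    Sample("placeholder text 2", "audio/0002.wav", [0.0, 0.4], "negative"),
    Sample("placeholder text 3", "audio/0003.wav", [0.3, 0.3], "positive"),
]
print(label_distribution(toy_samples))
```

Validating the label at construction time is one possible design choice; it surfaces annotation typos early, before they reach a training loop.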
Institutions
- Universite Sidi Mohamed Ben Abdellah Faculte des Sciences Dhar El Mahraz-Fes
Categories
Computer Science, Artificial Intelligence, Computer Vision, Natural Language Processing, Speech Analysis, Arabic Language, Multimodality Studies, Sentiment Analysis, Large Language Model