KurdABSA: Aspect Based Sentiment Analysis Dataset for Kurdish Language

Published: 6 August 2025 | Version 1 | DOI: 10.17632/h5t7p4bcj2.1
Contributor: Rania Azad

Description

The dataset is the first publicly available aspect-based sentiment analysis (ABSA) dataset for the Sorani dialect of Kurdish, addressing a critical gap in natural language processing (NLP) research for low-resource languages. It comprises more than 4,000 ABSA quadruplets from the restaurant review domain, written in Sorani Kurdish using the Perso-Arabic script. The annotations were produced automatically with a few-shot, prompt-based model. The resource is intended for machine learning, deep learning, and cross-lingual model adaptation, and is suitable for training, fine-tuning, and benchmarking.
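
For readers unfamiliar with quadruplet-style ABSA, the sketch below shows one common way to represent a quadruplet in Python. It assumes the widely used four-part layout (aspect term, aspect category, opinion term, sentiment polarity). The field names, category labels, and English placeholder strings are illustrative assumptions and are not taken from the KurdABSA files, whose actual schema may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Quadruplet:
    """One ABSA annotation (hypothetical field names, not the dataset's schema)."""
    aspect_term: str       # the target being discussed, e.g. a dish
    aspect_category: str   # a coarse category label for the aspect
    opinion_term: str      # the word or phrase expressing the opinion
    sentiment: str         # polarity label, e.g. "positive" or "negative"


def sentiment_counts(quads):
    """Count how many quadruplets carry each sentiment label."""
    return Counter(q.sentiment for q in quads)


# Placeholder English examples; the real dataset contains Sorani Kurdish text.
example = [
    Quadruplet("pizza", "food quality", "delicious", "positive"),
    Quadruplet("service", "service general", "slow", "negative"),
    Quadruplet("dessert", "food quality", "fresh", "positive"),
]

counts = sentiment_counts(example)
```

A structure like this makes it straightforward to compute label distributions before training or benchmarking a model on the data.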

Steps to reproduce

Please cite the dataset's paper if you use this dataset (citation coming soon).

Institutions

Sulaimani Polytechnic University

Categories

Natural Language Processing, Sentiment Analysis

Licence