KurdHope: A Dataset for Hope Speech Detection for Kurdish Language
Description
The KurdHope dataset is a resource for sentiment analysis and hope speech detection. It comprises over 90,000 news headlines scraped from prominent Kurdish news platforms such as NRTTV, Channel8, Zagros, Rudaw, and others. Its primary objective is to support the research and development of models that detect hopeful content in news headlines.

Annotation Process
Each headline in the dataset is labeled as either "hope" or "not-hope". Three independent annotators labeled the headlines manually to improve accuracy and reliability, and discrepancies were resolved through consensus.

Unique Features
1. Direct Traceability: The dataset provides direct links to the news websites from which each headline was scraped. This makes the sources transparent and verifiable, and makes KurdHope one of the first datasets in this domain with such traceable references.
2. First of Its Kind: To the best of our knowledge, KurdHope is the first dataset specifically curated for detecting hope speech in Kurdish news headlines. It opens new research avenues in sentiment analysis, particularly in the context of Kurdish media.
3. High Utility for Scholars: Scholars and practitioners can use the dataset to train and evaluate models that detect hopeful content in text. It is expected to contribute to sentiment analysis tools that can discern optimism and positivity in media narratives.

Research Applications
The KurdHope dataset can support a variety of applications:
- Sentiment Analysis: Improving the accuracy of sentiment classification models by incorporating a more nuanced understanding of hope.
- Media Studies: Analyzing trends in Kurdish news headlines to understand the prevalence and portrayal of hope.
- Multilingual NLP: Expanding the scope of sentiment analysis to underrepresented languages such as Kurdish.

In summary, the KurdHope dataset provides a foundation for developing NLP tools that detect and amplify hopeful narratives in media. Its availability is a significant step for computational linguistics and sentiment analysis in the Kurdish language context.
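As an illustration of the annotation process described above, the following minimal sketch shows how labels from three annotators might be compared to find the headlines that need a consensus discussion. It is not the authors' actual tooling: the function name `summarize_annotations`, the tuple-based input format, and the sample votes are all assumptions made for this example. The majority label is reported only as a starting point for discussion, since the dataset's final labels were settled by consensus rather than by automatic voting.

```python
from collections import Counter


def summarize_annotations(votes):
    """Split headlines into unanimous and disputed sets.

    votes: list of 3-tuples, one per headline, each holding the labels
    ("hope" or "not-hope") given by the three annotators.
    This layout is hypothetical, not the dataset's file format.
    """
    unanimous, disputed = [], []
    for idx, labels in enumerate(votes):
        # With three annotators and two labels there is always a strict majority.
        label, count = Counter(labels).most_common(1)[0]
        if count == len(labels):
            unanimous.append((idx, label))
        else:
            # Majority label is only a suggestion for the consensus discussion.
            disputed.append((idx, label))
    agreement = len(unanimous) / len(votes) if votes else 0.0
    return unanimous, disputed, agreement


# Made-up votes for four headlines (illustrative only).
sample_votes = [
    ("hope", "hope", "hope"),
    ("hope", "not-hope", "hope"),
    ("not-hope", "not-hope", "not-hope"),
    ("not-hope", "hope", "not-hope"),
]
unanimous, disputed, agreement = summarize_annotations(sample_votes)
print(f"Unanimous: {len(unanimous)}, disputed: {len(disputed)}, "
      f"raw agreement: {agreement:.2f}")
```

The raw agreement rate computed here is the share of headlines on which all three annotators agreed. A chance-corrected measure such as Fleiss' kappa would be a stricter alternative, but the source does not report one.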