Kurdish Dataset for Fake News Detection

Published: 7 October 2022 | Version 1 | DOI: 10.17632/4ywyr2dbhc.1
Contributors:
Dana Abubakr Salh,

Description

Articles were scraped from well-known Kurdish news websites that are officially recognized by the Kurdistan Journalist Syndicate, and from Facebook pages. The public websites cover three cities in the Kurdistan Region of Iraq: Erbil, Sulaimani, and Halabja. The news articles are written in Kurdish, and the dataset is notable as the first and largest Kurdish-language dataset to concentrate on the Sorani dialect.

Over the course of a year, articles were scraped daily from the preset public news sources using Facepager, Web Scraper, and Python scripts, and we removed all duplicate articles. Because no Kurdish fact-checking platform existed at the time, articles were also gathered from various other news sources and social media pages.

For annotation, each public news source was assigned to one of two categories: Fake or Real. Each article was then labeled based on its source's category, the Kurdistan Journalist Syndicate's guidelines for Kurdish journalism, and additional criteria drawn from social media platforms.
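The collection steps described above (removing duplicates, then labeling each article by the category of its source) can be illustrated with a short Python sketch. It is not the authors' script: the source identifiers, the `SOURCE_CATEGORY` mapping, the `"source"`/`"text"` dict keys, and the hash-based exact-duplicate check are all hypothetical choices made for illustration. It also covers only the source-category criterion, not the syndicate guidelines or the social media criteria that contributed to the final labels.

```python
import hashlib
import re
import unicodedata

# Hypothetical mapping from a source identifier to the category assigned to that source.
SOURCE_CATEGORY = {
    "example-news-site": "Real",
    "example-facebook-page": "Fake",
}


def normalize(text: str) -> str:
    # NFC-normalize and collapse runs of whitespace so trivially different copies match.
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def dedupe_and_label(articles):
    """Drop exact duplicates (after normalization) and attach a source-based label.

    `articles` is an iterable of dicts with the hypothetical keys "source" and "text".
    """
    seen = set()
    labeled = []
    for art in articles:
        body = normalize(art["text"])
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of an article already kept
        seen.add(digest)
        label = SOURCE_CATEGORY.get(art["source"])
        if label is None:
            continue  # source is not in the preset list, so it gets no label here
        labeled.append({"source": art["source"], "text": body, "label": label})
    return labeled


# Usage with placeholder records: the first two differ only in whitespace.
sample = [
    {"source": "example-news-site", "text": "hello  world"},
    {"source": "example-news-site", "text": "hello world"},
    {"source": "example-facebook-page", "text": "another story"},
]
result = dedupe_and_label(sample)
print(len(result))  # 2
```

Hashing the normalized text catches only exact duplicates; near-duplicate articles (for example, lightly edited reposts) would need a fuzzier comparison.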

Institutions

Sulaimani Polytechnic University

Categories

Information Classification

Licence