Kurdish Summarization Dataset (v2)

Published: 14 August 2023 | Version 3 | DOI: 10.17632/gczpg2cnxy.3
Soran Badawi


The first Kurdish summarization dataset is a collection of summaries for more than 40,000 news articles and headlines written in the Sorani dialect of Kurdish. The articles cover politics, economics, sport, religion, science, society, art, and health. The dataset was created to support the development and improvement of machine learning and natural language processing systems for summarization in Kurdish. It contains high-quality summaries written by human annotators. Each summary is a condensed version of the original article and headline that captures its key information and main points concisely. Researchers and developers can use the dataset to train and evaluate Kurdish summarization models, which can lead to more accurate and effective summarization tools. The dataset contributes to natural language processing technology for Kurdish and can open new avenues for research in the field.
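As a rough illustration of the evaluation use case mentioned above, the sketch below scores a model-generated summary against a human reference using unigram overlap (ROUGE-1 F1). It is a minimal sketch, not a method prescribed by the dataset: the function name rouge1_f1 is hypothetical, whitespace tokenization is an assumption that may be too coarse for Sorani text, and the example strings are placeholders rather than dataset content.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Assumption: tokens are separated by whitespace. A Sorani-aware
    tokenizer or normalizer would likely give more reliable scores.
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder strings standing in for a human summary and a model output.
    human_summary = "w1 w2 w3 w4"
    model_summary = "w1 w2"
    print(round(rouge1_f1(human_summary, model_summary), 3))
```

Averaging this score over a held-out portion of the dataset gives a single number for comparing models. For published results, an established ROUGE implementation is usually preferable to a hand-rolled one.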



Machine Learning, Kurd, Deep Learning