KurdSum Dataset

Published: 14 August 2023 | Version 1 | DOI: 10.17632/pvrfvc43cp.1
Soran Badawi


KurdSum is a dataset for developing and improving Kurdish-language summarization models. It contains over 40,000 news articles, each paired with a summary written by a skilled Kurdish journalist. The articles span a broad range of domains, including politics, sports, science, society, religion, health, and art, so the dataset reflects the topics that interest and concern Kurdish speakers.

The summaries are written by hand rather than generated automatically. This human authorship brings contextual understanding, nuance, and linguistic fluency to the dataset. The manually written summaries support training models that produce coherent output, and they serve as a reference for generating high-quality abstractive summaries of Kurdish text.

Researchers, developers, and language enthusiasts working on Kurdish summarization can use the dataset's volume, topical diversity, and journalist-written summaries as a foundation for training and fine-tuning summarization models suited to the nuances of the Kurdish language. Beyond summarization, the resource also supports the broader development of natural language understanding for Kurdish.



Machine Learning, Text Extraction, Kurd, Deep Learning