Semantic Textual Similarity Kurdish
Description
To create a Kurdish paraphrase dataset, the PAWS dataset (Zhang et al., 2019) is translated into Central Kurdish using the Google Translate API, followed by human review to ensure linguistic accuracy and cultural relevance. Human reviewers refine the translations to match Kurdish syntax and semantics, correct machine-translation errors, and adapt cultural references for Kurdish-speaking audiences. This process is summarized in Table 2.
TABLE 1. Example English-to-Kurdish Translations with Cultural Adaptations
Original English Pair | Translated and Culturally Adapted Kurdish Pair
Flights from New York to Florida. | فڕۆکەکان لە هەولێر بۆ سلێمانی.
Which is the cheapest flight from NYC to Florida? | کە هەرزانترین گەشتەی فڕۆکە لە هەولێر بۆ سلێمانی؟
Can a bad person become good? | ئایا مرۆڤی خراپ دەتوانێت ببێتە مرۆڤێكی باش؟
Thanksgiving dinner was the best event of the year. | ئێوارەخوانی جەژنی نەورۆزی كوردان باشترین بۆنەی ساڵ بوو.
In these examples, New York and Florida are localized as Erbil and Sulaymaniyah, and the Thanksgiving dinner becomes a dinner for the Kurdish Newroz festival.
This process is designed to yield a high-quality dataset tailored to Kurdish linguistic and cultural contexts and to improve NLP applications for Kurdish-speaking users.
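The two-stage workflow described above (machine translation, then human review) can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Pair` record assumes the PAWS layout of two sentences plus a binary paraphrase label. The `translate` callable is a hypothetical stand-in for the Google Translate API call, and reviewer corrections are modeled as a hypothetical per-pair dictionary of replacement sentences. The property the sketch makes explicit is that translation and review change only the sentences, never the label.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Pair:
    """One PAWS-style example: two sentences plus a binary paraphrase label."""
    pair_id: str
    sentence1: str
    sentence2: str
    label: int  # 1 = paraphrase, 0 = not a paraphrase

# Hypothetical interface: any function mapping English text to Central Kurdish.
# A real pipeline would wrap a machine-translation service here.
Translator = Callable[[str], str]

def machine_translate(pairs: List[Pair], translate: Translator) -> List[Pair]:
    """Stage 1: machine-translate both sentences, keeping pair_id and label."""
    return [
        replace(p, sentence1=translate(p.sentence1), sentence2=translate(p.sentence2))
        for p in pairs
    ]

# Fields a reviewer is allowed to overwrite (assumption of this sketch).
REVIEWABLE_FIELDS = {"sentence1", "sentence2"}

def apply_review(pairs: List[Pair], edits: Dict[str, Dict[str, str]]) -> List[Pair]:
    """Stage 2: replace machine output with reviewer corrections.

    `edits` maps a pair_id to new text for "sentence1" and/or "sentence2".
    Reviewers cannot touch the label, so the paraphrase annotation is
    inherited unchanged from the English source.
    """
    reviewed = []
    for p in pairs:
        fix = edits.get(p.pair_id, {})
        unexpected = set(fix) - REVIEWABLE_FIELDS
        if unexpected:
            raise ValueError(f"pair {p.pair_id}: cannot edit {sorted(unexpected)}")
        reviewed.append(replace(p, **fix))
    return reviewed

if __name__ == "__main__":
    # Sentences taken from Table 1; the label value is illustrative only.
    english = [Pair("ex1", "Flights from New York to Florida.",
                    "Which is the cheapest flight from NYC to Florida?", 0)]
    # Stand-in translator that just tags its input as a machine draft.
    draft = machine_translate(english, lambda s: f"[MT draft] {s}")
    # Reviewer replaces both drafts with the culturally adapted Kurdish from Table 1.
    final = apply_review(draft, {"ex1": {
        "sentence1": "فڕۆکەکان لە هەولێر بۆ سلێمانی.",
        "sentence2": "کە هەرزانترین گەشتەی فڕۆکە لە هەولێر بۆ سلێمانی؟",
    }})
    print(final[0].label, final[0].sentence1)
```

Because reviewers can edit only sentence text in this sketch, the paraphrase label is inherited unchanged from the English source. If a cultural adaptation could change whether two sentences are paraphrases, that case would need a separate label check, which this sketch does not model.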