4lang: Open Access Dataset for Cross-Lingual Plagiarism Detection
Published: 5 April 2023 | Version 1 | DOI: 10.17632/vndpn2wsf9.1
Contributors:
German Gritsay, Karen Avetisyan, Andrey Grabovoy
Description
A dataset for evaluating cross-lingual plagiarism detection. 4collection.zip: a subset of Wikipedia articles in 4 languages (ru, hy, es, en). 4query.zip: Wikipedia documents in each of the four languages containing sentences from the collection that were translated with the Google Translate API. The archive contains text documents and XML markup for them. For the markup description and evaluation, see http://pan.webis.de/clef13/pan13-web/plagiarism-detection.html
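As a rough illustration of how the XML markup might be read, the sketch below parses one annotation file into a list of plagiarised spans. It assumes the markup follows the PAN 2013 text alignment convention, i.e. a `document` element containing `feature` elements with `name="plagiarism"` and the attributes `this_offset`, `this_length`, `source_reference`, `source_offset`, and `source_length`. The sample XML, the file names in it, and the `Span` and `parse_markup` names are hypothetical and not taken from the archives; check the actual files before relying on this layout.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """One annotated reuse case: a query span aligned to a source span."""
    this_offset: int       # character offset in the query document
    this_length: int       # length of the span in the query document
    source_reference: str  # file name of the source document
    source_offset: int     # character offset in the source document
    source_length: int     # length of the span in the source document


def parse_markup(xml_text: str) -> List[Span]:
    """Collect all plagiarism features from one PAN-style annotation file.

    Attribute names are assumed from the PAN 2013 format, not verified
    against the dataset itself.
    """
    root = ET.fromstring(xml_text)
    spans = []
    for feature in root.iter("feature"):
        if feature.get("name") != "plagiarism":
            continue  # skip any other kind of annotation
        spans.append(
            Span(
                this_offset=int(feature.get("this_offset")),
                this_length=int(feature.get("this_length")),
                source_reference=feature.get("source_reference"),
                source_offset=int(feature.get("source_offset")),
                source_length=int(feature.get("source_length")),
            )
        )
    return spans


# Hypothetical sample; real file names and offsets will differ.
SAMPLE = """<document reference="query-0001.txt">
  <feature name="plagiarism" this_offset="10" this_length="25"
           source_reference="source-0042.txt"
           source_offset="100" source_length="30" />
</document>"""

if __name__ == "__main__":
    for span in parse_markup(SAMPLE):
        print(span)
```

The offsets can then be used to slice the corresponding text documents, for example `text[span.this_offset : span.this_offset + span.this_length]`, assuming offsets are counted in characters.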
Categories
Text Processing