Annotated Corpus for the Detection of Arguments and Non-Arguments for Spanish texts

Published: 24 August 2022 | Version 1 | DOI: 10.17632/xh7vvty9zt.1


The corpus contains 2875 processed and annotated tweets. All three annotators agreed on the labels for the first annotation task: 1366 (48%) of the tweets were labeled as Argument and 1509 (52%) as Non-Argument.
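
As a minimal sketch, the Python snippet below shows one way a unanimously labeled subset and its class shares could be computed. The function names and the toy annotations are hypothetical and are not taken from the repository. The description does not state how the 2875 messages were selected, so this only illustrates the idea of keeping items on which all annotators agree. The last line rechecks the percentages from the counts reported above.

```python
from collections import Counter


def unanimous_labels(annotations):
    """Return the shared label for every item on which all annotators agree."""
    return [labels[0] for labels in annotations if len(set(labels)) == 1]


def class_shares(labels):
    """Return each label's share of the list as a rounded percentage."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * n / total) for label, n in counts.items()}


# Hypothetical toy annotations: one tuple of three labels per message.
toy = [
    ("Argument", "Argument", "Argument"),
    ("Non-Argument", "Non-Argument", "Non-Argument"),
    ("Argument", "Non-Argument", "Argument"),  # disagreement, dropped
    ("Non-Argument", "Non-Argument", "Non-Argument"),
]
agreed = unanimous_labels(toy)
print(class_shares(agreed))

# Recompute the shares from the counts reported in the description.
print(round(100 * 1366 / 2875), round(100 * 1509 / 2875))
```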


Steps to reproduce

A total of 4000 text messages were annotated. Three annotation tasks were defined: Argument / Non-Argument, Claim and Premise / Premise, and Claim / None. Following the Annotation Method (uploaded as an image in the Images folder of this repository), each annotation task was first assessed independently, and Inter-Annotator Agreement (IAA) metrics were then calculated to determine the reliability of the data. The results indicated that the corpus for the first annotation task (Argument / Non-Argument) is suitable for publication. The unweighted Cohen's kappa for the three pairwise combinations of annotators was 0.62, 0.52, and 0.73, and the Fleiss' kappa for the three annotators was 0.63, indicating substantial agreement between annotators on this first annotation task.
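
The agreement metrics named above can be sketched in a few lines of Python. This is an illustrative implementation of the standard formulas for unweighted Cohen's kappa and Fleiss' kappa, not the authors' own script. The function names, the annotator names, and the toy labels ("A" for Argument, "N" for Non-Argument) are hypothetical and do not come from the corpus.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two annotators' label lists.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per item,
    with the same number of raters for every item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_items
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy labels from three annotators for eight messages.
labels = {
    "ann1": list("AAAANNNN"),
    "ann2": list("AAANNNNA"),
    "ann3": list("AANNNNAA"),
}

for (name_x, x), (name_y, y) in combinations(labels.items(), 2):
    print(name_x, name_y, round(cohen_kappa(x, y), 2))

per_item = list(zip(*labels.values()))  # one tuple of three labels per message
print("Fleiss", round(fleiss_kappa(per_item), 2))
```

On the commonly used Landis and Koch scale, values from 0.61 to 0.80 are read as substantial agreement, which matches how the Fleiss' kappa of 0.63 is described above.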


Pontificia Universidad Catolica del Peru, Universidad Nacional Mayor de San Marcos


Data Mining, Argument, Text Mining