Published: 1 July 2020 | Version 1 | DOI: 10.17632/dj95jh332j.1
Leila Ouahrani,


The ARabic Dataset for Automatic Short Answer Grading Evaluation V1. ISLRN 529-005-230-448-6.

The dataset consists of reported evaluations of answers submitted for three different exams given to three classes of students. The exams were conducted under natural evaluation conditions. Each exam consists of 16 short-answer questions (48 questions in total). A model answer is provided for each question, and students submitted answers to these questions. The number of answers obtained varies from one question to another. The dataset includes a total of 2133 (model answer, student answer) pairs.

The dataset encompasses 5 types of questions:
• "عرف": Define
• "إشرح": Explain
• "ما النتائج المترتبة على": What are the consequences of
• "علل": Justify
• "ما الفرق": What is the difference

The AR-ASAG dataset is available in several formats: TXT, XML, XML-MOODLE and database (.DB). The .DB format allows making the necessary exports according to specific analysis needs. The XML-MOODLE format is used on Moodle e-learning platforms. The name of each file is representative of its content.

For each pair, two manual grades (Mark1 and Mark2) are associated with a manual Average Gold Score. Both manual grades are available in the dataset. We use the term "Mark" to mean "Grade". Inter-annotator agreement:
- Pearson correlation: r = 0.8384
- Root mean square error: RMSE = 0.8381

The dataset can also be used for essay scoring, as students' answers can reach 4-5 sentences.

For privacy reasons, no student identifiers are used in this dataset.
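The inter-annotator agreement figures reported here (Pearson r and RMSE between Mark1 and Mark2) could be recomputed along the following lines. This is only a minimal sketch: it assumes the two marks have already been exported into parallel Python lists, and it assumes the Average Gold Score is the arithmetic mean of the two marks, which the description suggests but does not state explicitly. The function names and the sample values are hypothetical and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def rmse(xs, ys):
    """Root mean square error between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def gold_scores(mark1, mark2):
    """Assumed gold score: the mean of the two manual marks for each pair."""
    return [(a + b) / 2 for a, b in zip(mark1, mark2)]


if __name__ == "__main__":
    # Hypothetical marks for five answers; NOT real values from the dataset.
    mark1 = [4.0, 2.5, 3.0, 5.0, 1.0]
    mark2 = [3.5, 2.0, 3.5, 5.0, 1.5]
    print("Pearson r:", round(pearson(mark1, mark2), 4))
    print("RMSE:", round(rmse(mark1, mark2), 4))
    print("Gold scores:", gold_scores(mark1, mark2))
```

Loading the marks is left out deliberately, since the column or element names in the TXT, XML and .DB files are not described here; once the marks are read into two lists, the same functions apply unchanged.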



Education, Answer Extraction, e-Learning Resource