Dataset from a Quasi-Experimental Study on Enhancing Academic Integrity and Reducing Plagiarism through a Generative AI-Based Paraphrasing Tool in Multilingual Higher Education
Description
This dataset is derived from a quasi-experimental study titled “Enhancing Academic Integrity and Reducing Plagiarism through a Generative AI-Based Paraphrasing Tool: A Quasi-Experimental Study in Multilingual Writing Education.” The study investigated the impact of Languafrasa, a generative AI-based paraphrasing tool designed to support ethical academic writing, on the performance, ethical awareness, and perceptions of undergraduate students in multilingual education. It involved 300 undergraduate participants divided equally into experimental and control groups (150 each). Over a six-week period, the experimental group used the AI tool to complete weekly paraphrasing tasks integrated into a learning management system, while the control group performed the same tasks without access to AI.

The dataset includes demographic profiles (age, gender, and field of study); pretest and posttest similarity scores, including gain scores; rubric-based evaluations of paraphrasing quality across five dimensions (semantic fidelity, syntactic transformation, lexical diversity, citation ethics, and clarity); measures of engagement and ethical awareness; and student perception survey responses in both Likert and open-ended formats.

All statistical procedures were conducted in JASP: descriptive statistics, assumption testing, paired and independent samples t-tests, one-way ANOVA, MANOVA, and Wilcoxon signed-rank tests. Results are presented in six summary PDF files corresponding to the study’s three research questions. Each file contains structured outputs, effect size calculations, and narrative interpretations. The dataset is intended as a transparent, replicable resource for researchers, educators, and instructional designers interested in academic integrity, AI-assisted writing instruction, and multilingual learning environments in higher education.
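As an illustration of how the gain scores mentioned above might be derived, the following Python sketch computes per-student gains and a paired effect size. It is not the authors' procedure (the study used JASP). The sign convention (posttest minus pretest, so a negative gain means lower similarity), the function names, and the four toy values are all assumptions made for this example and are not taken from the dataset.

```python
from statistics import mean, stdev


def gain_scores(pre, post):
    """Return per-student gains, assumed here to be posttest minus pretest."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [b - a for a, b in zip(pre, post)]


def paired_effect_size(gains):
    """Cohen's d_z for paired data: mean gain divided by the SD of the gains."""
    return mean(gains) / stdev(gains)


# Hypothetical similarity percentages for four students (NOT from the dataset).
pre = [40.0, 35.0, 50.0, 45.0]
post = [30.0, 30.0, 35.0, 40.0]

gains = gain_scores(pre, post)
print("gains:", gains)
print("mean gain:", mean(gains))
print("d_z:", round(paired_effect_size(gains), 3))
```

Under this convention, a negative mean gain in similarity score would indicate that submitted text overlapped less with source material at posttest than at pretest.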
Files
Steps to reproduce
To replicate the procedures and outcomes reported in this study, researchers should begin by selecting a sample of 300 undergraduate students representing a range of academic disciplines and comparable demographic backgrounds. Participants should be randomly assigned to experimental and control groups. A pretest should be administered in the form of an academic paraphrasing task using a standardized argumentative prompt. During the six-week intervention period, the experimental group should complete weekly paraphrasing tasks using the Languafrasa tool, while the control group completes the same tasks without AI assistance. After the intervention, a posttest should be administered using a thematically parallel prompt.

All writing tasks must be evaluated using a five-dimensional rubric measuring semantic fidelity, syntactic transformation, lexical diversity, citation ethics, and clarity. Text similarity scores must be extracted from Turnitin reports at both the pretest and posttest stages, and student perceptions must be collected through a validated 20-item survey instrument that includes both closed- and open-ended questions. Data analysis should be performed in JASP, including assumption testing, t-tests, ANOVA, MANOVA, and non-parametric alternatives where assumptions are violated. Results should be interpreted with reference to both statistical and pedagogical significance.

Ethical clearance must be obtained in advance, and all participants must provide written informed consent. This dataset provides a foundation for researchers examining the integration of generative AI tools in writing instruction and academic ethics development.
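The assignment and group-comparison steps above can be sketched in Python as follows. This is only an illustrative stand-in for the JASP workflow described in the text, not the authors' implementation. The seed value, the function names, the choice of Welch's t statistic (rather than another independent-samples variant), and the toy gain values are assumptions introduced for this example.

```python
import random
from math import sqrt
from statistics import mean, variance


def assign_groups(participant_ids, seed=0):
    """Shuffle IDs with a fixed seed and split them into two equal halves."""
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("need an even number of participants")
    random.Random(seed).shuffle(ids)  # local RNG keeps the split reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


# 300 hypothetical participant IDs split into two groups of 150.
experimental, control = assign_groups(range(1, 301), seed=42)
print(len(experimental), len(control))

# Hypothetical gain scores per group (NOT from the dataset).
exp_gains = [-10.0, -5.0, -15.0, -5.0]
ctl_gains = [-2.0, 0.0, -4.0, -2.0]
print("Welch t:", round(welch_t(exp_gains, ctl_gains), 3))
print("Cohen's d:", round(cohens_d(exp_gains, ctl_gains), 3))
```

Recording the seed alongside the assignment makes the group split auditable. A p-value would additionally require the t distribution, which JASP or a statistics library provides.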
Institutions
- Universitas Negeri Yogyakarta
- Universitas Negeri Padang