How L2 Teachers Identify and Respond to GenAI-Generated Essays: A Sensemaking Theory Investigation

Published: 2 February 2026 | Version 1 | DOI: 10.17632/x7m73syvc8.1
Contributor:
Yun Li

Description

This table contains the quantitative questionnaire data from the survey-experiment component of our mixed-methods manuscript on how CSL (Chinese as a second language)/L2 teachers identify GenAI-generated essays. In the questionnaire phase, 238 in-service teachers with experience in Chinese writing instruction completed an online task in which they judged the source of 12 essays one by one (four student-written and eight GenAI-generated). The student essays were sampled from the HSK Dynamic Composition Corpus (Advanced level) and selected to cover score bands 2–5 on the same writing topic; the GenAI essays were generated by two mainstream models (ChatGPT-5.1 and DeepSeekV3.2) using a standardized prompt template. For each essay, teachers provided a binary label (GenAI-generated vs. student-written) and briefly explained the basis for their decision.

Hypothesis: because authorship identification is an equivocal, high-uncertainty task (especially in CSL writing), teachers’ classifications will be cue-driven and only modestly accurate overall, and accuracy will be shaped more by teachers’ proficiency in using GenAI and by text/model characteristics than by teaching seniority.

The results align with this hypothesis. Overall identification accuracy is close to chance (47.72%), and teaching experience does not significantly improve performance. Teachers with high GenAI proficiency, however, achieve significantly higher accuracy than other groups, and accuracy differs across score bands. At the model level, teachers identify DeepSeek-generated essays more accurately than GPT-generated essays, suggesting that models can differ in “indistinguishability.”

Notable findings and interpretation: teachers most frequently justify their decisions with learner-typical error cues and with “GenAI-like” quality/style cues (e.g., overly neat structure, overly clear logic, overly fluent style), and they also cite details of personal experience in an essay as signals of authenticity.

Analytically, the table can be used to compute accuracy and error rates (overall, by model, and by band), test predictors using teacher background variables, and code rationales to quantify cue usage and relate cue profiles to correct/incorrect judgments. These uses support transparent reuse and replication.
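As a rough illustration of the accuracy computations mentioned above, the following Python sketch groups judgments and reports the share that were correct. It assumes the table has been reshaped into one row per teacher-essay judgment; the field names (`source`, `model`, `band`, `judged_genai`) and the six rows are hypothetical placeholders, not the dataset's actual schema or values, and should be mapped to the real columns before use.

```python
from collections import defaultdict

# Made-up rows for illustration only: one row per teacher-essay judgment.
# Field names and values are assumptions, not the real dataset schema.
rows = [
    {"teacher_id": 1, "source": "student", "model": None, "band": 2, "judged_genai": False},
    {"teacher_id": 1, "source": "student", "model": None, "band": 5, "judged_genai": True},
    {"teacher_id": 1, "source": "genai", "model": "gpt", "band": None, "judged_genai": False},
    {"teacher_id": 1, "source": "genai", "model": "deepseek", "band": None, "judged_genai": True},
    {"teacher_id": 2, "source": "genai", "model": "gpt", "band": None, "judged_genai": True},
    {"teacher_id": 2, "source": "genai", "model": "deepseek", "band": None, "judged_genai": True},
]


def accuracy_by(rows, key=None):
    """Return {group: proportion of correct judgments}.

    With key=None, all rows fall into a single "overall" group.
    Rows whose grouping value is None (e.g. no score band for a
    GenAI essay) are skipped for that grouping.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [n_correct, n_total]
    for row in rows:
        group = "overall" if key is None else row[key]
        if group is None:
            continue
        is_genai = row["source"] == "genai"
        totals[group][0] += int(row["judged_genai"] == is_genai)
        totals[group][1] += 1
    return {group: correct / n for group, (correct, n) in totals.items()}


if __name__ == "__main__":
    print(accuracy_by(rows))           # overall accuracy
    print(accuracy_by(rows, "model"))  # accuracy on GenAI essays, per model
    print(accuracy_by(rows, "band"))   # accuracy on student essays, per score band
```

Error rates are simply one minus each returned proportion. The same grouping could be applied to a teacher background field (for example a GenAI proficiency level) as a first descriptive step before formal significance testing.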

Categories

Applied Linguistics