AI-Enhanced Distractors and Test Quality
Description
This study tests the hypothesis that revising dysfunctional distractors in multiple-choice items with the assistance of artificial intelligence can improve the psychometric quality of items used in large-scale examinations. Specifically, AI-assisted revision of distractors was hypothesized to increase distractor functionality, improve internal consistency, and maintain measurement comparability across test administrations.
The data were obtained from large-scale undergraduate examinations conducted in an open higher education context. Four courses were included in the analysis: two primarily verbally oriented and two quantitatively oriented. For each course, item-level response data from the initial exam administration were analyzed to identify dysfunctional distractors, defined as options selected by 5% or fewer of examinees. These distractors were revised using ChatGPT-4o through structured prompts developed by the researchers. Item stems and correct answers were preserved so that only the distractors were modified. Subject-matter experts then reviewed the revised distractors and refined them iteratively to ensure content appropriateness and alignment with course objectives. The revised items were administered to a new group of examinees in subsequent eCertificate examinations.
The dataset therefore contains item-level response distributions from two administrations of the same items: the original version and the revised version with AI-assisted distractor improvements. The data include distractor selection frequencies, item parameters derived from item response theory, factor loadings from the measurement models used to evaluate measurement invariance, and reliability indices such as Cronbach’s alpha. Overall, the dataset provides empirical evidence on how AI-assisted item revision affects distractor functioning, item difficulty, and test reliability in real examination settings.
Researchers and assessment practitioners may use these data to examine the impact of human–AI collaboration on item development processes and to better understand the conditions under which AI-assisted revisions can improve large-scale assessment systems while preserving measurement comparability.
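As an illustration of two quantities named in the description, the following minimal Python sketch flags distractors selected by 5% or fewer of examinees and computes Cronbach's alpha from a matrix of scored responses. It is not the authors' analysis code: the function names, the input layouts (a list of chosen option labels per item, and a list of 0/1 score rows per examinee), and the toy data are all assumptions made for this example and do not reflect the actual file structure of the dataset.

```python
from collections import Counter
from statistics import pvariance


def dysfunctional_distractors(responses, options, key, threshold=0.05):
    """Return the distractors chosen by `threshold` (default 5%) or fewer examinees.

    responses: chosen option label per examinee (hypothetical layout).
    options:   all option labels for the item.
    key:       label of the correct answer, which is never flagged.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    n = len(responses)
    return [opt for opt in options
            if opt != key and counts[opt] / n <= threshold]


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows of item scores (e.g. 0/1)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across examinees, and of the total scores.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data, invented for illustration only.
toy_responses = ["A"] * 60 + ["B"] * 30 + ["C"] * 8 + ["D"] * 2
toy_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]

flagged = dysfunctional_distractors(toy_responses, "ABCD", key="A")
alpha = cronbach_alpha(toy_scores)
print(flagged, alpha)
```

In the toy item, option D is chosen by 2% of examinees and is the only one flagged. Running the same two functions on the original and revised administrations of an item set would give a simple before-and-after comparison of distractor functioning and internal consistency.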
Institutions
- Anadolu University, Eskişehir