AI Versus Human-Generated Multiple-Choice Questions for Medical Education: A Cohort Study in a High-Stakes Examination
Description
This dataset contains anonymized results from a study evaluating the performance of AI-generated and human-generated multiple-choice questions (MCQs) in a high-stakes medical examination setting. The study aimed to assess the efficacy of ChatGPT-4o in creating MCQs comparable to those developed by human experts, focusing on psychometric properties, candidate performance, and expert reviews.

Files Included:
- Results (AI Anonymised): This Excel file contains anonymized data on candidate performance on AI-generated MCQs, including difficulty indices and discrimination indices (a sketch of how such indices are conventionally computed follows this list).
- Results (Human Anonymised): This Excel file contains anonymized data on candidate performance on human-generated MCQs, with the same parameters as the AI-generated dataset for direct comparison.
- MCQ Bloom's Taxonomy Analysis: This Word file provides a detailed breakdown of the cognitive levels assessed by both AI- and human-generated MCQs, categorized according to Bloom's taxonomy (e.g., Remember, Understand, Apply, Analyze).
- MCQ Expert Review: This Word file contains the expert reviews of both AI-generated and human-generated MCQs. The experts evaluated the questions against key criteria, including factual correctness, relevance to emergency medicine, difficulty, and item-writing flaws. It also records the incidence of identified issues, such as irrelevant content or inappropriate difficulty levels.
- MCQ Time Analysis: This Word file presents a detailed analysis of the time spent generating, reviewing, and correcting both AI- and human-generated MCQs.
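
The difficulty and discrimination indices in the results files are standard classical test theory statistics: the difficulty index is the proportion of candidates who answered an item correctly, and the discrimination index contrasts the proportion correct in the upper- and lower-scoring candidate groups. The following is a minimal sketch of these calculations, assuming a candidates-by-items matrix of 0/1 scores; the file name, sheet layout, and column structure are illustrative assumptions and may not match the actual Excel files.

    # Minimal sketch of classical test theory item statistics.
    # The 0/1 response layout and file name are assumptions for illustration.
    import pandas as pd

    def item_statistics(responses: pd.DataFrame, group_fraction: float = 0.27) -> pd.DataFrame:
        """Per-item difficulty and upper-lower discrimination indices.

        `responses` is a candidates x items matrix of 0/1 scores.
        """
        total_scores = responses.sum(axis=1)
        n_group = max(1, int(len(responses) * group_fraction))

        # Upper and lower groups by total score (top/bottom ~27% is conventional).
        upper = responses.loc[total_scores.sort_values(ascending=False).index[:n_group]]
        lower = responses.loc[total_scores.sort_values(ascending=True).index[:n_group]]

        return pd.DataFrame({
            # Difficulty index: proportion of all candidates answering the item correctly.
            "difficulty": responses.mean(axis=0),
            # Discrimination index: proportion correct in upper group minus lower group.
            "discrimination": upper.mean(axis=0) - lower.mean(axis=0),
        })

    if __name__ == "__main__":
        # Hypothetical file name; replace with the actual dataset file and sheet.
        ai_responses = pd.read_excel("Results (AI Anonymised).xlsx", index_col=0)
        print(item_statistics(ai_responses))

A difficulty index near 0.5 indicates an item of moderate difficulty, and a positive discrimination index indicates that stronger candidates were more likely to answer the item correctly; these are the quantities reported for both the AI- and human-generated question sets.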