Evaluating the Performance of ChatGPT on Dermatology Board-Style Exams: A Meta-Analysis of Text-Based and Image-Based Question Accuracy
Description
1. Supplemental Figure I: A PRISMA diagram illustrating the study selection process. This diagram shows how articles were retrieved from the PubMed and SCOPUS databases using search terms related to ChatGPT, dermatology, and exam-style questions.
2. Supplemental Figure II: This figure presents the overall accuracy of each ChatGPT model (ChatGPT-3, ChatGPT-3.5, and ChatGPT-4) across all questions tested. It reports the number of studies contributing data for each model and provides 95% confidence intervals (a worked sketch of this interval calculation appears after this list).
3. Supplemental Figure III: This figure breaks down the performance of each ChatGPT model by dermatology content category as defined by the American Board of Dermatology (ABD), including dermatopathology, general dermatology, pediatric dermatology, science research, and surgical dermatology, with 95% confidence intervals for each category.
4. Supplemental Figure IV: This figure compares the performance of each ChatGPT model on image-based versus text-based questions. Because ChatGPT-3 and ChatGPT-3.5 lack image recognition capabilities, this comparison primarily reflects the performance of ChatGPT-4 on image-based questions. The figure also provides 95% confidence intervals for each performance category.
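As a minimal illustration of how a 95% confidence interval can be attached to a pooled accuracy figure, the sketch below computes a Wilson score interval for a proportion of correctly answered questions. The pooled counts shown are hypothetical placeholders, not the study's actual data, and the Wilson interval is only one common choice; the analysis in the paper may have used a different interval method.

```python
from math import sqrt

def wilson_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives ~95% coverage)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical pooled question counts per model (illustrative only).
pooled = {
    "ChatGPT-3": (310, 650),
    "ChatGPT-3.5": (402, 700),
    "ChatGPT-4": (540, 720),
}

for model, (correct, total) in pooled.items():
    lo, hi = wilson_ci(correct, total)
    print(f"{model}: accuracy = {correct / total:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```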